Parallelize file uploads in fs cp command. #4132
Commit: ff6c248
38 interesting tests: 17 KNOWN, 9 BUG, 7 RECOVERED, 4 flaky, 1 SKIP
Top 50 slowest tests (at least 2 minutes):
shreyas-goenka
left a comment
Critical Issue: Context Shadowing Bug
Line 61 in cp.go shadows the context variable from line 57, causing the defer cancel() to cancel the wrong context. This breaks proper cleanup and cancellation propagation.
Issues
- Context shadowing at line 61: The errgroup.WithContext returns a new context that shadows the cancellable context from line 57. This means defer cancel() on line 58 will cancel the wrong context.
- Redundant context check at lines 90-92: The ctx.Err() check inside the goroutine is ineffective since the goroutine may start before cancellation, and cpFileToFile already handles context cancellation.
- Missing test coverage: TestCp_concurrencyValidation only tests invalid values. It should also test that valid values (1, 16, 100) work correctly.
- No integration test for --concurrency flag: The new flag should have an integration test exercising different concurrency values.
Suggestions
- Consider removing the double context wrapping since errgroup.WithContext already provides cancellation on error.
- Document the no-ordering-guarantee behavior in command help text.
- Consider adding debug logs for performance monitoring.
Questions
- Why the double context wrapping (lines 57 and 61)? Is there a specific reason beyond what errgroup provides?
- Is concurrency=16 based on benchmarking? Different scenarios might benefit from different values.
- With concurrent output, messages will be interleaved. Is this acceptable UX?
Review generated by reviewbot
…bricks/cli into fs-cp-fast
Sorry, forgot to leave this review before going on vacation.
@@ -0,0 +1,20 @@
Local = true
Cloud = false
I recommend setting Cloud = true for this test as well. That validates the implementation against a real server. You can use fs cat to confirm that the files were uploaded.
This is also a nice way to ensure that the local test server implementation matches the remote behaviour, at least for the interfaces we own (like the fs commands).
…bricks/cli into fs-cp-fast
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Add missing blank lines and trailing newlines in fs cp test outputs to match actual command output format.
The HEAD handler for /api/2.0/fs/directories now uses a simple heuristic: if a path has a file extension (e.g., .txt, .json), it's assumed to be a file, not a directory. This avoids the need for complex state tracking or per-test Server overrides while handling the common case correctly. Also regenerated acceptance test output files to include proper blank lines between command outputs.
Added comprehensive documentation explaining:
- WHY: test server doesn't track UC Volumes state
- WHAT: the problem of checking paths before they exist
- HOW: file extension heuristic solves the common case
- Assumptions and limitations
Also simplified code by inlining lastSlash comparison and ensured all comments start with capital letters and end with periods.
The file extension heuristic is now only applied to Unity Catalog Volumes paths (/Volumes/...), which makes semantic sense since:
1. Workspace files/directories: use tracked state (correct)
2. Volume paths: use heuristic (necessary - we don't track volume state)
3. Non-existent workspace paths: return 404 (correct)
Previously, the handler returned 200 for ALL non-existent paths, which was semantically incorrect. Now it only assumes directories exist for volume paths where we genuinely don't have state to check.
Apply file extension heuristic only to /Volumes/ paths where state isn't tracked. Return correct 404 for non-existent workspace paths.
Resolved conflict by keeping volume-specific file extension heuristic.
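The volume-only file extension heuristic described in these commits could look roughly like the sketch below. It is an illustration under assumptions, not the test server's actual handler: the function name `volumePathLooksLikeDirectory` and its boolean return are hypothetical, and the real code may parse paths differently (the commits mention a lastSlash comparison).

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// volumePathLooksLikeDirectory sketches the heuristic: for paths under
// /Volumes/, where no state is tracked, a final path segment with a file
// extension is treated as a file and anything else is assumed to be a
// directory. For all other paths it returns false, because those are meant
// to be resolved from tracked state instead. Hypothetical helper.
func volumePathLooksLikeDirectory(p string) bool {
	if !strings.HasPrefix(p, "/Volumes/") {
		return false
	}
	// path.Ext inspects only the final slash-separated element.
	return path.Ext(p) == ""
}

func main() {
	for _, p := range []string{
		"/Volumes/main/default/vol/dir",
		"/Volumes/main/default/vol/file.txt",
		"/Workspace/some/dir",
	} {
		fmt.Println(p, volumePathLooksLikeDirectory(p))
	}
}
```

One limitation of this kind of check is that a directory whose name contains a dot (for example a hypothetical `v1.2`) would be classified as a file.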
What changes are proposed in this pull request?
This PR improves the performance of the databricks fs cp command when copying directories by parallelizing file uploads. The command uses 8 concurrent workers by default, but the number can be controlled via --concurrency.
Implementation details:
Filer implementation as before.
Why --concurrency?
No strong preference here; there does not seem to be a pattern in the CLI for controlling concurrency in other places. This is the flag name used in most Go tools, but I'm happy to use something else.
How is this tested?
Added acceptance tests to exercise most code paths + unit tests to validate that the context cancellation and propagation works properly.