Improves segmented top-k test compilation times #9404
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
4a1a735 to
abc9e62
Compare
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (1)
SummaryThis PR reduces compilation time for segmented top-k tests by switching the verification reference sort from cub::DeviceSegmentedSort to cub::DeviceSegmentedRadixSort. Measured results show ~25–29% faster compile times for the affected tests with only a small runtime increase (~0.07–0.1 s). Changes
Metrics
Review notes / follow-ups
Impact
suggestion: WalkthroughThis PR switches the reference segmented sort in top-k tests from DeviceSegmentedSort to DeviceSegmentedRadixSort and updates four inline test comments to state that per-segment k is passed as a deferred sequence to batched top-k operations. ChangesTop-K Test Infrastructure Update
Possibly related PRs
Suggested reviewers
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (4)
cub/test/catch2_test_device_segmented_topk_keys.cu (2)
359-359: ⚡ Quick winsuggestion: Grammar: "an deferred" should be "a deferred".
458-458: ⚡ Quick winsuggestion: Grammar: "an deferred" should be "a deferred".
cub/test/catch2_test_device_segmented_topk_pairs.cu (2)
501-501: ⚡ Quick winsuggestion: Grammar: "an deferred" should be "a deferred".
621-621: ⚡ Quick winsuggestion: Grammar: "an deferred" should be "a deferred".
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9e11938b-cf83-4294-87a3-86c15cc3befa
📒 Files selected for processing (3)
cub/test/catch2_test_device_segmented_topk_keys.cucub/test/catch2_test_device_segmented_topk_pairs.cucub/test/catch2_test_device_topk_common.cuh
This comment has been minimized.
This comment has been minimized.
|
question. We use |
Yeah, I think that's worth exploring. Like, we'd extern-template a I've opened this issue to explore the option: #9413 |
4cbbe8c to
b075075
Compare
🥳 CI Workflow Results🟩 Finished in 38m 22s: Pass: 100%/242 | Total: 1d 21h | Max: 31m 54s | Hits: 99%/146538See results here. |
Description
The sort we use for verification contributes a majority of the compilation times. A host-only sort for verification would cut the compilation time in half but runtime would exploude to 1 min and above.
DeviceSegmentedRadixSortis a reasonable alternative, cutting Improves top-k test compilation timesBefore / after
DeviceSegmentedSort)DeviceSegmentedRadixSort)