Improves segmented top-k test compilation times by elstehle · Pull Request #9404 · NVIDIA/cccl

elstehle · 2026-06-11T12:50:53Z

Description

The sort we use for verification contributes a majority of the compilation times. A host-only sort for verification would cut the compilation time in half but runtime would exploude to 1 min and above. DeviceSegmentedRadixSort is a reasonable alternative, cutting Improves top-k test compilation times

Before / after

Test	Metric	Before (`DeviceSegmentedSort`)	After (`DeviceSegmentedRadixSort`)	Δ
keys	compile	1:48.7 (108.7 s)	1:21.8 (81.8 s)	−26.9 s (−25%)
	runtime	3.46 / 3.41 s	3.54 s	+0.1 s
pairs	compile	2:29.7 (149.7 s)	1:46.0 (106.0 s)	−43.6 s (−29%)
	runtime	1.88 / 1.84 s	1.95 s	+0.07 s

copy-pr-bot · 2026-06-11T12:50:57Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-06-11T14:49:07Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a26ea090-9984-43e8-b5e6-1aa8a7d9d14d

📥 Commits

Reviewing files that changed from the base of the PR and between 4cbbe8c and b075075.

📒 Files selected for processing (3)

cub/test/catch2_test_device_segmented_topk_keys.cu
cub/test/catch2_test_device_segmented_topk_pairs.cu
cub/test/catch2_test_device_topk_common.cuh

✅ Files skipped from review due to trivial changes (2)

cub/test/catch2_test_device_segmented_topk_pairs.cu
cub/test/catch2_test_device_segmented_topk_keys.cu

🚧 Files skipped from review as they are similar to previous changes (1)

cub/test/catch2_test_device_topk_common.cuh

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Summary

This PR reduces compilation time for segmented top-k tests by switching the verification reference sort from cub::DeviceSegmentedSort to cub::DeviceSegmentedRadixSort. Measured results show ~25–29% faster compile times for the affected tests with only a small runtime increase (~0.07–0.1 s).

Changes

cub/test/catch2_test_device_topk_common.cuh
- Replaced include of DeviceSegmentedSort with DeviceSegmentedRadixSort.
- Switched the verification implementation to use cub::DeviceSegmentedRadixSort::SortKeys / SortKeysDescending, preserving the two-step temp-storage query/allocation + execution flow.
- Updated inline comment/TODO to explain rationale (radix sort compiles much faster with minimal runtime cost).
cub/test/catch2_test_device_segmented_topk_keys.cu
- Documentation-only tweak: clarified comments that per-segment k is passed to batched_topk_keys as a deferred (not immediate) sequence. No behavioral changes.
cub/test/catch2_test_device_segmented_topk_pairs.cu
- Documentation-only tweak: clarified comments that per-segment k is passed to batched_topk_pairs as a deferred (not immediate) sequence. No behavioral changes.

Metrics

keys:
- Compile time: DeviceSegmentedSort 108.7 s → DeviceSegmentedRadixSort 81.8 s (−26.9 s, −25%)
- Runtime: 3.46 / 3.41 s → 3.54 s (+~0.1 s)
pairs:
- Compile time: 149.7 s → 106.0 s (−43.6 s, −29%)
- Runtime: 1.88 / 1.84 s → 1.95 s (+~0.07 s)

Review notes / follow-ups

Review comment suggested compiling DeviceSegmentedRadixSort instantiations in a dedicated translation unit and using extern templates to avoid repeated instantiations. The author agreed this is worth exploring and opened a follow-up issue to investigate (Explore precompiling shared Device[Segmented][Radix]Sort invocations serving as test-reference via extern template #9413). This PR intentionally keeps the change minimal and focused on the immediate compilation-time improvement.

Impact

Significant reduction in test compilation time for segmented top-k verification.
Negligible runtime overhead.
No changes to public APIs or observable test behavior; two files contain comment clarifications only.

suggestion:

Walkthrough

This PR switches the reference segmented sort in top-k tests from DeviceSegmentedSort to DeviceSegmentedRadixSort and updates four inline test comments to state that per-segment k is passed as a deferred sequence to batched top-k operations.

Changes

Top-K Test Infrastructure Update

Layer / File(s)	Summary
Reference sort implementation switch `cub/test/catch2_test_device_topk_common.cuh`	Include directive changed from `device_segmented_sort.cuh` to `device_segmented_radix_sort.cuh`. The `segmented_sort_keys` function now queries temp storage and calls `DeviceSegmentedRadixSort::SortKeys` and `DeviceSegmentedRadixSort::SortKeysDescending` (two-step temp-storage query/allocation and execution). In-function comment/TODO updated.
Test comment corrections `cub/test/catch2_test_device_segmented_topk_keys.cu`, `cub/test/catch2_test_device_segmented_topk_pairs.cu`	Inline comments in four test cases (fixed-size and variable-size segment tests for both topk_keys and topk_pairs) corrected to document that per-segment k parameters are passed as deferred sequences to the batched operations.

Possibly related PRs

NVIDIA/cccl#9311: Added per-segment k test coverage and deferred/immediate sequence forms in the same batched_topk_* call sites that this PR documents.

Suggested reviewers

gevtushenko

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (4)

cub/test/catch2_test_device_segmented_topk_keys.cu (2)

359-359: ⚡ Quick win

suggestion: Grammar: "an deferred" should be "a deferred".

458-458: ⚡ Quick win

suggestion: Grammar: "an deferred" should be "a deferred".

cub/test/catch2_test_device_segmented_topk_pairs.cu (2)

501-501: ⚡ Quick win

suggestion: Grammar: "an deferred" should be "a deferred".

621-621: ⚡ Quick win

suggestion: Grammar: "an deferred" should be "a deferred".

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9e11938b-cf83-4294-87a3-86c15cc3befa

📥 Commits

Reviewing files that changed from the base of the PR and between 9cd6640 and abc9e62.

📒 Files selected for processing (3)

cub/test/catch2_test_device_segmented_topk_keys.cu
cub/test/catch2_test_device_segmented_topk_pairs.cu
cub/test/catch2_test_device_topk_common.cuh

fbusato · 2026-06-11T17:57:35Z

question. We use DeviceSegmentedRadixSort several times in our tests. What about compile it in a dedicated file and use extern template for the common case?

elstehle · 2026-06-11T19:06:46Z

question. We use DeviceSegmentedRadixSort several times in our tests. What about compile it in a dedicated file and use extern template for the common case?

Yeah, I think that's worth exploring. Like, we'd extern-template a segmented_sort_keys helper for what we'll identify to be common key/offset types. I will file an issue to explore the option as a follow-up and keep this PR focused on the immediate gains from switching to DeviceSegmentedRadixSort.

I've opened this issue to explore the option: #9413

github-actions · 2026-06-12T06:05:12Z

🥳 CI Workflow Results

🟩 Finished in 38m 22s: Pass: 100%/242 | Total: 1d 21h | Max: 31m 54s | Hits: 99%/146538

See results here.

github-project-automation Bot added this to CCCL Jun 11, 2026

github-project-automation Bot moved this to Todo in CCCL Jun 11, 2026

cccl-authenticator-app Bot moved this from Todo to In Progress in CCCL Jun 11, 2026

elstehle mentioned this pull request Jun 11, 2026

Adds tests for segment-specific-k-values #9311

Merged

elstehle force-pushed the enh/topk-test-compilation-times branch from 4a1a735 to abc9e62 Compare June 11, 2026 14:43

elstehle marked this pull request as ready for review June 11, 2026 14:45

elstehle requested a review from a team as a code owner June 11, 2026 14:45

elstehle requested a review from pauleonix June 11, 2026 14:45

cccl-authenticator-app Bot moved this from In Progress to In Review in CCCL Jun 11, 2026

coderabbitai Bot reviewed Jun 11, 2026

View reviewed changes

This comment has been minimized.

Sign in to view

pauleonix approved these changes Jun 12, 2026

View reviewed changes

elstehle added 3 commits June 11, 2026 22:23

improves top-k compile time

a9b63d9

review comment follow-up

2e7da6d

fixes grammar in comments

b075075

elstehle force-pushed the enh/topk-test-compilation-times branch from 4cbbe8c to b075075 Compare June 12, 2026 05:23

elstehle enabled auto-merge (squash) June 12, 2026 05:24

elstehle merged commit 43c30c2 into NVIDIA:main Jun 12, 2026
263 checks passed

github-project-automation Bot moved this from In Review to Done in CCCL Jun 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improves segmented top-k test compilation times #9404

Improves segmented top-k test compilation times #9404
elstehle merged 3 commits into
NVIDIA:mainfrom
elstehle:enh/topk-test-compilation-times

elstehle commented Jun 11, 2026

Uh oh!

copy-pr-bot Bot commented Jun 11, 2026

Uh oh!

coderabbitai Bot commented Jun 11, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

This comment has been minimized.

fbusato commented Jun 11, 2026

Uh oh!

elstehle commented Jun 11, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

elstehle commented Jun 11, 2026

Description

Before / after

Uh oh!

copy-pr-bot Bot commented Jun 11, 2026

Uh oh!

coderabbitai Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Metrics

Review notes / follow-ups

Impact

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

This comment has been minimized.

fbusato commented Jun 11, 2026

Uh oh!

elstehle commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 12, 2026

🥳 CI Workflow Results

🟩 Finished in 38m 22s: Pass: 100%/242 | Total: 1d 21h | Max: 31m 54s | Hits: 99%/146538

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

coderabbitai Bot commented Jun 11, 2026 •

edited

Loading

elstehle commented Jun 11, 2026 •

edited

Loading