Skip to content

[libcu++] Make argument namespace and wrappers construction public#9251

Merged
pciolkosz merged 6 commits into
NVIDIA:mainfrom
pciolkosz:make_arguments_framework_public
Jun 10, 2026
Merged

[libcu++] Make argument namespace and wrappers construction public#9251
pciolkosz merged 6 commits into
NVIDIA:mainfrom
pciolkosz:make_arguments_framework_public

Conversation

@pciolkosz

@pciolkosz pciolkosz commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Closes #9254

This PR removes underscores from the cuda::argument namespace and the user facing argument wrappers and bound objects construction.

Traits and other utility functions remain underscored, those are meant mostly to be used inside CCCL APIs to examine the argument wrappers passed by the user, but they will most likely be public later on, once we are more confident with the design.

@pciolkosz pciolkosz requested review from a team as code owners June 4, 2026 06:04
@pciolkosz pciolkosz requested a review from fbusato June 4, 2026 06:04
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 4, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 4, 2026

@davebayer davebayer left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, but I'll block this PR. We said there would be a discussion about the naming. I'm still strongly against having a separate namespace for these types.

@github-project-automation github-project-automation Bot moved this from In Review to In Progress in CCCL Jun 4, 2026
@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Ready to act? Review this PR in Change Stack to turn feedback into patch suggestions you can inspect and refine.

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

suggestion:

Walkthrough

This PR replaces internal cuda::__argument wrapper/bounds/traits names with public cuda::argument / cuda::args equivalents, adds a public forwarding header <cuda/argument>, and updates libcudacxx tests, CUB dispatch/kernels/agent, benchmarks, and test call sites to use the new API. No algorithmic behavior was changed.

Changes

CUDA argument API public-API migration

Layer / File(s) Summary
Core wrapper types and public header
libcudacxx/include/cuda/__argument/argument.h, libcudacxx/include/cuda/__argument/argument_bounds.h, libcudacxx/include/cuda/argument, libcudacxx/include/cuda/std/__internal/namespaces.h
Rename wrapper and bounds types/factories from underscored __* to public * names; add <cuda/argument> forwarding header; update deduction guides, wrapper detection, __is_wrapper_v, __unwrap, __traits_impl, and __lowest_/__highest_.
libcudacxx argument API tests
libcudacxx/test/libcudacxx/cuda/argument/*
Update tests to include <cuda/argument> and reference cuda::args wrapper/traits/bounds helpers; add compile-fail cases exercising misuse.
CUB dispatch, kernel, agent, and param extraction
cub/cub/device/dispatch/dispatch_batched_topk.cuh, cub/cub/device/dispatch/kernels/kernel_batched_topk.cuh, cub/cub/agent/agent_batched_topk.cuh, cub/cub/detail/segmented_params.cuh
Switch CUB batched-topk dispatch/kernel/agent and segmented-params extraction code to use ::cuda::args wrapper types, traits, and __unwrap accessors; update kernel parameter element types derived from those traits.
CUB segmented top-k benchmarks using public API
cub/benchmarks/bench/segmented_topk/fixed/keys.cu, cub/benchmarks/bench/segmented_topk/variable/keys.cu
Update includes and argument construction for benchmark dispatches to use ::cuda::args::immediate, ::cuda::args::__immediate_sequence, ::cuda::args::constant, and ::cuda::args::bounds.
CUB device segmented top-k tests
cub/test/catch2_test_device_segmented_topk_keys.cu, cub/test/catch2_test_device_segmented_topk_pairs.cu
Replace internal ::cuda::__argument::__* wrappers in test launch callsites with ::cuda::args immediate/sequence/bounds wrappers without changing argument values.

Assessment against linked issues

Objective Addressed Explanation
Expose argument annotation framework for public usage in cub::DeviceBatchedTopK (#9254)

Possibly related PRs

  • NVIDIA/cccl#8875: Introduced earlier migrations of argument/traits to the public API used here.
  • NVIDIA/cccl#9074: Prior batched-topk refactor touching segmented-params and argument wrappers.

Suggested labels

libcu++

Suggested reviewers

  • gevtushenko
  • davebayer

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4424dd10-8027-4d66-97f4-e3d43a7b6cf4

📥 Commits

Reviewing files that changed from the base of the PR and between 6b7ae38 and 06c277f.

📒 Files selected for processing (20)
  • cub/benchmarks/bench/segmented_topk/fixed/keys.cu
  • cub/benchmarks/bench/segmented_topk/variable/keys.cu
  • cub/cub/agent/agent_batched_topk.cuh
  • cub/cub/detail/segmented_params.cuh
  • cub/cub/device/dispatch/dispatch_batched_topk.cuh
  • cub/cub/device/dispatch/kernels/kernel_batched_topk.cuh
  • cub/test/catch2_test_device_segmented_topk_keys.cu
  • cub/test/catch2_test_device_segmented_topk_pairs.cu
  • libcudacxx/include/cuda/__argument/argument.h
  • libcudacxx/include/cuda/__argument/argument_bounds.h
  • libcudacxx/include/cuda/argument
  • libcudacxx/include/cuda/std/__internal/namespaces.h
  • libcudacxx/test/libcudacxx/cuda/argument/argument_bounds.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/argument_traits.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/deferred_argument.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/dynamic_argument.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_argument.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_bounds_conversion.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_bounds_type_mismatch.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/usage_example.pass.cpp

Comment thread libcudacxx/include/cuda/__argument/argument.h Outdated
@pciolkosz

Copy link
Copy Markdown
Contributor Author

I'm sorry, but I'll block this PR. We said there would be a discussion about the naming. I'm still strongly against having a separate namespace for these types.

Tagging @gevtushenko

@pciolkosz pciolkosz force-pushed the make_arguments_framework_public branch from 6002bc6 to 3210e3d Compare June 4, 2026 06:36
@github-actions

This comment has been minimized.

auto segment_sizes = ::cuda::__argument::__constant<MaxSegmentSize>{};
auto k = ::cuda::__argument::__constant<MaxNumSelected>{};
auto select_direction = ::cuda::__argument::__constant<cub::detail::topk::select::max>{};
auto segment_sizes = ::cuda::argument::constant<MaxSegmentSize>{};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++26 has constant_wrapper. It's hard to back-port fully to earlier C++ versions (though there are proposals pending review that might fix this), but for now, it's how C++ spells "compile-time constant." Does this cuda::argument::constant do the same thing? If so, then should we consider just making this an alias to constant_wrapper (or a hypothetical cuda::std::constant_wrapper)?

Comment thread cub/cub/agent/agent_batched_topk.cuh Outdated

using segment_size_val_t = typename ::cuda::__argument::__traits<SegmentSizeParameterT>::element_type;
using num_segments_val_t = typename ::cuda::__argument::__traits<NumSegmentsParameterT>::element_type;
using segment_size_val_t = typename ::cuda::argument::__traits<SegmentSizeParameterT>::element_type;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this library usable without exposing __traits to the public API? If not, then should we consider waiting until __traits is ready?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main user facing part is the construction of the wrappers, while our APIs will use the traits to examine them. Long term we want to expose the __traits I think, but we don't have to right now so keeping them private leaves us room to change them easily and make public later

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining! That makes sense.

//! @brief Wraps a compile-time constant argument value.
template <auto _Value>
struct __constant
struct constant

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks more or less like constant_wrapper. Should we consider spelling it cuda::std::constant_wrapper?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We considered that design, cuda::std::constant_wrapper is difficult to back port to older c++ standards and the timeline probably doesn't leave enough room to implement it. I think its fine to just interoperate with constant wrapper later on

static_assert(__valid_static_bounds_v<__element_type, _StaticBounds>,
"static argument bounds cannot be represented by the element type");

_Arg __arg_;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a public member. That means it's possible to set the value without validating the bounds. Should we instead consider making the value private, so that every constructed immediate instance has been validated?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could make it a private member and add an underscored __access structure that is a friend or I can just underscore the variable and have less code this way. I don't mind either way, but we need a way to access it at the end of the day.
But I think this question does apply to __unwrap, which currently has two version, const and not-const reference. I could see the non-const reference version being not necessary, but it's also not public yet, so we don't have to resolve it right now. I might as well remove it and we can add it when needed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pciolkosz Thanks for answering! I don't quite understand the argument why validation is separate from construction.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: Why can't users access the value (by value)? Isn't this just a validated value? Why wouldn't this class just have a value() member?

@pciolkosz pciolkosz Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to not be stubborn and changed all members of the wrappers to private and added a friend struct to access them. We might end up exposing member functions to access them, but for now there is only __unwrap free function

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bless you, @pciolkosz ! I can fall asleep with one less worry now : - )

Comment thread libcudacxx/include/cuda/__argument/argument.h

template <class _ElementType, auto _Lowest, auto _Highest>
inline constexpr bool __valid_static_bounds_v<_ElementType, __static_bounds<_Lowest, _Highest>> =
inline constexpr bool __valid_static_bounds_v<_ElementType, static_bounds<_Lowest, _Highest>> =

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are runtime_bounds and static_bounds. Do we also need a representation of half-run-time, half-static bounds?

submdspan represents bounds generally as any type for which structured binding into two elements, both convertible to index_type, is well-formed. This would let users pass in bounds however they like. It also would permit a unified representation of any combination of run-time and static bounds.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the template parameters' types are unconstrained here. What are the actual requirements on these types?

@pciolkosz pciolkosz Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can tighten the static bounds with runtime bounds, so in this case you can do static_bounds<0, 1000> and runtime_bounds(100, 1000) which effectively would be a half-run-time, half-static bounds. It avoids overcomplicating the input types at the expense of this case being a bit more verbose, but I think it should be rare anyway.

Also, the template parameters' types are unconstrained here. What are the actual requirements on these types?

Good point, I added type validation

Comment thread libcudacxx/include/cuda/__argument/argument.h
@github-actions

This comment has been minimized.

Comment thread libcudacxx/include/cuda/__argument/argument_bounds.h
Comment thread libcudacxx/include/cuda/__argument/argument.h Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7cc00549-bb49-459c-96d8-ed4906627049

📥 Commits

Reviewing files that changed from the base of the PR and between 2c740b7 and 9088870.

📒 Files selected for processing (22)
  • cub/benchmarks/bench/segmented_topk/fixed/keys.cu
  • cub/benchmarks/bench/segmented_topk/variable/keys.cu
  • cub/cub/agent/agent_batched_topk.cuh
  • cub/cub/detail/segmented_params.cuh
  • cub/cub/device/dispatch/dispatch_batched_topk.cuh
  • cub/cub/device/dispatch/kernels/kernel_batched_topk.cuh
  • cub/test/catch2_test_device_segmented_topk_keys.cu
  • cub/test/catch2_test_device_segmented_topk_pairs.cu
  • libcudacxx/include/cuda/__argument/argument.h
  • libcudacxx/include/cuda/__argument/argument_bounds.h
  • libcudacxx/include/cuda/std/__internal/namespaces.h
  • libcudacxx/test/libcudacxx/cuda/argument/argument_bounds.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/argument_traits.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/deferred_argument.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/deferred_sequence_scalar.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/deferred_sequence_scalar_traits.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/dynamic_argument.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_argument.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_argument_bounds_type.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_bounds_conversion.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_bounds_type_mismatch.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/usage_example.pass.cpp
✅ Files skipped from review due to trivial changes (1)
  • libcudacxx/test/libcudacxx/cuda/argument/static_bounds_type_mismatch.fail.cpp
🚧 Files skipped from review as they are similar to previous changes (17)
  • libcudacxx/include/cuda/std/__internal/namespaces.h
  • libcudacxx/test/libcudacxx/cuda/argument/deferred_sequence_scalar.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/deferred_sequence_scalar_traits.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_bounds_conversion.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_argument_bounds_type.fail.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/argument_traits.pass.cpp
  • cub/cub/agent/agent_batched_topk.cuh
  • cub/cub/detail/segmented_params.cuh
  • cub/cub/device/dispatch/kernels/kernel_batched_topk.cuh
  • cub/benchmarks/bench/segmented_topk/fixed/keys.cu
  • cub/benchmarks/bench/segmented_topk/variable/keys.cu
  • cub/test/catch2_test_device_segmented_topk_pairs.cu
  • libcudacxx/test/libcudacxx/cuda/argument/usage_example.pass.cpp
  • cub/test/catch2_test_device_segmented_topk_keys.cu
  • libcudacxx/include/cuda/__argument/argument_bounds.h
  • libcudacxx/test/libcudacxx/cuda/argument/dynamic_argument.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/argument/static_argument.pass.cpp

Comment thread cub/cub/device/dispatch/dispatch_batched_topk.cuh
@pciolkosz

Copy link
Copy Markdown
Contributor Author

Changes from the code review session:

  • cuda::argument -> cuda::args
  • immediate_sequence and constant_sequence will remain underscored for now
  • added a template argument to constant as a work around for c++17 limitations, so you can do constant<10, float>
  • Changed all the member variables of wrappers and bounds to be private

One thing to consider is if we want to expose unwrap

@github-actions

This comment has been minimized.

@miscco

miscco commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

One thing to consider is if we want to expose unwrap

This fully depends on whether we want to productize the full framework for users to use in their own libraries. For now I would say the answer to that is no

These types are currently only used inside CCCL and for that we should not expose such interfaces

@elstehle

elstehle commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

One thing to consider is if we want to expose unwrap

This fully depends on whether we want to productize the full framework for users to use in their own libraries. For now I would say the answer to that is no

These types are currently only used inside CCCL and for that we should not expose such interfaces

Yup, I'm with Michael on this: Let's limit API surface to as little as possible for now. Just enough to allow users providing bounds on the arguments.

@pciolkosz pciolkosz force-pushed the make_arguments_framework_public branch from 9088870 to 4777595 Compare June 8, 2026 23:36
@github-actions

This comment has been minimized.


_CCCL_BEGIN_NAMESPACE_CUDA_ARGUMENT

struct __access;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have cuda/__fwd/argument.h for that

Comment on lines -148 to -154
// =====================================================================
// __assert_in_range
// =====================================================================

template <class _To, class _From>
_CCCL_API constexpr void __assert_in_range([[maybe_unused]] _From __val) noexcept
{

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this moved?

@github-project-automation github-project-automation Bot moved this from In Progress to In Review in CCCL Jun 9, 2026
@pciolkosz pciolkosz force-pushed the make_arguments_framework_public branch from 4777595 to 5a0f0fe Compare June 9, 2026 16:02
@pciolkosz pciolkosz force-pushed the make_arguments_framework_public branch from 5a0f0fe to f4cb111 Compare June 9, 2026 16:03
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

Copy link
Copy Markdown
Contributor

🥳 CI Workflow Results

🟩 Finished in 3h 27m: Pass: 100%/343 | Total: 7d 01h | Max: 2h 29m | Hits: 97%/485146

See results here.

@pciolkosz pciolkosz merged commit 53d1cc1 into NVIDIA:main Jun 10, 2026
363 checks passed
@github-project-automation github-project-automation Bot moved this from In Review to Done in CCCL Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Expose argument annotation framework for public usage in cub::DeviceBatchedTopK

5 participants