
[ENHANCEMENT]: Add count_if and retrieve_if APIs to static_multiset #800

@PointKernel

Is your feature request related to a problem? Please describe.

static_multiset currently has insert_if and contains_if with stencil/predicate support, but the count, count_outer, retrieve, and retrieve_outer APIs lack corresponding _if variants.

In cuDF's hash join, we want to use a bloom filter to pre-filter probe rows before counting/retrieving matches. The bloom filter produces a per-row boolean predicate. With count_if / retrieve_if, we could skip probe rows that the bloom filter rejects, avoiding unnecessary hash table lookups.
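To make the "per-row boolean predicate" concrete, here is a minimal host-side sketch of how a Bloom filter could produce a stencil for the probe keys. Everything in it (`toy_bloom_filter`, `make_stencil`, the hash mixing, the filter size) is hypothetical and for illustration only; it is not the cuDF or cuco implementation, and the real filter would run on the GPU.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy Bloom filter (hypothetical): two hash probes into a fixed-size bitset.
class toy_bloom_filter {
 public:
  void add(int key)
  {
    bits_.set(slot(key, 1));
    bits_.set(slot(key, 2));
  }

  // May return true for a key that was never added (false positive),
  // but never returns false for a key that was added.
  bool maybe_contains(int key) const
  {
    return bits_.test(slot(key, 1)) && bits_.test(slot(key, 2));
  }

 private:
  static constexpr std::size_t num_bits = 1024;

  // Simple 64-bit mixing function; the seed selects one of the two probes.
  static std::size_t slot(int key, std::uint64_t seed)
  {
    std::uint64_t x = static_cast<std::uint32_t>(key) + seed * 0x9E3779B97F4A7C15ULL;
    x ^= x >> 33;
    x *= 0xFF51AFD7ED558CCDULL;
    x ^= x >> 33;
    return static_cast<std::size_t>(x % num_bits);
  }

  std::bitset<num_bits> bits_;
};

// One boolean per probe row: true means "might match, do the hash table lookup".
std::vector<bool> make_stencil(toy_bloom_filter const& filter, std::vector<int> const& probe_keys)
{
  std::vector<bool> stencil;
  stencil.reserve(probe_keys.size());
  for (int key : probe_keys) {
    stencil.push_back(filter.maybe_contains(key));
  }
  return stencil;
}
```

Rows whose stencil entry is false are guaranteed to have no match on the build side, so skipping their lookup cannot change an inner-join result.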

Describe the solution you'd like

Proposed API (following the existing insert_if / contains_if pattern):

  // Count matches only for probe keys where pred(*(stencil + i)) is true.
  // Keys where the predicate is false contribute 0 to the count (inner)
  // or 1 (outer, for left/full join semantics).
  size_type count_if(InputIt first, InputIt last, StencilIt stencil, Predicate pred, ...);
  size_type count_outer_if(InputIt first, InputIt last, StencilIt stencil, Predicate pred, ...);

  // Retrieve matches only for probe keys where pred(*(stencil + i)) is true.
  retrieve_if(InputIt first, InputIt last, StencilIt stencil, Predicate pred, ...);
  retrieve_outer_if(InputIt first, InputIt last, StencilIt stencil, Predicate pred, ...);
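
The intended semantics can be pinned down with a sequential host-side reference model. This is only a sketch: the `*_ref` names are hypothetical, `std::unordered_multiset<int>` stands in for the device container, and the outer count assumes that a probe key with no match (or a rejected stencil entry) contributes exactly 1, as described in the comments above. The outer retrieve variant is omitted.

```cpp
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Inner count: probe keys rejected by the predicate contribute 0.
template <class InputIt, class StencilIt, class Predicate>
std::size_t count_if_ref(std::unordered_multiset<int> const& build,
                         InputIt first, InputIt last, StencilIt stencil, Predicate pred)
{
  std::size_t total = 0;
  for (; first != last; ++first, ++stencil) {
    if (pred(*stencil)) { total += build.count(*first); }
  }
  return total;
}

// Outer count: every probe key contributes at least 1, whether it was
// rejected by the predicate or simply has no match in the build side.
template <class InputIt, class StencilIt, class Predicate>
std::size_t count_outer_if_ref(std::unordered_multiset<int> const& build,
                               InputIt first, InputIt last, StencilIt stencil, Predicate pred)
{
  std::size_t total = 0;
  for (; first != last; ++first, ++stencil) {
    std::size_t const matches = pred(*stencil) ? build.count(*first) : 0;
    total += (matches == 0) ? 1 : matches;
  }
  return total;
}

// Inner retrieve: emit (probe key, matched key) pairs only for accepted probe keys.
template <class InputIt, class StencilIt, class Predicate>
std::vector<std::pair<int, int>> retrieve_if_ref(std::unordered_multiset<int> const& build,
                                                 InputIt first, InputIt last,
                                                 StencilIt stencil, Predicate pred)
{
  std::vector<std::pair<int, int>> out;
  for (; first != last; ++first, ++stencil) {
    if (!pred(*stencil)) { continue; }
    auto const range = build.equal_range(*first);
    for (auto it = range.first; it != range.second; ++it) {
      out.emplace_back(*first, *it);
    }
  }
  return out;
}
```

With these definitions, the size of the `retrieve_if_ref` result always equals `count_if_ref` for the same inputs, which is the property a caller relies on when sizing the output buffer from the count.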

Describe alternatives you've considered

No response

Additional context

No response
