# Follow-ups (sparse PCM / pcm_utils)

Tracking non-blocking improvements identified during the sparse_binary_matrix PR review.

## Pre-release / API stabilization

1. **Make canonicalization a member of `sparse_binary_matrix`.**
   Move `canonicalize_pcm` (currently a free function in `pcm_utils.cpp`) to a method on `sparse_binary_matrix`. This matches the new `validate_sorted_unique_indices` member and keeps layout-aware operations on the class. Do this before release to avoid a breaking API change later.

2. **Finish sparse-aware PCM utility migration.**
   Add `sparse_binary_matrix`-aware overloads or code paths for the remaining utilities so that the #379-scale pipeline does not silently fall back to dense `cudaqx::tensor<uint8_t>` allocations:

   - `reorder_pcm_columns`
   - `shuffle_pcm_columns`
   - `pcm_to_sparse_vec`
   - `pcm_to_sparse_string`
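
As a rough illustration of the kind of sparse-aware path item 2 asks for, the sketch below reorders columns directly on flat CSC arrays in O(num_cols + nnz), with no dense allocation. The `csc_pcm` struct and `reorder_csc_columns` function are hypothetical stand-ins, not the `sparse_binary_matrix` API. The sketch also assumes that `column_order[i]` names the original column placed at position `i`; the real `reorder_pcm_columns` convention should be checked before reusing this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for canonical CSC storage of a binary PCM:
// column j's row indices are row_indices[col_ptrs[j] .. col_ptrs[j + 1]).
struct csc_pcm {
  std::size_t num_rows = 0;
  std::vector<std::uint32_t> col_ptrs{0}; // num_cols + 1 entries
  std::vector<std::uint32_t> row_indices; // nnz entries
  std::size_t num_cols() const { return col_ptrs.size() - 1; }
};

// Returns a matrix whose column i is column column_order[i] of `in`
// (assumed convention). Runs in O(num_cols + nnz); nothing dense is built.
csc_pcm reorder_csc_columns(const csc_pcm &in,
                            const std::vector<std::uint32_t> &column_order) {
  assert(column_order.size() == in.num_cols());
  csc_pcm out;
  out.num_rows = in.num_rows;
  out.col_ptrs.reserve(column_order.size() + 1);
  out.row_indices.reserve(in.row_indices.size());
  for (std::uint32_t src : column_order) {
    assert(src < in.num_cols());
    // Copy the source column's row indices as one contiguous block.
    out.row_indices.insert(out.row_indices.end(),
                           in.row_indices.begin() + in.col_ptrs[src],
                           in.row_indices.begin() + in.col_ptrs[src + 1]);
    out.col_ptrs.push_back(static_cast<std::uint32_t>(out.row_indices.size()));
  }
  return out;
}
```

Because each source column is copied as one block, a column that was canonical (sorted, unique row indices) stays canonical after the move.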

## Performance

3. **Avoid repeated `to_nested_csc()` copies in `get_pcm_for_rounds`.**
   Each call (one per sliding window) invokes `pcm.to_nested_csc()`, which allocates a fresh `vector<vector<uint32_t>>` with `num_cols` inner vectors and copies every index into it. Add an overload or code path that reads directly from the canonical CSC arrays (`ptr_` / `indices_`) instead of materializing the nested form.

4. **Replace per-column overlap scan with cached round metadata + binary search in `get_pcm_for_rounds`.**
   Instead of scanning every column to find round overlaps, compute and cache each column's `(first_round, last_round)` once, then binary-search over a sorted `first_round` array. This should shrink the per-call scan in the sliding window's repeated calls.

5. **Build `generate_random_pcm_sparse` CSC arrays directly.**
   Current flow: allocate nested → sort each inner vector → `from_nested_csc` allocates/copies again into flat CSC storage. Since the generator already knows each column’s entry count, build `col_ptrs` and the flat `row_indices` directly and skip the nested intermediate allocations.

6. **Reserve inner vectors in `to_nested_csc()` / `to_nested_csr()`.**
   Do a cheap counting pass (or otherwise reserve per-group capacity) before pushing entries, to avoid repeated reallocations when materializing the nested form.
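
For item 3, a minimal sketch of reading straight from canonical CSC arrays: each column's rows inside a window are located with two `std::lower_bound` calls, so no nested copy is built. `extract_row_window` and `csc_window` are hypothetical names. The sketch assumes sorted, unique row indices per column and that a round window maps to one contiguous row range, which may not match every detail of `get_pcm_for_rounds`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sub-matrix in flat CSC form (hypothetical helper output).
struct csc_window {
  std::vector<std::uint32_t> col_ptrs{0};
  std::vector<std::uint32_t> row_indices;
};

// Copies rows [row_begin, row_end) of columns [col_begin, col_end) directly
// out of canonical CSC arrays (`ptr` / `indices`), re-basing row indices so
// the window starts at row 0. Sorted per-column indices let each column's
// slice be found by binary search instead of via a nested intermediate.
csc_window extract_row_window(const std::vector<std::uint32_t> &ptr,
                              const std::vector<std::uint32_t> &indices,
                              std::size_t col_begin, std::size_t col_end,
                              std::uint32_t row_begin, std::uint32_t row_end) {
  assert(col_begin <= col_end && col_end < ptr.size());
  assert(row_begin <= row_end);
  csc_window out;
  for (std::size_t c = col_begin; c < col_end; ++c) {
    auto first = indices.begin() + ptr[c];
    auto last = indices.begin() + ptr[c + 1];
    auto lo = std::lower_bound(first, last, row_begin);
    auto hi = std::lower_bound(lo, last, row_end);
    for (auto it = lo; it != hi; ++it)
      out.row_indices.push_back(*it - row_begin);
    out.col_ptrs.push_back(static_cast<std::uint32_t>(out.row_indices.size()));
  }
  return out;
}
```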
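
For item 4, one way to keep the binary search correct when a column spans several rounds is to also cache the largest span, `max_span = max(last_round - first_round)`. A column can only overlap `[start_round, end_round]` if its `first_round` lies in `[start_round - max_span, end_round]`, so two binary searches bound the candidates and a `last_round` check filters them. Everything here (`round_cache`, `rows_per_round`, non-empty columns, columns already ordered by `first_round`) is a hypothetical assumption for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-PCM cache of the syndrome rounds each column touches.
// Assumes rows are grouped round by round, every column is non-empty, and
// columns are ordered so first_round is non-decreasing.
struct round_cache {
  std::vector<std::uint32_t> first_round; // non-decreasing
  std::vector<std::uint32_t> last_round;
  std::uint32_t max_span = 0; // max(last_round - first_round)
};

round_cache build_round_cache(const std::vector<std::uint32_t> &ptr,
                              const std::vector<std::uint32_t> &indices,
                              std::uint32_t rows_per_round) {
  assert(rows_per_round > 0);
  round_cache rc;
  for (std::size_t c = 0; c + 1 < ptr.size(); ++c) {
    assert(ptr[c] < ptr[c + 1] && "sketch assumes non-empty columns");
    // Sorted indices: min and max rows are the slice endpoints.
    std::uint32_t f = indices[ptr[c]] / rows_per_round;
    std::uint32_t l = indices[ptr[c + 1] - 1] / rows_per_round;
    assert((rc.first_round.empty() || rc.first_round.back() <= f) &&
           "columns must be ordered by first round");
    rc.first_round.push_back(f);
    rc.last_round.push_back(l);
    rc.max_span = std::max(rc.max_span, l - f);
  }
  return rc;
}

// Columns touching any round in [start_round, end_round].
std::vector<std::uint32_t> overlapping_columns(const round_cache &rc,
                                               std::uint32_t start_round,
                                               std::uint32_t end_round) {
  std::uint32_t lo_key =
      start_round > rc.max_span ? start_round - rc.max_span : 0;
  auto begin = std::lower_bound(rc.first_round.begin(), rc.first_round.end(),
                                lo_key);
  auto end = std::upper_bound(begin, rc.first_round.end(), end_round);
  std::vector<std::uint32_t> cols;
  for (auto it = begin; it != end; ++it) {
    auto c = static_cast<std::size_t>(it - rc.first_round.begin());
    if (rc.last_round[c] >= start_round)
      cols.push_back(static_cast<std::uint32_t>(c));
  }
  return cols;
}
```

If a few columns span many rounds, `max_span` widens the candidate range and the savings shrink; the result stays correct, just not faster.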
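
For item 5, a sketch of building flat CSC arrays in one pass when each column's weight is known up front: `col_ptrs` is a prefix sum of the weights, and each column is sampled straight into its final slice of `row_indices` and sorted in place. The function name, the `col_weights` parameter, and the rejection-sampling scheme are assumptions, not the current generator's logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical: column j gets exactly col_weights[j] distinct ones. There is
// no nested vector<vector<uint32_t>> and no second copy into flat storage.
void generate_random_csc(std::uint32_t num_rows,
                         const std::vector<std::uint32_t> &col_weights,
                         std::uint32_t seed,
                         std::vector<std::uint32_t> &col_ptrs,
                         std::vector<std::uint32_t> &row_indices) {
  // Prefix sum: col_ptrs[j + 1] - col_ptrs[j] == col_weights[j].
  col_ptrs.assign(col_weights.size() + 1, 0);
  for (std::size_t j = 0; j < col_weights.size(); ++j) {
    assert(col_weights[j] <= num_rows);
    col_ptrs[j + 1] = col_ptrs[j] + col_weights[j];
  }
  row_indices.clear();
  row_indices.resize(col_ptrs.back());

  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::uint32_t> pick_row(0, num_rows - 1);
  for (std::size_t j = 0; j < col_weights.size(); ++j) {
    auto first = row_indices.begin() + col_ptrs[j];
    auto last = row_indices.begin() + col_ptrs[j + 1];
    auto filled = first;
    // Rejection sampling of distinct rows; cheap while weight << num_rows.
    while (filled != last) {
      std::uint32_t r = pick_row(rng);
      if (std::find(first, filled, r) == filled)
        *filled++ = r;
    }
    std::sort(first, last); // canonical: sorted, unique within the column
  }
}
```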
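
For item 6, a sketch of the counting-pass idea for CSC to nested CSR: one pass over the flat row indices counts entries per row, each inner vector is reserved once, and the fill pass never reallocates. `csc_to_nested_csr` is a hypothetical name. For `to_nested_csc()` from CSC storage, the per-column counts are simply `ptr[c + 1] - ptr[c]`, so no extra pass is needed there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CSC -> nested CSR (one vector of column indices per row).
std::vector<std::vector<std::uint32_t>>
csc_to_nested_csr(std::size_t num_rows, const std::vector<std::uint32_t> &ptr,
                  const std::vector<std::uint32_t> &indices) {
  // Counting pass: exact number of entries per row.
  std::vector<std::uint32_t> row_counts(num_rows, 0);
  for (std::uint32_t r : indices) {
    assert(r < num_rows);
    ++row_counts[r];
  }

  // Reserve once per row so the fill pass below never reallocates.
  std::vector<std::vector<std::uint32_t>> rows(num_rows);
  for (std::size_t r = 0; r < num_rows; ++r)
    rows[r].reserve(row_counts[r]);

  // Walking columns in order keeps each row's column list sorted.
  for (std::size_t c = 0; c + 1 < ptr.size(); ++c)
    for (std::uint32_t k = ptr[c]; k < ptr[c + 1]; ++k)
      rows[indices[k]].push_back(static_cast<std::uint32_t>(c));
  return rows;
}
```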

## Refactor / maintainability

7. **Share the column-order comparator between `get_sorted_pcm_column_indices` and `pcm_is_sorted`.**
   Extract the comparator into a shared helper. `get_sorted_pcm_column_indices` keeps sorting the permutation; `pcm_is_sorted` can do an O(n) adjacent-inversion check using the exact same ordering definition.

8. **De-duplicate random-column generation between `generate_random_pcm` and `generate_random_pcm_sparse`.**
   Both functions use the same per-column random-generation logic. Extract it into a shared helper to prevent future drift and subtle differences between the two.
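
For item 7, a sketch of the shared-comparator shape: one comparator object drives both the permutation sort and the O(n) adjacent-inversion check, so the two cannot drift apart. The ordering key here (first row, then last row, of each non-empty column) is only a placeholder; the real helper should carry whatever ordering `get_sorted_pcm_column_indices` defines. `column_order_less`, `sorted_column_indices`, and `columns_are_sorted` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical shared ordering for PCM columns stored as CSC arrays.
// Placeholder key: (first row, last row) of each non-empty column.
struct column_order_less {
  const std::vector<std::uint32_t> &ptr;
  const std::vector<std::uint32_t> &indices;

  std::pair<std::uint32_t, std::uint32_t> key(std::uint32_t c) const {
    assert(ptr[c] < ptr[c + 1] && "placeholder key needs non-empty columns");
    return {indices[ptr[c]], indices[ptr[c + 1] - 1]};
  }
  bool operator()(std::uint32_t a, std::uint32_t b) const {
    return key(a) < key(b);
  }
};

// Sorting path: permutation of column indices in the shared order.
std::vector<std::uint32_t>
sorted_column_indices(const std::vector<std::uint32_t> &ptr,
                      const std::vector<std::uint32_t> &indices) {
  assert(!ptr.empty());
  std::vector<std::uint32_t> perm(ptr.size() - 1);
  std::iota(perm.begin(), perm.end(), std::uint32_t{0});
  // stable_sort keeps tied columns in their original relative order.
  std::stable_sort(perm.begin(), perm.end(), column_order_less{ptr, indices});
  return perm;
}

// Checking path: O(n) scan for an adjacent inversion, same ordering.
bool columns_are_sorted(const std::vector<std::uint32_t> &ptr,
                        const std::vector<std::uint32_t> &indices) {
  assert(!ptr.empty());
  const column_order_less before{ptr, indices};
  for (std::uint32_t c = 1; c + 1 < ptr.size(); ++c)
    if (before(c, c - 1))
      return false;
  return true;
}
```

With `std::stable_sort`, a matrix that passes the O(n) check also sorts to the identity permutation, which keeps the two functions consistent on ties.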
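
For item 8, a sketch of one helper that both generators call. Because the dense and sparse paths consume the random engine in exactly the same way, the same seed yields the same matrix in both layouts, which also makes drift easy to test. `append_random_column`, `random_dense`, `random_sparse`, and the row-major `std::vector<uint8_t>` standing in for `cudaqx::tensor<uint8_t>` are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical shared helper: appends `weight` distinct rows in [0, num_rows),
// sorted ascending, to `out`. The single definition of "a random column".
void append_random_column(std::mt19937 &rng, std::uint32_t num_rows,
                          std::uint32_t weight,
                          std::vector<std::uint32_t> &out) {
  assert(weight <= num_rows);
  const auto start = static_cast<std::ptrdiff_t>(out.size());
  std::uniform_int_distribution<std::uint32_t> pick(0, num_rows - 1);
  while (out.size() - static_cast<std::size_t>(start) < weight) {
    const std::uint32_t r = pick(rng);
    if (std::find(out.begin() + start, out.end(), r) == out.end())
      out.push_back(r);
  }
  std::sort(out.begin() + start, out.end());
}

// Dense caller: row-major bytes standing in for cudaqx::tensor<uint8_t>.
std::vector<std::uint8_t> random_dense(std::uint32_t num_rows,
                                       std::uint32_t num_cols,
                                       std::uint32_t weight,
                                       std::uint32_t seed) {
  std::mt19937 rng(seed);
  std::vector<std::uint8_t> dense(std::size_t{num_rows} * num_cols, 0);
  std::vector<std::uint32_t> rows;
  for (std::uint32_t c = 0; c < num_cols; ++c) {
    rows.clear();
    append_random_column(rng, num_rows, weight, rows);
    for (std::uint32_t r : rows)
      dense[std::size_t{r} * num_cols + c] = 1;
  }
  return dense;
}

// Sparse caller: flat CSC arrays, no dense intermediate.
void random_sparse(std::uint32_t num_rows, std::uint32_t num_cols,
                   std::uint32_t weight, std::uint32_t seed,
                   std::vector<std::uint32_t> &ptr,
                   std::vector<std::uint32_t> &indices) {
  std::mt19937 rng(seed);
  ptr.clear();
  ptr.push_back(0);
  indices.clear();
  for (std::uint32_t c = 0; c < num_cols; ++c) {
    append_random_column(rng, num_rows, weight, indices);
    ptr.push_back(static_cast<std::uint32_t>(indices.size()));
  }
}
```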

## Lower priority

9. **Consider API naming cleanup for `pcm_is_sorted`.**
   `is_pcm_sorted` reads more naturally, but this is user-facing API churn across C++/Python/docs, so only do this if we intentionally want naming cleanup before API freeze.

10. **Refresh docs for sparse PCM APIs after #589/#590.**
    The original #550 docs follow-up mentioned the 400M dense cap, but #589/#590 changed the dense/scipy handling. Re-check the Sphinx docs after the sparse/scipy changes and make sure they accurately describe the supported paths.

PR: https://github.com/NVIDIA/cudaqx/pull/550

Issue: #582
