Follow-ups (sparse PCM / pcm_utils) #582

@vedika-saravanan

Tracking non-blocking improvements identified during the sparse_binary_matrix PR review.

Pre-release / API stabilization

  1. Make canonicalization a member of sparse_binary_matrix.
    Move canonicalize_pcm (currently a free function in pcm_utils.cpp) to a method on sparse_binary_matrix, for consistency with the new validate_sorted_unique_indices member and to keep layout-aware operations on the class. Do before release to avoid an API break later.

  2. Finish sparse-aware PCM utility migration.
    Add sparse_binary_matrix-aware overloads/paths for the remaining utilities so that pipelines at the scale of #379 (cudaqx::tensor fails to allocate large PCMs) do not silently fall back to dense cudaqx::tensor<uint8_t> allocations:

  • reorder_pcm_columns
  • shuffle_pcm_columns
  • pcm_to_sparse_vec
  • pcm_to_sparse_string

Performance

  1. Avoid repeated to_nested_csc() copies in get_pcm_for_rounds.
    Each call (one per sliding-window step) invokes pcm.to_nested_csc(), which allocates a fresh vector<vector<uint32_t>> of size num_cols and copies every entry into it. Add an overload/path that reads directly from the canonical CSC arrays (ptr_ / indices_) instead of materializing the nested form.

  2. Replace per-column overlap scan with cached round metadata + binary search in get_pcm_for_rounds.
    Instead of scanning every column to find round overlaps, compute and cache (first_round, last_round) per column once, then binary-search over a sorted first_round array. This should cut the per-call work across the sliding window's repeated calls.

  3. Build generate_random_pcm_sparse CSC arrays directly.
    Current flow: allocate nested → sort each inner vector → from_nested_csc allocates/copies again into flat CSC storage. Since the generator already knows each column’s entry count, build col_ptrs and the flat row_indices directly and skip the nested intermediate allocations.

  4. Reserve inner vectors in to_nested_csc() / to_nested_csr().
    Do a cheap counting pass (or otherwise reserve per-group capacity) before pushing entries, to reduce repeated reallocations when materializing the nested form.
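
Items 1 and 2 above can be sketched together as follows. This is only an illustration under stated assumptions: csc_view, round_metadata, compute_round_metadata, columns_starting_in, and slice_columns are hypothetical names, not the sparse_binary_matrix or pcm_utils API. It also assumes that every column is non-empty, that row indices are sorted within each column, that a row's round is row / rows_per_round, and that columns are already ordered by first_round. How columns that straddle the window edge are handled, and whether rows are rebased, is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical flat CSC container; field names are illustrative and are not
// the sparse_binary_matrix API.
struct csc_view {
  std::uint32_t num_rows = 0;
  std::vector<std::uint32_t> col_ptrs;    // size num_cols + 1
  std::vector<std::uint32_t> row_indices; // sorted within each column
};

// Per-column round span, computed once and reused for every window.
struct round_metadata {
  std::vector<std::uint32_t> first_round;
  std::vector<std::uint32_t> last_round;
};

// Assumes every column is non-empty and its row indices are sorted.
round_metadata compute_round_metadata(const csc_view &m,
                                      std::uint32_t rows_per_round) {
  assert(rows_per_round > 0 && !m.col_ptrs.empty());
  const std::size_t num_cols = m.col_ptrs.size() - 1;
  round_metadata md;
  md.first_round.resize(num_cols);
  md.last_round.resize(num_cols);
  for (std::size_t c = 0; c < num_cols; ++c) {
    const std::uint32_t begin = m.col_ptrs[c];
    const std::uint32_t end = m.col_ptrs[c + 1];
    assert(begin < end);
    md.first_round[c] = m.row_indices[begin] / rows_per_round;
    md.last_round[c] = m.row_indices[end - 1] / rows_per_round;
  }
  return md;
}

// Half-open column range [lo, hi) whose first_round lies in
// [start_round, end_round]. Assumes first_round is non-decreasing.
std::pair<std::size_t, std::size_t>
columns_starting_in(const round_metadata &md, std::uint32_t start_round,
                    std::uint32_t end_round) {
  const auto first = md.first_round.begin();
  const auto last = md.first_round.end();
  const auto lo = std::lower_bound(first, last, start_round);
  const auto hi = std::upper_bound(lo, last, end_round);
  return {static_cast<std::size_t>(lo - first),
          static_cast<std::size_t>(hi - first)};
}

// Copies columns [lo, hi) straight from the flat arrays, rebasing col_ptrs.
// No vector<vector<...>> is materialized.
csc_view slice_columns(const csc_view &m, std::size_t lo, std::size_t hi) {
  assert(lo <= hi && hi + 1 <= m.col_ptrs.size());
  csc_view out;
  out.num_rows = m.num_rows;
  const std::uint32_t base = m.col_ptrs[lo];
  out.col_ptrs.reserve(hi - lo + 1);
  for (std::size_t c = lo; c <= hi; ++c)
    out.col_ptrs.push_back(m.col_ptrs[c] - base);
  out.row_indices.assign(m.row_indices.begin() + base,
                         m.row_indices.begin() + m.col_ptrs[hi]);
  return out;
}
```

In this sketch the metadata is computed once per PCM, and each window then costs two binary searches plus one contiguous copy of the selected range. A real get_pcm_for_rounds would additionally consult last_round to decide what to do with columns that extend past the window.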

Refactor / maintainability

  1. Share the column-order comparator between get_sorted_pcm_column_indices and pcm_is_sorted.
    Extract the comparator into a shared helper. get_sorted_pcm_column_indices keeps sorting the permutation; pcm_is_sorted can do an O(n) adjacent-inversion check using the exact same ordering definition.

  2. De-duplicate random-column generation between generate_random_pcm and generate_random_pcm_sparse.
    Both share the same per-column random-generation logic. Extract into a helper to prevent future drift / subtle differences.
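
A minimal sketch of the shared-comparator idea from item 1, under assumptions: the ordering shown here (lexicographic comparison of each column's sorted row indices) is a placeholder, since the issue does not spell out the actual ordering, and column, column_less, sorted_column_indices, and columns_are_sorted are hypothetical stand-ins for the real helpers.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// One column, stored as its sorted row indices (illustrative representation).
using column = std::vector<std::uint32_t>;

// Single definition of the column order. The lexicographic rule is an
// assumption for this sketch; the real ordering would live here instead.
inline bool column_less(const column &a, const column &b) {
  return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end());
}

// Stand-in for get_sorted_pcm_column_indices: sorts a permutation of the
// column indices using the shared comparator.
std::vector<std::uint32_t>
sorted_column_indices(const std::vector<column> &cols) {
  std::vector<std::uint32_t> perm(cols.size());
  std::iota(perm.begin(), perm.end(), 0u);
  std::stable_sort(perm.begin(), perm.end(),
                   [&](std::uint32_t i, std::uint32_t j) {
                     return column_less(cols[i], cols[j]);
                   });
  return perm;
}

// Stand-in for pcm_is_sorted: one pass over adjacent pairs, reporting
// unsorted as soon as a column compares less than its predecessor.
bool columns_are_sorted(const std::vector<column> &cols) {
  return std::adjacent_find(cols.begin(), cols.end(),
                            [](const column &prev, const column &next) {
                              return column_less(next, prev);
                            }) == cols.end();
}
```

Because both functions call column_less, a change to the ordering rule cannot make the sort and the check disagree. The check performs a number of comparisons linear in the column count, with each comparison costing at most the column length.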

Lower priority

  1. Consider API naming cleanup for pcm_is_sorted.
    is_pcm_sorted reads more naturally, but this is user-facing API churn across C++/Python/docs, so only do this if we intentionally want naming cleanup before API freeze.

  2. Refresh docs for sparse PCM APIs after #589 (Fix dense --> sparse conversion in get_decoder to avoid redundant copies) / #590 (Adopt scipy.sparse as optional interop).
    The original docs follow-up for #550 (Sparse parity-check matrix support for decoders) mentioned the 400M dense cap, but #589 / #590 changed the dense/scipy story. Re-check the Sphinx docs after the sparse/scipy changes and make sure they describe the supported paths accurately.

PR: #550
