Tracking non-blocking improvements identified during the sparse_binary_matrix PR review.
Pre-release / API stabilization
Make canonicalization a member of sparse_binary_matrix.
Move canonicalize_pcm (currently a free function in pcm_utils.cpp) to a method on sparse_binary_matrix, for consistency with the new validate_sorted_unique_indices member and to keep layout-aware operations on the class. Do before release to avoid an API break later.
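Below is a minimal sketch of what the member could look like. It rests on three assumptions:

- "Canonical" means each column's row indices are sorted and duplicate-free, which is what validate_sorted_unique_indices appears to check.
- Duplicates are dropped rather than cancelled mod 2.
- The class is a hypothetical stand-in that exposes only ptr_ / indices_. The real sparse_binary_matrix constructor and accessors may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, trimmed-down stand-in for sparse_binary_matrix (CSC layout).
// Only what canonicalize() needs is shown; the real class has more members.
class sparse_binary_matrix {
public:
  sparse_binary_matrix(std::uint32_t num_rows, std::vector<std::uint32_t> ptr,
                       std::vector<std::uint32_t> indices)
      : num_rows_(num_rows), ptr_(std::move(ptr)), indices_(std::move(indices)) {
    assert(!ptr_.empty() && ptr_.front() == 0 && ptr_.back() == indices_.size());
  }

  std::uint32_t num_rows() const { return num_rows_; }
  std::size_t num_cols() const { return ptr_.size() - 1; }
  const std::vector<std::uint32_t> &ptr() const { return ptr_; }
  const std::vector<std::uint32_t> &indices() const { return indices_; }

  // True if every column's row indices are strictly increasing.
  bool validate_sorted_unique_indices() const {
    for (std::size_t c = 0; c < num_cols(); ++c)
      for (std::uint32_t k = ptr_[c] + 1; k < ptr_[c + 1]; ++k)
        if (indices_[k - 1] >= indices_[k])
          return false;
    return true;
  }

  // Sorts each column's row indices and drops duplicates, compacting
  // indices_ in place and rewriting ptr_ to the new column boundaries.
  void canonicalize() {
    std::uint32_t write = 0;     // next free slot in the compacted indices_
    std::uint32_t col_begin = 0; // old start of the current column
    for (std::size_t c = 0; c < num_cols(); ++c) {
      const std::uint32_t col_end = ptr_[c + 1]; // old end, read before overwrite
      auto first = indices_.begin() + col_begin;
      auto last = indices_.begin() + col_end;
      std::sort(first, last);
      last = std::unique(first, last);
      // write <= col_begin, so shifting entries left never clobbers unread data.
      for (auto it = first; it != last; ++it)
        indices_[write++] = *it;
      ptr_[c + 1] = write;
      col_begin = col_end;
    }
    indices_.resize(write);
  }

private:
  std::uint32_t num_rows_;
  std::vector<std::uint32_t> ptr_;
  std::vector<std::uint32_t> indices_;
};
```

Keeping canonicalize() next to validate_sorted_unique_indices() means both share one definition of the invariant. Callers no longer need to know which free function in pcm_utils.cpp establishes it.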
Finish sparse-aware PCM utility migration.
Add sparse_binary_matrix-aware overloads/paths for the remaining utilities so the #379-scale pipeline (cudaqx::tensor fails to allocate large PCMs) does not silently fall back to dense cudaqx::tensor<uint8_t> allocations:
reorder_pcm_columns
shuffle_pcm_columns
pcm_to_sparse_vec
pcm_to_sparse_string
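For reorder_pcm_columns, a sparse path only needs to copy each source column's run of row indices into its new position. The csc_pcm struct and reorder_columns_sparse function below are hypothetical illustrations, not the library's API, and the real reorder_pcm_columns signature may differ. The sketch never builds a dense num_rows x num_cols buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CSC container; the real sparse_binary_matrix may differ.
struct csc_pcm {
  std::uint32_t num_rows = 0;
  std::vector<std::uint32_t> col_ptrs;    // size num_cols + 1
  std::vector<std::uint32_t> row_indices; // size nnz
};

// Returns a PCM whose column j is column column_order[j] of `src`.
// Cost is O(num_cols + nnz); no dense buffer is allocated.
csc_pcm reorder_columns_sparse(const csc_pcm &src,
                               const std::vector<std::uint32_t> &column_order) {
  assert(column_order.size() + 1 == src.col_ptrs.size());
  csc_pcm out;
  out.num_rows = src.num_rows;
  out.col_ptrs.reserve(src.col_ptrs.size());
  out.row_indices.reserve(src.row_indices.size());
  out.col_ptrs.push_back(0);
  for (std::uint32_t s : column_order) {
    assert(s + 1 < src.col_ptrs.size());
    // Copy the source column's (already sorted) row indices verbatim.
    out.row_indices.insert(out.row_indices.end(),
                           src.row_indices.begin() + src.col_ptrs[s],
                           src.row_indices.begin() + src.col_ptrs[s + 1]);
    out.col_ptrs.push_back(static_cast<std::uint32_t>(out.row_indices.size()));
  }
  return out;
}
```

shuffle_pcm_columns could plausibly reuse the same routine, passing a permutation produced by std::shuffle.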
Performance
Avoid repeated to_nested_csc() copies in get_pcm_for_rounds.
Each call (one per window of the sliding-window decoder) invokes pcm.to_nested_csc(). That allocates a fresh vector<vector<uint32_t>> with num_cols inner vectors and copies every nonzero into it. Add an overload/path that reads directly from the canonical CSC arrays (ptr_ / indices_) instead of materializing the nested form.
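One way to read directly from the flat arrays is sketched below, using a hypothetical csc_pcm struct and slice_csc helper. Canonical columns have sorted row indices, so each column's row window can be found with two binary searches and copied straight into the output with no nested intermediate. Whether get_pcm_for_rounds slices rows and columns exactly this way is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CSC container standing in for ptr_ / indices_.
struct csc_pcm {
  std::uint32_t num_rows = 0;
  std::vector<std::uint32_t> col_ptrs;    // size num_cols + 1
  std::vector<std::uint32_t> row_indices; // sorted within each column
};

// Extracts rows [row_begin, row_end) of columns [col_begin, col_end) straight
// from the flat CSC arrays. Output row indices are rebased to start at 0.
csc_pcm slice_csc(const csc_pcm &src, std::uint32_t row_begin,
                  std::uint32_t row_end, std::uint32_t col_begin,
                  std::uint32_t col_end) {
  assert(row_begin <= row_end && row_end <= src.num_rows);
  assert(col_begin <= col_end && col_end + 1 <= src.col_ptrs.size());
  csc_pcm out;
  out.num_rows = row_end - row_begin;
  out.col_ptrs.reserve(col_end - col_begin + 1);
  out.col_ptrs.push_back(0);
  for (std::uint32_t c = col_begin; c < col_end; ++c) {
    auto first = src.row_indices.begin() + src.col_ptrs[c];
    auto last = src.row_indices.begin() + src.col_ptrs[c + 1];
    // Binary-search the sorted column for the row window; no nested copy.
    auto lo = std::lower_bound(first, last, row_begin);
    auto hi = std::lower_bound(lo, last, row_end);
    for (auto it = lo; it != hi; ++it)
      out.row_indices.push_back(*it - row_begin);
    out.col_ptrs.push_back(static_cast<std::uint32_t>(out.row_indices.size()));
  }
  return out;
}
```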
Replace per-column overlap scan with cached round metadata + binary search in get_pcm_for_rounds.
Instead of scanning every column to find round overlaps, compute and cache (first_round, last_round) per column once, then binary-search over a sorted first_round array. This should reduce the per-window looping across the sliding-window decoder's repeated calls.
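Below is a sketch of the cached metadata. It assumes columns are already ordered so that first_round is non-decreasing. A column overlaps [start_round, end_round] when first_round <= end_round and last_round >= start_round.

The sketch also caches a prefix maximum of last_round, which goes beyond the issue text. That lets a second binary search trim columns that end before the window. The round_index name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical per-PCM cache, built once and reused for every window.
// Assumes columns are ordered so that first_round is non-decreasing.
struct round_index {
  std::vector<std::uint32_t> first_round;     // per column, non-decreasing
  std::vector<std::uint32_t> last_round;      // per column
  std::vector<std::uint32_t> prefix_max_last; // max(last_round[0..j])

  round_index(std::vector<std::uint32_t> first, std::vector<std::uint32_t> last)
      : first_round(std::move(first)), last_round(std::move(last)) {
    assert(first_round.size() == last_round.size());
    assert(std::is_sorted(first_round.begin(), first_round.end()));
    prefix_max_last.reserve(last_round.size());
    std::uint32_t running_max = 0;
    for (std::uint32_t r : last_round) {
      running_max = std::max(running_max, r);
      prefix_max_last.push_back(running_max);
    }
  }

  // Indices of columns touching any round in [start_round, end_round].
  std::vector<std::uint32_t> columns_overlapping(std::uint32_t start_round,
                                                 std::uint32_t end_round) const {
    // prefix_max_last is monotone, so every column before lo ends too early.
    const std::size_t lo = static_cast<std::size_t>(
        std::lower_bound(prefix_max_last.begin(), prefix_max_last.end(),
                         start_round) -
        prefix_max_last.begin());
    // first_round is sorted, so every column from hi on starts too late.
    const std::size_t hi = static_cast<std::size_t>(
        std::upper_bound(first_round.begin(), first_round.end(), end_round) -
        first_round.begin());
    std::vector<std::uint32_t> cols;
    for (std::size_t j = lo; j < hi; ++j)
      if (last_round[j] >= start_round) // first_round[j] <= end_round holds
        cols.push_back(static_cast<std::uint32_t>(j));
    return cols;
  }
};
```

The cache is built once per PCM in O(num_cols). Each window then costs two O(log num_cols) searches plus a scan over the candidate range only.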
Build generate_random_pcm_sparse CSC arrays directly.
Current flow: allocate nested → sort each inner vector → from_nested_csc allocates/copies again into flat CSC storage. Since the generator already knows each column’s entry count, build col_ptrs and the flat row_indices directly and skip the nested intermediate allocations.
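A sketch of the direct build follows. First, prefix-sum the per-column counts into col_ptrs. Then write each column's rows straight into its slot of the flat row_indices array and sort that slice in place. The per-column weight input, the rejection sampling, and the random_csc_direct name are illustrative assumptions, not the real generator's distribution.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical CSC container.
struct csc_pcm {
  std::uint32_t num_rows = 0;
  std::vector<std::uint32_t> col_ptrs;
  std::vector<std::uint32_t> row_indices;
};

// Hypothetical generator: column c receives col_weights[c] distinct random rows.
csc_pcm random_csc_direct(std::uint32_t num_rows,
                          const std::vector<std::uint32_t> &col_weights,
                          std::uint32_t seed) {
  csc_pcm out;
  out.num_rows = num_rows;
  // Pass 1: prefix-sum the known per-column counts into col_ptrs.
  out.col_ptrs.assign(col_weights.size() + 1, 0);
  for (std::size_t c = 0; c < col_weights.size(); ++c) {
    assert(col_weights[c] <= num_rows);
    out.col_ptrs[c + 1] = out.col_ptrs[c] + col_weights[c];
  }
  // Pass 2: fill each column's slot of the flat array in place, then sort it.
  out.row_indices.resize(out.col_ptrs.back());
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::uint32_t> pick(0, num_rows - 1);
  for (std::size_t c = 0; c < col_weights.size(); ++c) {
    auto first = out.row_indices.begin() + out.col_ptrs[c];
    auto last = out.row_indices.begin() + out.col_ptrs[c + 1];
    for (auto it = first; it != last;) {
      const std::uint32_t r = pick(rng);
      // Rejection sampling keeps rows distinct; fine for low-weight columns.
      if (std::find(first, it, r) == it)
        *it++ = r;
    }
    std::sort(first, last);
  }
  return out;
}
```

The only allocations are the two flat arrays, each sized exactly once.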
Reserve inner vectors in to_nested_csc() / to_nested_csr().
Do a cheap counting pass (or otherwise determine each group's size up front) and reserve each inner vector's capacity before pushing entries. This avoids repeated reallocations when materializing the nested form.
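For to_nested_csc() on CSC storage, the inner sizes are already known as col_ptrs[c + 1] - col_ptrs[c], so each inner vector can be constructed directly from its iterator range. to_nested_csr() does need the counting pass. Below is a sketch written as a hypothetical free function over the flat CSC arrays.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical free-function form of to_nested_csr() for a CSC-stored matrix.
std::vector<std::vector<std::uint32_t>>
to_nested_csr_reserved(std::uint32_t num_rows,
                       const std::vector<std::uint32_t> &col_ptrs,
                       const std::vector<std::uint32_t> &row_indices) {
  assert(!col_ptrs.empty() && col_ptrs.back() == row_indices.size());
  // Counting pass: how many nonzeros land in each row.
  std::vector<std::uint32_t> row_counts(num_rows, 0);
  for (std::uint32_t r : row_indices) {
    assert(r < num_rows);
    ++row_counts[r];
  }
  std::vector<std::vector<std::uint32_t>> rows(num_rows);
  for (std::uint32_t r = 0; r < num_rows; ++r)
    rows[r].reserve(row_counts[r]); // one allocation per non-empty row
  // Fill pass: walking columns in order keeps each row's column list sorted.
  for (std::uint32_t c = 0; c + 1 < col_ptrs.size(); ++c)
    for (std::uint32_t k = col_ptrs[c]; k < col_ptrs[c + 1]; ++k)
      rows[row_indices[k]].push_back(c);
  return rows;
}
```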
Refactor / maintainability
Share the column-order comparator between get_sorted_pcm_column_indices and pcm_is_sorted.
Extract the comparator into a shared helper. get_sorted_pcm_column_indices keeps sorting the permutation; pcm_is_sorted can do an O(n) adjacent-inversion check using the exact same ordering definition.
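Below is a sketch of the shared comparator. The ordering it uses (first nonzero row, then last nonzero row, empty columns last) is a placeholder, not the library's actual definition. The two functions are hypothetical stand-ins for get_sorted_pcm_column_indices and pcm_is_sorted, operating on a nested-CSC view.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Each inner vector holds one column's sorted row indices.
using nested_csc = std::vector<std::vector<std::uint32_t>>;

// Placeholder ordering: by first nonzero row, then last nonzero row; empty
// columns sort last. Substitute the real definition; both callers share it.
bool column_order_less(const std::vector<std::uint32_t> &a,
                       const std::vector<std::uint32_t> &b) {
  if (a.empty() || b.empty())
    return !a.empty() && b.empty();
  if (a.front() != b.front())
    return a.front() < b.front();
  return a.back() < b.back();
}

// Stand-in for get_sorted_pcm_column_indices: sorts a permutation.
std::vector<std::uint32_t> sorted_column_indices(const nested_csc &cols) {
  std::vector<std::uint32_t> perm(cols.size());
  std::iota(perm.begin(), perm.end(), 0u);
  std::stable_sort(perm.begin(), perm.end(),
                   [&](std::uint32_t i, std::uint32_t j) {
                     return column_order_less(cols[i], cols[j]);
                   });
  return perm;
}

// Stand-in for pcm_is_sorted: O(n) check that no adjacent pair is inverted.
bool columns_are_sorted(const nested_csc &cols) {
  return std::is_sorted(cols.begin(), cols.end(), column_order_less);
}
```

std::is_sorted performs exactly the adjacent-pair check. The two functions cannot disagree as long as both go through column_order_less.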
De-duplicate random-column generation between generate_random_pcm and generate_random_pcm_sparse.
Both share the same per-column random-generation logic. Extract into a helper to prevent future drift / subtle differences.
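Below is a sketch of the extracted helper, with hypothetical names and an assumed fixed weight per column. Both generators draw each column from the same engine through random_column_rows and differ only in how they store the result. A given seed therefore produces the same matrix in either layout. The row-major dense layout is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical shared helper: draws `weight` distinct rows in [0, num_rows)
// and returns them sorted.
std::vector<std::uint32_t> random_column_rows(std::mt19937 &rng,
                                              std::uint32_t num_rows,
                                              std::uint32_t weight) {
  assert(weight <= num_rows);
  std::uniform_int_distribution<std::uint32_t> pick(0, num_rows - 1);
  std::vector<std::uint32_t> rows;
  rows.reserve(weight);
  while (rows.size() < weight) {
    const std::uint32_t r = pick(rng);
    if (std::find(rows.begin(), rows.end(), r) == rows.end())
      rows.push_back(r);
  }
  std::sort(rows.begin(), rows.end());
  return rows;
}

// Dense output: row-major num_rows x num_cols bytes.
std::vector<std::uint8_t> random_dense(std::uint32_t num_rows,
                                       std::uint32_t num_cols,
                                       std::uint32_t weight, std::uint32_t seed) {
  std::mt19937 rng(seed);
  std::vector<std::uint8_t> dense(std::size_t(num_rows) * num_cols, 0);
  for (std::uint32_t c = 0; c < num_cols; ++c)
    for (std::uint32_t r : random_column_rows(rng, num_rows, weight))
      dense[std::size_t(r) * num_cols + c] = 1;
  return dense;
}

// Sparse output: CSC col_ptrs / row_indices, same per-column draws.
void random_sparse(std::uint32_t num_rows, std::uint32_t num_cols,
                   std::uint32_t weight, std::uint32_t seed,
                   std::vector<std::uint32_t> &col_ptrs,
                   std::vector<std::uint32_t> &row_indices) {
  std::mt19937 rng(seed);
  col_ptrs.assign(1, 0);
  row_indices.clear();
  for (std::uint32_t c = 0; c < num_cols; ++c) {
    const auto rows = random_column_rows(rng, num_rows, weight);
    row_indices.insert(row_indices.end(), rows.begin(), rows.end());
    col_ptrs.push_back(static_cast<std::uint32_t>(row_indices.size()));
  }
}
```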
Lower priority
Consider API naming cleanup for pcm_is_sorted. is_pcm_sorted reads more naturally, but this is user-facing API churn across C++/Python/docs, so only do this if we intentionally want naming cleanup before API freeze.
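If the rename does go ahead, one low-churn option on the C++ side is to keep the old name as a deprecated forwarding shim for a release cycle. The column_keys parameter type below is a stand-in for the real PCM argument. The Python bindings and docs would still need their own handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in argument type; the real function takes the library's PCM type.
using column_keys = std::vector<std::uint32_t>;

// New preferred spelling.
inline bool is_pcm_sorted(const column_keys &keys) {
  for (std::size_t j = 1; j < keys.size(); ++j)
    if (keys[j] < keys[j - 1])
      return false;
  return true;
}

// Old spelling kept as a thin forwarding shim: existing callers still compile
// but get a deprecation warning pointing at the new name.
[[deprecated("use is_pcm_sorted")]] inline bool
pcm_is_sorted(const column_keys &keys) {
  return is_pcm_sorted(keys);
}
```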
Refresh docs for sparse PCM APIs after #589 (Fix dense --> sparse conversion in get_decoder to avoid redundant copies) and #590 (Adopt scipy.sparse as optional interop).
The original #550 (Sparse parity-check matrix support for decoders) docs follow-up mentioned the 400M dense cap, but #589/#590 changed the dense/scipy story. Re-check the Sphinx docs after the sparse/scipy changes and make sure they describe the supported paths accurately.
PR: #550