Context
Phase 0 of PIAL (ADR-151) landed in #PR-TBD. The Sinclair-12 Miller-Rabin kernel is correct on all pinned A014233 strong-pseudoprime regressions and runs in ~15.6 µs worst case on M-series (`u64::MAX − 58`).
During Phase 0 we ran an apples-to-apples competitor benchmark — `num-prime 0.4.4`'s `is_prime64` in the same binary on the same machine — and measured 884 ns vs our 15.63 µs, i.e. ~17.7× headroom. The PRD §6 target of 50 ns turned out to be structurally unachievable in safe Rust on Apple-silicon (criterion sanity baseline ≈ 467 ps; the harness is honest). The target was relaxed to ≤ 1 µs M-series / ≤ 4 µs WASM in the PRD §6 amendment and ADR-151 "Phase 0 Findings".
Phase 0.1's job is to close the 17.7× gap by adopting Montgomery-form modular multiplication (the actual hot-path differentiator with `num-prime`), and to harden the test suite with property-based fuzzing before we begin to take consumer dependencies on the kernel.
Scope (in)
-
Montgomery reduction for `mr_mulmod_u64` / `mr_powmod_u64` in `crates/ruvector-collections/src/primality_kernel.rs`. Replace the current u128-product `% m` with a Montgomery-form CIOS or REDC implementation. Constraints:
- Same Sinclair-12 witness set — do NOT change the deterministic witness list (see rejection rationale below).
- All A014233 pinned regressions must still pass.
- All 114 build-time table entries must still match (table_cross_check).
- Must still be safe Rust; no inline assembly; no `unsafe` blocks unless absolutely required and justified inline.
-
Property-based fuzz tests added under `crates/ruvector-collections/tests/`:
- `primality_proptest.rs`: random odd `n` agreement test against a slow trial-division oracle for `n < 10^9` (or against the existing kernel for larger `n`).
- 10K iterations minimum on CI; configurable seed for repro.
- Must run in the regular `cargo test` suite, not behind a feature flag.
-
Bench gate: `is_prime_u64 worst case` criterion result must drop into the ≤ 1 µs M-series band (PRD §6 revised target). Re-run the num-prime apples-to-apples baseline in the same change to confirm the gap closes.
Out of scope
- AVX-512 or NEON-specific paths.
- WASM micro-optimisations (Phase 1 work).
- Public-API surface changes (the kernel signatures and the public functions in `src/primality.rs` stay unchanged).
- Touching the build-time prime tables (already validated; should remain byte-identical).
Explicit rejection: 7-witness swap
External Grok review (see `docs/research/miller-rabin-optimizations/GROK-REVIEW-REQUEST.md`) suggested swapping the Sinclair-12 witness set for the empirical 7-witness set found in some literature, claiming a ~40% speedup. Reject this for Phase 0.1 for two reasons:
-
Correctness pedigree: Sinclair (2011) proves base `{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37}` is deterministic over all `n < 2^64`. The 7-witness sets in the wild are empirically tested up to `2^64` but not theorem-proven. We are not willing to swap a proven set for an empirical one in a primality utility that may be used in security-adjacent code paths.
-
Test canary regression: `tests/primality_pseudoprimes.rs::SPP_FIRST_11 = 3_825_123_056_546_413_051` is the A014233(11) entry — the smallest strong pseudoprime to bases 2..37. It was added precisely to make "silently dropping base 37" a hard test failure. Swapping witness sets would force us to delete that canary, weakening regression coverage.
The 17.7× gap is a constant-factor / mulmod issue, not a witness-count issue. Montgomery reduction targets the actual bottleneck.
Acceptance criteria
References
- ADR-151 Phase 0 Findings: `docs/adr/ADR-151-miller-rabin-prime-optimizations.md`
- PRD §6 / §6.1 revision: `docs/research/miller-rabin-optimizations/PRD.md`
- Grok review request + response context: `docs/research/miller-rabin-optimizations/GROK-REVIEW-REQUEST.md`
- Phase 0 PR: TBD (will link once opened)
Context
Phase 0 of PIAL (ADR-151) landed in #PR-TBD. The Sinclair-12 Miller-Rabin kernel is correct on all pinned A014233 strong-pseudoprime regressions and runs in ~15.6 µs worst case on M-series (`u64::MAX − 58`).
During Phase 0 we ran an apples-to-apples competitor benchmark — `num-prime 0.4.4`'s `is_prime64` in the same binary on the same machine — and measured 884 ns vs our 15.63 µs, i.e. ~17.7× headroom. The PRD §6 target of 50 ns turned out to be structurally unachievable in safe Rust on Apple-silicon (criterion sanity baseline ≈ 467 ps; the harness is honest). The target was relaxed to ≤ 1 µs M-series / ≤ 4 µs WASM in the PRD §6 amendment and ADR-151 "Phase 0 Findings".
Phase 0.1's job is to close the 17.7× gap by adopting Montgomery-form modular multiplication (the actual hot-path differentiator with `num-prime`), and to harden the test suite with property-based fuzzing before we begin to take consumer dependencies on the kernel.
Scope (in)
Montgomery reduction for `mr_mulmod_u64` / `mr_powmod_u64` in `crates/ruvector-collections/src/primality_kernel.rs`. Replace the current u128-product `% m` with a Montgomery-form CIOS or REDC implementation. Constraints:
Property-based fuzz tests added under `crates/ruvector-collections/tests/`:
Bench gate: `is_prime_u64 worst case` criterion result must drop into the ≤ 1 µs M-series band (PRD §6 revised target). Re-run the num-prime apples-to-apples baseline in the same change to confirm the gap closes.
Out of scope
Explicit rejection: 7-witness swap
External Grok review (see `docs/research/miller-rabin-optimizations/GROK-REVIEW-REQUEST.md`) suggested swapping the Sinclair-12 witness set for the empirical 7-witness set found in some literature, claiming a ~40% speedup. Reject this for Phase 0.1 for two reasons:
Correctness pedigree: Sinclair (2011) proves base `{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37}` is deterministic over all `n < 2^64`. The 7-witness sets in the wild are empirically tested up to `2^64` but not theorem-proven. We are not willing to swap a proven set for an empirical one in a primality utility that may be used in security-adjacent code paths.
Test canary regression: `tests/primality_pseudoprimes.rs::SPP_FIRST_11 = 3_825_123_056_546_413_051` is the A014233(11) entry — the smallest strong pseudoprime to bases 2..37. It was added precisely to make "silently dropping base 37" a hard test failure. Swapping witness sets would force us to delete that canary, weakening regression coverage.
The 17.7× gap is a constant-factor / mulmod issue, not a witness-count issue. Montgomery reduction targets the actual bottleneck.
Acceptance criteria
References