PIAL Phase 0.1 — Montgomery reduction + property tests for Miller-Rabin kernel

## Context

Phase 0 of PIAL (ADR-151) landed in #PR-TBD. The Sinclair-12 Miller-Rabin kernel is correct on all pinned A014233 strong-pseudoprime regressions and runs in ~15.6 µs worst case on M-series (\`u64::MAX − 58\`).

During Phase 0 we ran an apples-to-apples competitor benchmark — \`num-prime 0.4.4\`'s \`is_prime64\` in the same binary on the same machine — and measured **884 ns** vs our **15.63 µs**, i.e. ~17.7× headroom. The PRD §6 target of 50 ns turned out to be structurally unachievable in safe Rust on Apple-silicon (criterion sanity baseline ≈ 467 ps; the harness is honest). The target was relaxed to ≤ 1 µs M-series / ≤ 4 µs WASM in the [PRD §6 amendment](../docs/research/miller-rabin-optimizations/PRD.md) and ADR-151 \"Phase 0 Findings\".

Phase 0.1's job is to close the 17.7× gap by adopting Montgomery-form modular multiplication (the actual hot-path differentiator with \`num-prime\`), and to harden the test suite with property-based fuzzing before we begin to take consumer dependencies on the kernel.

## Scope (in)

1. **Montgomery reduction** for \`mr_mulmod_u64\` / \`mr_powmod_u64\` in \`crates/ruvector-collections/src/primality_kernel.rs\`. Replace the current u128-product \`% m\` with a Montgomery-form CIOS or REDC implementation. Constraints:
   - **Same Sinclair-12 witness set** — do NOT change the deterministic witness list (see rejection rationale below).
   - All A014233 pinned regressions must still pass.
   - All 114 build-time table entries must still match (table_cross_check).
   - Must still be safe Rust; no inline assembly; no \`unsafe\` blocks unless absolutely required and justified inline.

2. **Property-based fuzz tests** added under \`crates/ruvector-collections/tests/\`:
   - \`primality_proptest.rs\`: random odd \`n\` agreement test against a slow trial-division oracle for \`n < 10^9\` (or against the existing kernel for larger \`n\`).
   - 10K iterations minimum on CI; configurable seed for repro.
   - Must run in the regular \`cargo test\` suite, not behind a feature flag.

3. **Bench gate**: \`is_prime_u64 worst case\` criterion result must drop into the **≤ 1 µs M-series** band (PRD §6 revised target). Re-run the num-prime apples-to-apples baseline in the same change to confirm the gap closes.

## Out of scope

- AVX-512 or NEON-specific paths.
- WASM micro-optimisations (Phase 1 work).
- Public-API surface changes (the kernel signatures and the public functions in \`src/primality.rs\` stay unchanged).
- Touching the build-time prime tables (already validated; should remain byte-identical).

## Explicit rejection: 7-witness swap

External Grok review (see \`docs/research/miller-rabin-optimizations/GROK-REVIEW-REQUEST.md\`) suggested swapping the Sinclair-12 witness set for the empirical 7-witness set found in some literature, claiming a ~40% speedup. **Reject this for Phase 0.1** for two reasons:

1. **Correctness pedigree**: Sinclair (2011) proves base \`{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37}\` is deterministic over all \`n < 2^64\`. The 7-witness sets in the wild are *empirically tested* up to \`2^64\` but not theorem-proven. We are not willing to swap a proven set for an empirical one in a primality utility that may be used in security-adjacent code paths.

2. **Test canary regression**: \`tests/primality_pseudoprimes.rs::SPP_FIRST_11 = 3_825_123_056_546_413_051\` is the A014233(11) entry — the smallest strong pseudoprime to bases 2..37. It was added precisely to make \"silently dropping base 37\" a hard test failure. Swapping witness sets would force us to delete that canary, weakening regression coverage.

The 17.7× gap is a constant-factor / mulmod issue, not a witness-count issue. Montgomery reduction targets the actual bottleneck.

## Acceptance criteria

- [ ] All existing tests still pass (\`cargo test -p ruvector-collections --features unstable-u128\` green).
- [ ] New \`primality_proptest.rs\` runs ≥ 10K iterations and passes.
- [ ] Worst-case bench drops to ≤ 1 µs on M-series (re-run on the same machine class as Phase 0).
- [ ] num-prime apples-to-apples re-baseline shows the gap closed (within ~2× is acceptable).
- [ ] Sinclair-12 witness set unchanged.
- [ ] Zero new clippy warnings on the collections crate.
- [ ] No \`unsafe\` introduced without inline justification comment.

## References

- ADR-151 Phase 0 Findings: \`docs/adr/ADR-151-miller-rabin-prime-optimizations.md\`
- PRD §6 / §6.1 revision: \`docs/research/miller-rabin-optimizations/PRD.md\`
- Grok review request + response context: \`docs/research/miller-rabin-optimizations/GROK-REVIEW-REQUEST.md\`
- Phase 0 PR: TBD (will link once opened)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PIAL Phase 0.1 — Montgomery reduction + property tests for Miller-Rabin kernel #357

Context

Scope (in)

Out of scope

Explicit rejection: 7-witness swap

Acceptance criteria

References

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

PIAL Phase 0.1 — Montgomery reduction + property tests for Miller-Rabin kernel #357

Description

Context

Scope (in)

Out of scope

Explicit rejection: 7-witness swap

Acceptance criteria

References

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions