research(nightly): fresh-diskann — streaming online index maintenance#426
Draft
…ine index maintenance

Implements FreshDiskANN (Singh et al., VLDB 2022, arXiv:2105.09613) as a new standalone Rust crate providing streaming insert and soft-delete capabilities for Vamana proximity graphs without requiring a full index rebuild.

Key features:
- In-memory insert buffer searchable immediately via brute-force scan
- Lazy consolidation: beam-insert each buffered vector via α-robust Vamana pruning + backlink repair (O(R·L·dim) per vector vs O(N·R·L·dim) rebuild)
- Three consolidation policies: Manual, Eager, Lazy(T)
- Tombstone-based soft deletes filtered at query time
- 8 passing tests; cargo build --release succeeds

Benchmark (4-core Xeon @ 2.80 GHz, 10k × 128-dim, k=10):
  Static baseline : recall@10=0.744, QPS=3178
  Eager streaming : recall@10=0.751, QPS=3213, consol=2017ms
  Lazy T=100      : recall@10=0.751, QPS=3133, consol=2749ms
  Buffer-only     : recall@10=0.751, QPS=3235

https://claude.ai/code/session_01FuyD9huQGmZLdct1bUEm5q
Records the decision to add ruvector-fresh-diskann crate for streaming online index maintenance via lazy consolidation. Includes context (static DiskANN rebuild cost), decision rationale, consequences, and alternatives considered (Qdrant HNSW patching, LanceDB segment merging, FAISS add). https://claude.ai/code/session_01FuyD9huQGmZLdct1bUEm5q
…ntenance

Research document covering:
- SOTA survey: FreshDiskANN vs Qdrant/Weaviate/Milvus/LanceDB/FAISS streaming approaches
- Proposed design with complexity table and architecture diagram
- Real benchmark numbers on 4-core Xeon @ 2.80 GHz (recall@10, QPS, consol latency)
- Blog-readable walkthrough of beam-insert algorithm
- Practical failure modes and mitigations
- Production crate layout proposal and 7-item improvement roadmap
- Full reference list (arXiv:2105.09613, 1908.10396, 2111.08566, SIGMOD 2024)

https://claude.ai/code/session_01FuyD9huQGmZLdct1bUEm5q
Adds rand, thiserror and transitive dependencies (rand_chacha, getrandom, rand_core, zerocopy, ppv-lite86) required by the new crate. https://claude.ai/code/session_01FuyD9huQGmZLdct1bUEm5q
Summary
- crates/ruvector-fresh-diskann: a new standalone Rust crate implementing FreshDiskANN (Singh et al., VLDB 2022, arXiv:2105.09613) for streaming online Vamana graph index maintenance without full rebuilds.
- docs/adr/ADR-183-fresh-diskann.md: architecture decision record documenting the design, alternatives considered, and consequences.
- docs/research/nightly/2026-05-06-fresh-diskann/README.md: full research document with SOTA survey, algorithm design, benchmark methodology, and improvement roadmap.

Research: FreshDiskANN Streaming Index Maintenance
ADR: ADR-183 | Crate: ruvector-fresh-diskann
ruvector 2026: FreshDiskANN — High-Performance Streaming Vector Index in Rust
150-char SEO summary: ruvector FreshDiskANN enables live vector inserts into Vamana graph indices without full rebuilds — 3200 QPS, recall@10=0.751, 2ms per-vector consolidation in Rust.
Introduction
Production vector databases keep running into the same trap: statically built graph ANN indices (DiskANN's Vamana, NSG) deliver outstanding recall and query throughput but need a full rebuild to absorb new vectors. On a 10,000-vector corpus at 128 dimensions, a static Vamana build takes 28.8 seconds on a 4-core Xeon. Scaled linearly to 10M vectors, that is roughly 8 hours.
ruvector-fresh-diskann implements the FreshDiskANN algorithm, a streaming maintenance layer that lets you insert, delete, and query simultaneously without rebuilding. New vectors land in an in-memory buffer (immediately searchable via brute-force scan), then are consolidated into the Vamana graph via per-vector beam-insert with α-robust pruning. No segment merging. No recall cliff. No rebuild.

Features
- insert(id, embedding) returns immediately; the vector is searchable at once
- Manual, Eager (per-insert), Lazy(T) (batch at threshold T)
- std
- cargo test -p ruvector-fresh-diskann is green

Benefits
Benchmarks (Real Numbers — 4-core Intel Xeon @ 2.80 GHz)
Dataset: 10,000 vectors × 128 dimensions, f32, synthetic uniform
Config: R=32, L_build=64, L_search=64, α=1.2, k=10
Ground truth: brute-force k-NN over full corpus
Key result: streaming inserts preserve or improve recall vs static build (0.751 vs 0.744) while enabling live ingestion at ~2ms per-vector consolidation cost.
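The recall@10 figures above are measured against brute-force ground truth. As a minimal sketch of how such a measurement can be computed (the function names `l2_sq`, `brute_force_knn`, and `recall_at_k` are illustrative assumptions, not taken from the crate or its benchmark binary, and squared Euclidean distance is assumed as the metric):

```rust
use std::collections::HashSet;

/// Squared Euclidean distance between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact k nearest neighbours of `query`, found by scanning the whole corpus.
/// Returns corpus positions ordered from nearest to farthest.
fn brute_force_knn(corpus: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(f32, usize)> = corpus
        .iter()
        .enumerate()
        .map(|(i, v)| (l2_sq(v, query), i))
        .collect();
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored.into_iter().take(k).map(|(_, i)| i).collect()
}

/// Fraction of the true top-k ids that the approximate result also returned.
/// Assumes `approx` contains no duplicate ids.
fn recall_at_k(approx: &[usize], truth: &[usize], k: usize) -> f64 {
    let truth_set: HashSet<usize> = truth.iter().take(k).copied().collect();
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| truth_set.contains(id))
        .count();
    hits as f64 / k as f64
}

fn main() {
    let corpus: Vec<Vec<f32>> = vec![
        vec![0.0, 0.0],
        vec![1.0, 0.0],
        vec![0.0, 2.0],
        vec![5.0, 5.0],
    ];
    // Ground truth for one query: the two closest corpus entries.
    let truth = brute_force_knn(&corpus, &[0.1, 0.0], 2);
    // Pretend an approximate index returned ids 0 and 2.
    let approx = vec![0, 2];
    println!("recall@2 = {}", recall_at_k(&approx, &truth, 2));
}
```

A benchmark would average `recall_at_k` over many queries; the single-query toy data here only shows the mechanics.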
Comparisons vs Competitors
- add() without pruning

Optimizations (Roadmap)
- … Vec<bool> alloc)
- Arc<RwLock<ConsolidatedState>>

Get Started
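The crate's actual API is not reproduced in this description. As a rough, self-contained illustration of the semantics it describes (buffered inserts that are searchable at once, tombstoned deletes filtered at query time, and the Manual / Eager / Lazy(T) policies), here is a toy sketch. `ToyIndex`, `Policy`, and every method below are hypothetical stand-ins, and consolidation is a plain move between two vectors rather than a Vamana beam-insert with robust pruning:

```rust
use std::collections::HashSet;

/// When buffered vectors are folded into the main structure.
/// Policy names follow the PR text; the type itself is hypothetical.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Policy {
    Manual,      // only when consolidate() is called explicitly
    Eager,       // after every insert
    Lazy(usize), // once the buffer holds at least T vectors
}

/// Toy stand-in for the streaming layer (NOT the crate's real types).
struct ToyIndex {
    policy: Policy,
    buffer: Vec<(u64, Vec<f32>)>,       // fresh inserts, scanned brute-force
    consolidated: Vec<(u64, Vec<f32>)>, // stands in for the Vamana graph
    tombstones: HashSet<u64>,           // soft-deleted ids
}

fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

impl ToyIndex {
    fn new(policy: Policy) -> Self {
        Self {
            policy,
            buffer: Vec::new(),
            consolidated: Vec::new(),
            tombstones: HashSet::new(),
        }
    }

    /// Buffer the vector, then consolidate if the policy says it is due.
    fn insert(&mut self, id: u64, embedding: Vec<f32>) {
        self.buffer.push((id, embedding));
        let due = match self.policy {
            Policy::Manual => false,
            Policy::Eager => true,
            Policy::Lazy(t) => self.buffer.len() >= t,
        };
        if due {
            self.consolidate();
        }
    }

    /// Soft delete: the vector stays stored but is filtered out of results.
    fn delete(&mut self, id: u64) {
        self.tombstones.insert(id);
    }

    /// Placeholder for graph consolidation: just moves the buffer across.
    fn consolidate(&mut self) {
        self.consolidated.append(&mut self.buffer);
    }

    /// Scans both stores, skips tombstoned ids, returns the k nearest ids.
    fn search(&self, query: &[f32], k: usize) -> Vec<u64> {
        let mut scored: Vec<(f32, u64)> = self
            .consolidated
            .iter()
            .chain(self.buffer.iter())
            .filter(|(id, _)| !self.tombstones.contains(id))
            .map(|(id, v)| (l2_sq(v, query), *id))
            .collect();
        scored.sort_by(|a, b| a.0.total_cmp(&b.0));
        scored.into_iter().take(k).map(|(_, id)| id).collect()
    }
}

fn main() {
    let mut idx = ToyIndex::new(Policy::Lazy(2));
    idx.insert(1, vec![0.0, 0.0]); // below threshold: buffered, still searchable
    idx.insert(2, vec![1.0, 0.0]); // threshold reached: buffer is consolidated
    idx.insert(3, vec![0.2, 0.0]); // lands in the now-empty buffer
    idx.delete(1);                 // tombstoned, filtered at query time
    println!("{:?}", idx.search(&[0.0, 0.0], 2));
}
```

In this toy the search is a linear scan over both stores; in the design described above only the buffer is scanned brute-force, while the consolidated part is searched through the graph.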
References
- docs/research/nightly/2026-05-06-fresh-diskann/README.md
- docs/adr/ADR-183-fresh-diskann.md

Test plan
- cargo build --release -p ruvector-fresh-diskann: passes
- cargo test -p ruvector-fresh-diskann: 8/8 tests pass
- cargo run --release --bin fresh-diskann-bench: real numbers captured

https://claude.ai/code/session_01FuyD9huQGmZLdct1bUEm5q
Generated by Claude Code