feat: full-text thread search#8288
Conversation
…cator - Lazy-build full-text search corpus of message content when search dialog opens - Incremental invalidation on message add/update/delete for near real-time updates - Loading spinner shown while building index to keep user informed - Strict substring matching for title and content search (no fuzzy false positives) - Debounced search input for smooth UI performance
PR Review: feat: full-text thread searchSummaryThis PR adds full-text search across message content in the thread search dialog. Previously, search was title-only (via fuzzy matching with
4 files changed, +358 / -17 lines, 1 commit. Detailed Findings1. Correctness Issues
The The Suggested fix: Fallback logic can show stale/duplicate results In if (fullTextResults.length > 0) {
filteredThreads = fullTextResults.map((r) => r.thread)
} else {
filteredThreads = getFilteredThreads(searchQuery)
}The fallback uses Suggestion: Use 2. Concurrency / Race Condition
If This is mitigated by the fact that 3. Memory / PerformanceNo upper bound on corpus size Each thread's content is capped at 5,000 chars, which is good. But there's no limit on the total number of threads indexed. A power user with 1,000+ threads would have the entire corpus in memory. For most users this is fine, but it would be worth documenting the scaling characteristics or adding a thread count cap.
The method 4. UX ConcernsSnippet only shown for content-only matches In the search method: snippet: contentMatch && !titleMatch
? extractSnippet(entry.contentText, term)
: undefinedWhen "No results found" may flash during index build If the index is still building ( 5. Code Quality
6. TestingNo tests are included. The Recommendation: fix neededThe Nice feature overall -- the architecture (singleton index, lazy build, incremental invalidation) is well thought out and the code is clean. |
- hasPendingWork is now a method taking threads so newly created threads are detected and indexed without requiring a full invalidation - build() loops until no pending work remains, so invalidations that arrive mid-build are not silently dropped (race condition fix) - fallback-to-fzf now gates on indexReady rather than result count, preventing fuzzy false positives once the index is built - snippet is shown for all content matches including matchSource 'both' - 'Still indexing…' empty state prevents 'No results' flash during build - removed dead rebuildIndex() no-op - MAX_INDEXED_THREADS=2000 cap with documented scaling characteristics - extractTextFromContent now uses proper ThreadContent[] type (no `any`) - add 22 unit tests for search-index module Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes applied
Fallback-to-fuzzy inconsistency Mid-build race condition No upper bound on corpus size
Snippet for all content matches "No results" flash during build
Misleading "fuzzy matching" comment Tests addedAdded
|
There was a problem hiding this comment.
Follow-up review (new commits detected since last review)
The new commit 5e8d488 ("fix: address full-text search PR review feedback") addresses most of the concerns raised in the previous review. Here is a point-by-point assessment:
Previously Raised Issues — Resolution Status
1. hasPendingWork did not detect new threads (Bug) — FIXED
hasPendingWork now accepts the threads record as a parameter and compares the eligible set against entriesByThreadId.keys(). A new eligibleThreads() helper is shared between hasPendingWork() and doBuild() so they always agree on the target set. This was the most important bug and it is properly fixed. A dedicated regression test (detects newly created threads) confirms the fix.
2. Fallback-to-fuzzy showed inconsistent results — FIXED
The searchResults useMemo now gates on indexReady rather than fullTextResults.length > 0. Once the index is built, its strict substring results are used exclusively (even when empty). Fuzzy fzf fallback only applies while the index is still building. This eliminates the inconsistency.
3. Build race condition (mid-build invalidations dropped) — FIXED
build() now wraps doBuild() in a do { ... } while (hasPendingWork(...)) loop, so invalidations that arrive during an in-flight build are picked up in the next iteration. A unit test (picks up invalidations that arrive mid-build) verifies the fix with a paused fetch promise.
4. No upper bound on corpus size — FIXED
MAX_INDEXED_THREADS = 2000 cap added with documented scaling characteristics (~10 MB at defaults). Threads beyond the cap are excluded via eligibleThreads() which sorts by recency and slices. Good.
5. rebuildIndex() was a dead no-op — FIXED
Removed entirely.
6. Snippet only shown for content-only matches — FIXED
The snippet is now shown for all content matches including matchSource === 'both'. The code comment explains the rationale clearly.
7. "No results found" flash during index build — FIXED
A dedicated "Still indexing..." empty state with spinner is shown when indexBuilding && !hasResults. The searchQuery && !hasResults && !indexBuilding guard prevents the "No results" message from flashing during build. Localized strings added.
8. any type cast in extractTextFromContent — FIXED
The function now uses the proper ThreadContent[] type from @janhq/core.
9. No tests — FIXED
22 unit tests added in search-index.test.ts covering extractTextFromContent, extractSnippet, ThreadSearchIndex search behavior, hasPendingWork lifecycle, and the mid-build race condition. Good coverage of the core logic. The test-only __resetThreadSearchIndexForTests export is a clean approach for singleton testing.
New Observations on the Updated Code
1. Minor: Sorting puts "both" matches after "content" matches
The sort comparator only promotes matchSource === 'title' to the top. Threads matching both title and content are treated the same as content-only matches. These "both" matches are arguably the most relevant results and should arguably sort above content-only matches. This is minor and can be addressed in a follow-up.
2. Minor: build() deduplication still returns the existing promise for concurrent callers
If two callers invoke build() concurrently, the second caller gets the first build's promise. If the second caller passes a newer threads record, this.latestThreads is updated (good), and the while (hasPendingWork) loop should pick up the difference. This looks correct but is subtle — worth a comment.
3. Test quality is solid
The tests cover the important functional paths: title-only match, content-only match, "both" match with snippet, strict substring (no fuzzy false positives), sort order, empty-before-build, hasPendingWork lifecycle (new threads, stale, deleted, invalidate), and the mid-build race condition. The mock structure using vi.mock for the service hub is clean.
Summary
All 9 issues from the previous review have been addressed. The architecture is sound: singleton index with lazy build, incremental invalidation, batched fetching, and a build loop that handles mid-flight changes. The code is well-commented, properly typed, and now has good test coverage. The remaining observations are minor and non-blocking.
Recommendation: can merge
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Follow-up Review (new commit since Jun 7)Reviewing commit: This is a documentation-only commit adding inline JSDoc comments to the Assessment
Previously Raised Issues — All Still Resolved
Recommendation: can merge |
Describe Your Changes
Previously, the search bar only searched through chat titles. This meant if you remembered discussing something in a conversation but didn't remember the title, you couldn't find it. Now the search indexes all message content across all your chats, so searching for a topic like "python async" will find any thread where that topic was discussed even if the title is something generic like "Code Help". More specifically:
Self Checklist