feat: full-text thread search by FedeCuci · Pull Request #8288 · janhq/jan

FedeCuci · 2026-06-06T13:02:34Z

Describe Your Changes

Previously, the search bar only searched through chat titles. This meant if you remembered discussing something in a conversation but didn't remember the title, you couldn't find it. Now the search indexes all message content across all your chats, so searching for a topic like "python async" will find any thread where that topic was discussed even if the title is something generic like "Code Help". More specifically:

Lazy-build full-text search corpus of message content when search dialog opens
Incremental invalidation on message add/update/delete for near real-time updates
Loading spinner shown while building index to keep user informed
Strict substring matching for title and content search (no fuzzy false positives)
Debounced search input for smooth UI performance

Self Checklist

[x] Added relevant comments, esp in complex areas
Updated docs (for bug fixes / features)
Created issues for follow-up changes or refactoring needed


tokamak-pm · 2026-06-07T02:05:10Z

PR Review: feat: full-text thread search

Summary

This PR adds full-text search across message content in the thread search dialog. Previously, search was title-only (via fuzzy matching with fzf). Now a ThreadSearchIndex singleton lazily loads all message content into an in-memory corpus and supports substring matching across both titles and message bodies. Key additions:

New web-app/src/lib/search-index.ts -- ThreadSearchIndex class (singleton) with lazy build, batched fetching, incremental invalidation, and substring search.
SearchDialog.tsx -- Integrates the new index with debounced search, loading spinner, and content snippet display.
useMessages.ts / useThreads.ts -- Hooks into add/update/delete/clear operations to invalidate or evict index entries.

4 files changed, +358 / -17 lines, 1 commit.

Detailed Findings

1. Correctness Issues

hasPendingWork does not detect new threads (Bug)

The hasPendingWork getter only checks if the index is null, if there are stale thread IDs, or if there are deleted thread IDs. It does not check whether there are threads in the threads record that are not yet in the index (i.e., newly created threads after initial build). This means when a user creates a new thread, adds messages, and reopens the search dialog, the index will report hasPendingWork = false and skip building, so the new thread will never be searchable until a full invalidation occurs.

The doBuild() method does handle new threads correctly (it checks isNew), but hasPendingWork gates whether build() is called at all from the useEffect in SearchDialog.tsx. This is a functional bug.

Suggested fix: hasPendingWork should accept (or store) the thread record so it can compare against entriesByThreadId.keys(), or the useEffect should always call build() and let doBuild() decide if work is needed.
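A minimal sketch of the first option, assuming the index keeps a map of entries plus stale and deleted ID sets. The class name, the field types, and the markBuilt helper are hypothetical and exist only to make the example self-contained; the real ThreadSearchIndex internals may differ.

```typescript
// Hypothetical minimal shape; the real Thread type has more fields.
interface Thread {
  id: string
  title: string
}

class PendingWorkSketch {
  // null until the first build completes, as described in the review.
  private entriesByThreadId: Map<string, string> | null = null
  private staleThreadIds = new Set<string>()
  private deletedThreadIds = new Set<string>()

  // Stand-in for a finished build: records which threads are indexed.
  markBuilt(threadIds: string[]): void {
    this.entriesByThreadId = new Map(
      threadIds.map((id): [string, string] => [id, ''])
    )
    this.staleThreadIds.clear()
    this.deletedThreadIds.clear()
  }

  invalidateThread(id: string): void {
    this.staleThreadIds.add(id)
  }

  hasPendingWork(threads: Record<string, Thread>): boolean {
    const entries = this.entriesByThreadId
    if (entries === null) return true
    if (this.staleThreadIds.size > 0 || this.deletedThreadIds.size > 0) {
      return true
    }
    // The check the getter was missing: a thread with no index entry yet.
    return Object.keys(threads).some((id) => !entries.has(id))
  }
}
```

Passing the thread record in keeps the index free of a store dependency, at the cost of one key scan per call.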

Fallback logic can show stale/duplicate results

In searchResults useMemo:

if (fullTextResults.length > 0) {
  filteredThreads = fullTextResults.map((r) => r.thread)
} else {
  filteredThreads = getFilteredThreads(searchQuery)
}

The fallback uses getFilteredThreads (fuzzy fzf title search). But the condition checks fullTextResults.length > 0, not whether the index is ready. If the index is built but the search term matches zero content entries, it falls back to fuzzy title search -- which behaves differently than the strict substring matching used by the index for titles. A user searching "xyz" might get a fuzzy false-positive from fzf on the title "xylophone buzz" via the fallback, but not when the index is populated. This inconsistency is confusing.

Suggestion: Use indexReady to decide the branch, not the result count.
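One way the suggested branch could look, written as a standalone function so it can be read outside the component. The function name pickThreads and its parameter shapes are assumptions for illustration; in the PR this logic lives inside the searchResults useMemo.

```typescript
// Generic over the thread type so the sketch stays self-contained.
function pickThreads<T>(
  indexReady: boolean,
  fullTextResults: { thread: T }[],
  fuzzyTitleSearch: () => T[]
): T[] {
  // Once the index is ready, trust its result even when it is empty.
  if (indexReady) return fullTextResults.map((r) => r.thread)
  // Fuzzy title search is only a stopgap while the index is unavailable.
  return fuzzyTitleSearch()
}
```

With this shape, a query such as "xyz" returns an empty list once the index is built instead of falling through to a fuzzy title hit.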

2. Concurrency / Race Condition

build() deduplication may skip needed rebuilds

If build() is already running (the guard if (this.buildPromise) return this.buildPromise fires), and a new thread is added or a thread is invalidated during that build, the pending invalidation will not be picked up until a subsequent build is triggered. But the useEffect in SearchDialog only triggers on [open, threads] -- if the dialog is already open and no new thread is created, there is no trigger to re-run build() after the invalidation.

This is mitigated by the fact that indexBuilding flips to false and the debounced search useEffect re-runs, but it only calls index.search(), not index.build(). So stale data from mid-build invalidations may persist until the dialog is closed and reopened.

3. Memory / Performance

No upper bound on corpus size

Each thread's content is capped at 5,000 chars, which is good. But there's no limit on the total number of threads indexed. A power user with 1,000+ threads would have the entire corpus in memory. For most users this is fine, but it would be worth documenting the scaling characteristics or adding a thread count cap.

rebuildIndex() is a no-op

The method rebuildIndex() has a comment saying "just mark index as ready" but does nothing. It appears to be scaffolding for a future optimization (perhaps building a trie or inverted index). If it's not needed now, removing it would reduce confusion.

4. UX Concerns

Snippet only shown for content-only matches

In the search method:

snippet: contentMatch && !titleMatch
  ? extractSnippet(entry.contentText, term)
  : undefined

When matchSource === 'both', no snippet is shown. Users would benefit from seeing the content snippet even when the title also matches, since it provides context about what was discussed. Consider showing the snippet for all content matches.
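A rough sketch of what showing the snippet for every content match might look like. The extractSnippet body, its radius parameter, and the matchEntry helper are assumptions written for this example; only the names extractSnippet and matchSource come from the PR.

```typescript
type MatchSource = 'title' | 'content' | 'both'

// Returns a window of text around the first case-insensitive match.
// Assumes lowercasing does not change string length (true for ASCII).
function extractSnippet(
  text: string,
  term: string,
  radius = 40
): string | undefined {
  if (term.length === 0) return undefined
  const at = text.toLowerCase().indexOf(term.toLowerCase())
  if (at === -1) return undefined
  const start = Math.max(0, at - radius)
  const end = Math.min(text.length, at + term.length + radius)
  const prefix = start > 0 ? '…' : ''
  const suffix = end < text.length ? '…' : ''
  return prefix + text.slice(start, end).trim() + suffix
}

function matchEntry(
  title: string,
  contentText: string,
  term: string
): { matchSource: MatchSource; snippet: string | undefined } | undefined {
  const needle = term.toLowerCase()
  const titleMatch = title.toLowerCase().includes(needle)
  const contentMatch = contentText.toLowerCase().includes(needle)
  if (!titleMatch && !contentMatch) return undefined
  const matchSource: MatchSource =
    titleMatch && contentMatch ? 'both' : titleMatch ? 'title' : 'content'
  // Snippet for every content match, including 'both'.
  const snippet = contentMatch ? extractSnippet(contentText, term) : undefined
  return { matchSource, snippet }
}
```

Title-only matches still carry no snippet, since there is no message text to quote.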

"No results found" may flash during index build

If the index is still building (indexBuilding === true) and the user types a query that has no title match, the "No results found" empty state will show, even though results may appear once indexing completes. Consider showing a different message (e.g., "Still indexing...") when indexBuilding is true and there are no results yet.

5. Code Quality

Good: Clean separation of concerns -- index logic is isolated in search-index.ts, hooks only call lightweight invalidation methods.
Good: Promise.allSettled for resilient batched fetching.
Good: Debounced search (100ms) to prevent jank.
Good: XML/think-tag stripping from message content.
Minor: The extractTextFromContent function uses an any cast (m.content as any). Proper type narrowing or a type guard would be safer.
Minor: The comment in search() says "fuzzy matching" for titles but the implementation is strict substring (includes). The comment is misleading.

6. Testing

No tests are included. The search-index.ts module is pure logic with no DOM dependencies -- it is very testable. At minimum, unit tests for extractTextFromContent, extractSnippet, and ThreadSearchIndex.search() should be added. The invalidation and build lifecycle logic is also a good candidate for integration-style tests.

Recommendation: fix needed

The hasPendingWork bug is a functional issue that would cause newly created threads to be unsearchable until all messages are cleared or the app is restarted. The fallback-to-fuzzy inconsistency could also confuse users. These should be fixed before merge. Adding tests for the new search-index module is also strongly recommended.

Nice feature overall -- the architecture (singleton index, lazy build, incremental invalidation) is well thought out and the code is clean.

- hasPendingWork is now a method taking threads so newly created threads are detected and indexed without requiring a full invalidation
- build() loops until no pending work remains, so invalidations that arrive mid-build are not silently dropped (race condition fix)
- fallback-to-fzf now gates on indexReady rather than result count, preventing fuzzy false positives once the index is built
- snippet is shown for all content matches including matchSource 'both'
- 'Still indexing…' empty state prevents 'No results' flash during build
- removed dead rebuildIndex() no-op
- MAX_INDEXED_THREADS=2000 cap with documented scaling characteristics
- extractTextFromContent now uses proper ThreadContent[] type (no `any`)
- add 22 unit tests for search-index module

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

FedeCuci · 2026-06-07T09:53:59Z

Fixes applied

hasPendingWork not detecting new threads (bug)
Changed from a boolean getter to hasPendingWork(threads: Record<string, Thread>). It now compares the eligible thread set against indexed entries, so newly created threads are detected and trigger a build immediately.

Fallback-to-fuzzy inconsistency
The result branch in searchResults now gates on indexReady instead of fullTextResults.length > 0. Once the index is ready, its strict-substring empty set is trusted — no more silent fallback to fuzzy fzf matching that could return false positives.

Mid-build race condition
build() now runs a do…while (hasPendingWork) loop internally. Invalidations or new threads that arrive while a build is in flight are picked up on the next iteration instead of being silently dropped until the dialog is reopened.
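The loop can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the class name, the passes counter, the injected fetchThread callback, and the simplified stale bookkeeping are all hypothetical.

```typescript
type ThreadMap = Record<string, { id: string }>

class BuildLoopSketch {
  passes = 0 // exposed only so the behaviour can be observed
  private buildPromise: Promise<void> | null = null
  private latestThreads: ThreadMap = {}
  private indexed = new Set<string>()
  private stale = new Set<string>()
  private fetchThread: (id: string) => Promise<void>

  constructor(fetchThread: (id: string) => Promise<void>) {
    this.fetchThread = fetchThread
  }

  invalidate(id: string): void {
    this.stale.add(id)
  }

  hasPendingWork(threads: ThreadMap): boolean {
    if (this.stale.size > 0) return true
    return Object.keys(threads).some((id) => !this.indexed.has(id))
  }

  build(threads: ThreadMap): Promise<void> {
    // Always remember the newest record, even when a build is in flight.
    this.latestThreads = threads
    if (this.buildPromise) return this.buildPromise
    const promise = (async () => {
      try {
        do {
          await this.doBuild(this.latestThreads)
        } while (this.hasPendingWork(this.latestThreads))
      } finally {
        this.buildPromise = null
      }
    })()
    this.buildPromise = promise
    return promise
  }

  private async doBuild(threads: ThreadMap): Promise<void> {
    this.passes++
    const staleNow = new Set(this.stale)
    this.stale.clear()
    const todo = Object.keys(threads).filter(
      (id) => staleNow.has(id) || !this.indexed.has(id)
    )
    // Failed fetches are swallowed here so one bad thread cannot stall the pass.
    await Promise.all(
      todo.map((id) => this.fetchThread(id).catch(() => undefined))
    )
    for (const id of todo) this.indexed.add(id)
  }
}
```

An invalidation that lands while a fetch is awaiting is recorded in the stale set, so the loop condition is true after the pass and a second pass runs before the shared promise resolves.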

No upper bound on corpus size
Added MAX_INDEXED_THREADS = 2000. A shared eligibleThreads() helper (used by both doBuild and hasPendingWork) caps the corpus to the most-recently-updated threads. Both paths use the same eligible set so capped-out threads don't appear perpetually "new" (which would have caused an infinite build loop).
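A minimal sketch of a shared helper along these lines. The ThreadLike shape, the updated timestamp field, and the hasUnindexedThreads function are assumptions made for the example; only MAX_INDEXED_THREADS and eligibleThreads are named in the PR.

```typescript
const MAX_INDEXED_THREADS = 2000

// Hypothetical minimal shape; `updated` is assumed to be a numeric timestamp.
interface ThreadLike {
  id: string
  updated: number
}

// Most recently updated threads first, truncated to the cap.
function eligibleThreads(
  threads: Record<string, ThreadLike>,
  cap: number = MAX_INDEXED_THREADS
): ThreadLike[] {
  return Object.values(threads)
    .sort((a, b) => b.updated - a.updated)
    .slice(0, cap)
}

// Pending-work check built on the same eligible set, so threads beyond
// the cap never look "new" and cannot keep a build loop spinning.
function hasUnindexedThreads(
  threads: Record<string, ThreadLike>,
  indexedIds: Set<string>,
  cap: number = MAX_INDEXED_THREADS
): boolean {
  return eligibleThreads(threads, cap).some((t) => !indexedIds.has(t.id))
}
```

If the pending-work check scanned the full record instead, any thread past the cap would be reported as unindexed on every pass.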

rebuildIndex() no-op removed
Deleted the dead scaffolding method.

Snippet for all content matches
snippet is now produced whenever the term appears in the message body, including when matchSource === 'both'.

"No results" flash during build
Added a distinct "Still indexing…" empty state that shows when indexBuilding is true and there are no results yet. The "No results found" state only renders once building completes.
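The two guards can be summarized as a small pure function. The function name and the string labels are hypothetical; the conditions follow the ones described for the dialog.

```typescript
type EmptyState = 'still-indexing' | 'no-results' | null

function emptyStateFor(
  searchQuery: string,
  hasResults: boolean,
  indexBuilding: boolean
): EmptyState {
  if (hasResults) return null
  // Results may still arrive, so do not claim there are none yet.
  if (indexBuilding) return 'still-indexing'
  return searchQuery ? 'no-results' : null
}
```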

any cast removed
extractTextFromContent now takes ThreadContent[] | undefined and matches on ContentType.Text from @janhq/core. buildEntryForThread passes m.content directly without casting.
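A sketch of the typed version, using local stand-ins instead of importing from @janhq/core so the example is self-contained. The ContentPart union, the think-tag regular expression, and the whitespace normalization are assumptions; the real ThreadContent and ContentType definitions may differ.

```typescript
// Hypothetical stand-ins for ThreadContent / ContentType.Text.
interface TextPart {
  type: 'text'
  text?: { value: string }
}
interface ImagePart {
  type: 'image_url'
  image_url?: { url: string }
}
type ContentPart = TextPart | ImagePart

function extractTextFromContent(content: ContentPart[] | undefined): string {
  if (!content) return ''
  return content
    // Type guard narrows the union, so no `any` cast is needed.
    .filter((part): part is TextPart => part.type === 'text')
    .map((part) => part.text?.value ?? '')
    .join(' ')
    // Drop <think>...</think> blocks so they are not searchable.
    .replace(/<think>[\s\S]*?<\/think>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim()
}
```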

Misleading "fuzzy matching" comment
Updated the search() doc comment to say strict substring matching for both titles and content.

Tests added

Added src/lib/__tests__/search-index.test.ts with 22 unit tests covering:

extractTextFromContent (empty input, joining parts, stripping think blocks, non-text parts)
extractSnippet (absent term, case-insensitivity, ellipsis trimming)
ThreadSearchIndex.search() (title-only, content-only, both, strict substring, sort order, pre-build empty)
hasPendingWork lifecycle (before build, after build, new thread regression, stale/eviction, full invalidate)
Mid-build race condition (invalidation arriving while a fetch is in flight)

tokamak-pm

Follow-up review (new commits detected since last review)

The new commit 5e8d488 ("fix: address full-text search PR review feedback") addresses most of the concerns raised in the previous review. Here is a point-by-point assessment:

Previously Raised Issues — Resolution Status

1. hasPendingWork did not detect new threads (Bug) — FIXED

hasPendingWork now accepts the threads record as a parameter and compares the eligible set against entriesByThreadId.keys(). A new eligibleThreads() helper is shared between hasPendingWork() and doBuild() so they always agree on the target set. This was the most important bug and it is properly fixed. A dedicated regression test (detects newly created threads) confirms the fix.

2. Fallback-to-fuzzy showed inconsistent results — FIXED

The searchResults useMemo now gates on indexReady rather than fullTextResults.length > 0. Once the index is built, its strict substring results are used exclusively (even when empty). Fuzzy fzf fallback only applies while the index is still building. This eliminates the inconsistency.

3. Build race condition (mid-build invalidations dropped) — FIXED

build() now wraps doBuild() in a do { ... } while (hasPendingWork(...)) loop, so invalidations that arrive during an in-flight build are picked up in the next iteration. A unit test (picks up invalidations that arrive mid-build) verifies the fix with a paused fetch promise.

4. No upper bound on corpus size — FIXED

MAX_INDEXED_THREADS = 2000 cap added with documented scaling characteristics (~10 MB at defaults). Threads beyond the cap are excluded via eligibleThreads() which sorts by recency and slices. Good.
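The arithmetic behind a figure of that size can be checked with a few lines. The constant name MAX_CONTENT_CHARS_PER_THREAD is hypothetical, and the one-byte-per-character conversion is an assumption; titles and per-entry overhead are not counted.

```typescript
const MAX_INDEXED_THREADS = 2000
// Hypothetical name for the 5,000-character per-thread cap.
const MAX_CONTENT_CHARS_PER_THREAD = 5000

function maxCorpusChars(threads: number, charsPerThread: number): number {
  return threads * charsPerThread
}

// Upper bound on indexed message characters at the default caps.
const corpusChars = maxCorpusChars(
  MAX_INDEXED_THREADS,
  MAX_CONTENT_CHARS_PER_THREAD
)
// Assumes one byte per character; two-byte storage would double this.
const approxMegabytes = corpusChars / 1000000
```

At the defaults this gives 10,000,000 characters, or about 10 MB under the one-byte assumption.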

5. rebuildIndex() was a dead no-op — FIXED

Removed entirely.

6. Snippet only shown for content-only matches — FIXED

The snippet is now shown for all content matches including matchSource === 'both'. The code comment explains the rationale clearly.

7. "No results found" flash during index build — FIXED

A dedicated "Still indexing..." empty state with spinner is shown when indexBuilding && !hasResults. The searchQuery && !hasResults && !indexBuilding guard prevents the "No results" message from flashing during build. Localized strings added.

8. any type cast in extractTextFromContent — FIXED

The function now uses the proper ThreadContent[] type from @janhq/core.

9. No tests — FIXED

22 unit tests added in search-index.test.ts covering extractTextFromContent, extractSnippet, ThreadSearchIndex search behavior, hasPendingWork lifecycle, and the mid-build race condition. Good coverage of the core logic. The test-only __resetThreadSearchIndexForTests export is a clean approach for singleton testing.

New Observations on the Updated Code

1. Minor: Sorting does not rank "both" matches above "content" matches

The sort comparator only promotes matchSource === 'title' to the top, so threads matching both title and content are ranked the same as content-only matches. "Both" matches are arguably the most relevant results and should sort above content-only matches. This is minor and can be addressed in a follow-up.
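For the follow-up, a comparator along these lines might work. The RankedResult shape, the RANK table, and the recency tie-break are assumptions; in particular, placing "both" ahead of "title" is one possible ordering, not something the review prescribes.

```typescript
type MatchSource = 'title' | 'content' | 'both'

// Hypothetical result shape; `updated` is assumed to be a numeric timestamp.
interface RankedResult {
  threadId: string
  matchSource: MatchSource
  updated: number
}

// Lower rank sorts first.
const RANK: Record<MatchSource, number> = { both: 0, title: 1, content: 2 }

function compareResults(a: RankedResult, b: RankedResult): number {
  // Rank by match source first, then most recently updated first.
  return RANK[a.matchSource] - RANK[b.matchSource] || b.updated - a.updated
}
```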

2. Minor: build() deduplication still returns the existing promise for concurrent callers

If two callers invoke build() concurrently, the second caller gets the first build's promise. If the second caller passes a newer threads record, this.latestThreads is updated (good), and the while (hasPendingWork) loop should pick up the difference. This looks correct but is subtle — worth a comment.

3. Test quality is solid

The tests cover the important functional paths: title-only match, content-only match, "both" match with snippet, strict substring (no fuzzy false positives), sort order, empty-before-build, hasPendingWork lifecycle (new threads, stale, deleted, invalidate), and the mid-build race condition. The mock structure using vi.mock for the service hub is clean.

Summary

All 9 issues from the previous review have been addressed. The architecture is sound: singleton index with lazy build, incremental invalidation, batched fetching, and a build loop that handles mid-flight changes. The code is well-commented, properly typed, and now has good test coverage. The remaining observations are minor and non-blocking.

Recommendation: can merge

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tokamak-pm · 2026-06-09T02:08:09Z

Follow-up Review (new commit since Jun 7)

Reviewing commit: 145ae3d ("docs: document concurrent build behaviour in ThreadSearchIndex")

This is a documentation-only commit adding inline JSDoc comments to the ThreadSearchIndex class, explaining the concurrent build behaviour — specifically the latestThreads field, the do...while (hasPendingWork) loop, and the promise-sharing pattern for concurrent callers.

Assessment

The comments are accurate and match the actual implementation.
The note that "the second caller's changes are not lost" in the build() JSDoc is a useful clarification of a subtle correctness property.
No new code paths, no regressions introduced.
All issues raised in the original review remain resolved (confirmed in commit 5e8d488).

Previously Raised Issues — All Still Resolved

| Issue | Status |
| --- | --- |
| `hasPendingWork` did not detect new threads (bug) | Fixed |
| Fallback-to-fuzzy inconsistency | Fixed |
| Mid-build race condition | Fixed |
| No corpus size cap | Fixed (`MAX_INDEXED_THREADS = 2000`) |
| `rebuildIndex()` dead no-op | Removed |
| Snippet missing for `matchSource === 'both'` | Fixed |
| "No results" flash during build | Fixed with "Still indexing…" state |
| `any` type cast in `extractTextFromContent` | Fixed |
| No unit tests | 22 tests added |

Recommendation: can merge

github-project-automation Bot added this to Jan Jun 6, 2026

tokamak-pm Bot reviewed Jun 8, 2026

docs: document concurrent build behaviour in ThreadSearchIndex

145ae3d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

greymoth-jp mentioned this pull request Jun 28, 2026

fix: ignore IME composition Enter in rename and project dialogs #8359

Closed

FedeCuci wants to merge 3 commits into janhq:main from FedeCuci:feat/full-text-thread-search
