fix: cap BLAS/OpenMP threads to avoid concurrent-proc oversubscription (closes #316) #317

Open
Huntehhh wants to merge 1 commit into buildingjoshbetter:main from Huntehhh:fix/limit-blas-thread-oversubscription

@Huntehhh
Contributor

Summary

Caps BLAS/OpenMP threads to 1 per MCP proc via os.environ.setdefault in main() before _preload_models(). Prevents concurrent-proc thread oversubscription that produces multi-minute engine.add() hangs under Claude Code sub-agent fan-out.

Closes #316.

Changes

File | Change
truememory/mcp_server.py | +15 lines: setdefault for OMP_NUM_THREADS, MKL_NUM_THREADS, OPENBLAS_NUM_THREADS, NUMEXPR_NUM_THREADS before _preload_models()
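
As a rough illustration of the change described above, the cap could look something like the sketch below. The helper name `cap_native_threads` and the `env` parameter are hypothetical and exist only to make the example testable; the actual diff in truememory/mcp_server.py may be structured differently. The one detail that matters is ordering: the variables are set before the model preload, since native thread pools typically read them when the libraries initialize.

```python
import os
from collections.abc import MutableMapping
from typing import Optional

# The four variables named in this PR.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def cap_native_threads(
    limit: int = 1,
    env: Optional[MutableMapping[str, str]] = None,
) -> dict:
    """Hypothetical helper: apply a default thread cap without clobbering user values.

    Returns the effective value of each variable after the call.
    """
    target = os.environ if env is None else env
    effective = {}
    for name in THREAD_ENV_VARS:
        # setdefault writes only when the variable is absent.
        target.setdefault(name, str(limit))
        effective[name] = target[name]
    return effective


# In main(), this would be called before _preload_models(), i.e. before the
# embedding stack is imported and its thread pools are created.
```

Passing a plain dict as `env` keeps the sketch side-effect free in tests; in the server itself the default (`os.environ`) would be used.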

Why setdefault (not assignment)

User override preserved. A beefy single-proc workstation can still set OMP_NUM_THREADS=4 explicitly; the fix only kicks in when the user hasn't expressed an opinion.
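
The difference between setdefault and plain assignment can be seen in a small sketch. The function name `resolve_thread_cap` is made up for illustration, and a plain dict stands in for os.environ (which supports the same setdefault method).

```python
def resolve_thread_cap(env: dict, name: str = "OMP_NUM_THREADS", default: str = "1") -> str:
    """Hypothetical helper: return the effective value after a setdefault-style cap."""
    env.setdefault(name, default)  # no-op if the key already exists
    return env[name]


# User exported OMP_NUM_THREADS=4 before launching: the value is kept.
overridden = resolve_thread_cap({"OMP_NUM_THREADS": "4"})

# Nothing set: the cap of 1 is applied.
unset = resolve_thread_cap({})

# Variable set but empty: setdefault treats it as present and leaves it alone.
empty = resolve_thread_cap({"OMP_NUM_THREADS": ""})
```

One consequence worth noting: a variable that is set to an empty string still counts as set, so the cap does not apply in that case.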

Test plan

  • Single-subagent smoke test (10 MCP calls): all completed <1s, perfect cleanup
  • Two-subagent concurrent stress test (66 MCP calls: stores + searches + forgets under contention)
  • Worst-case wall time improved from 1,141,905ms → 5,694ms (~200× faster)
  • The 5,694ms post-fix worst case is the cold-load of the embedding model on first store per MCP proc (warm afterwards: <100ms)
  • Cleanup integrity verified — message count 56 → 56 after full store + delete cycle
  • Tested on Windows 11 Pro, Python 3.13, truememory 0.6.8
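
As a quick sanity check on the timings in the test plan, the two worst-case figures can be compared directly. The sketch below uses only the numbers reported in this PR; the variable names are illustrative.

```python
# Worst-case wall times reported in the test plan, in milliseconds.
before_ms = 1_141_905
after_ms = 5_694

# Ratio of pre-fix to post-fix worst case (roughly 200).
speedup = before_ms / after_ms

# The pre-fix worst case expressed in minutes (roughly 19).
before_minutes = before_ms / 60_000
```

The pre-fix figure works out to about 19 minutes, which lines up with the 19-minute hang mentioned in the commit message.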

No breaking changes

setdefault preserves existing env-var behavior. No public API affected.

Co-Authored-By: Claude Opus 4.7 [email protected]

Sets OMP_NUM_THREADS=1, MKL_NUM_THREADS=1, OPENBLAS_NUM_THREADS=1,
NUMEXPR_NUM_THREADS=1 via setdefault in main() before _preload_models().

Why: Claude Code spawns one MCP subprocess per session, and sub-agent
fan-out can push this to 10+ concurrent procs on a single host. With
default thread counts each proc tries to use every CPU core for
PyTorch/sentence-transformers inference. N procs * N cores worth of
threads competing for N cores produces a context-switching collapse —
empirical symptom is multi-minute m.add() hangs (one confirmed
19-minute hang in ~/.truememory/logs/mcp-debug.log under sub-agent
load).

setdefault preserves user override: a beefy single-proc workstation
can still set OMP_NUM_THREADS=4 etc. explicitly.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
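
The oversubscription arithmetic in the commit message can be made concrete with a short sketch. The 16-core host is an assumed figure chosen for illustration, not a measurement from this PR; the 10 concurrent procs come from the "10+" estimate above. The model also assumes each uncapped proc starts one compute thread per core, as the commit message describes.

```python
def compute_threads(procs: int, threads_per_proc: int) -> int:
    """Total compute threads across all MCP procs on one host."""
    return procs * threads_per_proc


def oversubscription(procs: int, threads_per_proc: int, cores: int) -> float:
    """Threads competing per core; values above 1.0 mean contention."""
    return compute_threads(procs, threads_per_proc) / cores


CORES = 16  # assumed host size, for illustration only
PROCS = 10  # lower bound of the "10+ concurrent procs" case

# Default behaviour: each proc sizes its pool to the core count.
uncapped = oversubscription(PROCS, threads_per_proc=CORES, cores=CORES)

# With the cap: one compute thread per proc.
capped = oversubscription(PROCS, threads_per_proc=1, cores=CORES)
```

Under these assumptions the uncapped case puts 160 threads on 16 cores (10 per core), while the capped case runs 10 threads on 16 cores and stays below one thread per core.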
@Huntehhh Huntehhh force-pushed the fix/limit-blas-thread-oversubscription branch from f5367fb to 5b8fd11 Compare May 14, 2026 01:00
Development

Successfully merging this pull request may close these issues.

BLAS/OpenMP thread oversubscription causes multi-minute m.add() hangs under concurrent MCP procs