Custom multitenant #273
Open
huanshanxiaoyao wants to merge 20 commits into
Seeding bulk-imports historical turns, so the live 600s L1 idle timeout and pipeline warmup add needless latency. Cap l1IdleTimeoutSeconds at 5s, disable enableWarmup, and pass an explicit 30s destroy timeout so the seed runtime drains and exits promptly. Signed-off-by: Jack <278171810@qq.com>
DeepSeek V4 hybrid models (deepseek-v4-flash) ignore the V3 top-level
enable_thinking flag and only suppress reasoning for the object form
thinking: { type: "disabled" }. Add a dedicated "deepseek-v4" strategy
mapping to that body transform, register it in the valid-strategy list,
and document it. Verified against the live endpoint 2026-06.
Signed-off-by: Jack <278171810@qq.com>
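A minimal sketch of the body transform described above, assuming the strategy is a string tag and the request body is a plain object. Only the "deepseek-v4" name and the two field shapes come from the commit message; `applyNoThinking` and the "deepseek-v3" tag are hypothetical names used for illustration.

```typescript
// Illustrative only: "deepseek-v3" and applyNoThinking are hypothetical names.
type NoThinkingStrategy = "deepseek-v3" | "deepseek-v4";
type RequestBody = Record<string, unknown>;

function applyNoThinking(body: RequestBody, strategy: NoThinkingStrategy): RequestBody {
  if (strategy === "deepseek-v4") {
    // V4 hybrid models only honor the object form.
    return { ...body, thinking: { type: "disabled" } };
  }
  // V3-style top-level flag.
  return { ...body, enable_thinking: false };
}
```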
A weak / no-thinking model copied the English template headings verbatim even when the scene content was Chinese, producing English L3 personas over Chinese L0/L1/L2. Tighten the output-language contract and, when the changed scenes are Chinese-dominant, inject the exact mandatory Chinese heading set data-adjacent in the user prompt so the model cannot fall back to the English template. Signed-off-by: Jack <278171810@qq.com>
The multi-tenant registry runs one TdaiCore per account, each with its own L1/L2/L3 timers and SerialQueue, so background extraction would fan out LLM calls by N accounts. Add a shared ConcurrencyLimiter (async semaphore) that the registry hands to every core via TdaiCoreOptions.extractionLimiter; the pipeline manager/factory acquire it around each extraction run so total concurrent background work stays bounded regardless of tenant count. Signed-off-by: Jack <278171810@qq.com>
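The shared limiter can be pictured as a small async semaphore. The sketch below is an assumption about its shape, not the PR's ConcurrencyLimiter: the `run` method and the `maxConcurrent` parameter are hypothetical names.

```typescript
// Hypothetical sketch of an async semaphore; the real ConcurrencyLimiter may differ.
class ConcurrencyLimiter {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number) {
    if (!Number.isInteger(maxConcurrent) || maxConcurrent < 1) {
      throw new RangeError("maxConcurrent must be a positive integer");
    }
    this.maxConcurrent = maxConcurrent;
  }

  /** Runs `task` once a slot is free; the slot is returned even if `task` throws. */
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }

  private acquire(): Promise<void> {
    if (this.active < this.maxConcurrent) {
      this.active++;
      return Promise.resolve();
    }
    // Queue in FIFO order; a finishing task hands its slot over directly.
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // slot transferred, so `active` stays unchanged
    } else {
      this.active--;
    }
  }
}
```

Because every per-account core would share one instance, the bound applies to the whole process rather than to each tenant.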
L3 persona is an unlocked read-modify-write; a crash or concurrent write mid-flush leaves a truncated persona.md. Add atomicWriteFile (temp + fsync + rename) and route the persona generator, profile sync, scene extractor, and scene index through it so readers never observe a partial file. Signed-off-by: Jack <278171810@qq.com>
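The temp + fsync + rename pattern can be sketched with Node's fs/promises API as follows. This is an illustration under assumptions: the PR's atomicWriteFile signature and temp-file naming are not shown in the commit message, so both are hypothetical here.

```typescript
import { randomBytes } from "node:crypto";
import { open, rename, unlink } from "node:fs/promises";
import { basename, dirname, join } from "node:path";

// Sketch only: signature and temp-file naming are assumptions.
async function atomicWriteFile(targetPath: string, data: string): Promise<void> {
  // The temp file lives in the same directory so rename() stays on one filesystem.
  const suffix = randomBytes(6).toString("hex");
  const tmpPath = join(dirname(targetPath), `.${basename(targetPath)}.${suffix}.tmp`);
  try {
    const handle = await open(tmpPath, "w");
    try {
      await handle.writeFile(data, "utf8");
      await handle.sync(); // flush file contents to disk before publishing
    } finally {
      await handle.close();
    }
    // rename() replaces the target in one step, so readers see old or new, never partial.
    await rename(tmpPath, targetPath);
  } catch (err) {
    await unlink(tmpPath).catch(() => undefined); // best-effort cleanup
    throw err;
  }
}
```

This guards against torn reads only; two writers that each read, modify, and write can still lose one another's update.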
One gateway process must safely serve multiple end-user accounts with hard
isolation. Add CoreRegistry, holding a lazy, LRU Map<session_key, TdaiCore>
with one dataDir per account (baseDir/{safeAccountDir(key)}), so L1/L2/L3
recall, search, and persona are physically isolated per tenant. Thread
session_key through the search/recall request types, return prepend_context
from /recall, and add account listing plus a session-scoped namespace wipe
for hard-delete. Gated by TDAI_MULTI_TENANT; single-core behavior is
unchanged when off.
Signed-off-by: Jack <278171810@qq.com>
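A rough sketch of the lazy, LRU-bounded map with one data directory per account. The directory encoding below (sanitized prefix plus hash suffix) is an assumption, since the commit message does not say how safeAccountDir encodes keys; `LazyLruRegistry` and `TenantCore` are hypothetical stand-ins for CoreRegistry and TdaiCore.

```typescript
import { createHash } from "node:crypto";
import { join } from "node:path";

// Assumed encoding: the PR's safeAccountDir may differ.
function safeAccountDir(sessionKey: string): string {
  const slug = sessionKey.replace(/[^A-Za-z0-9_-]/g, "_").slice(0, 32);
  const digest = createHash("sha256").update(sessionKey).digest("hex").slice(0, 12);
  return `${slug}-${digest}`; // hash suffix keeps distinct keys from colliding
}

interface TenantCore {
  readonly dataDir: string;
  destroy(): void;
}

class LazyLruRegistry {
  // A Map iterates in insertion order, so the first key is the least recently used.
  private readonly cores = new Map<string, TenantCore>();
  private readonly baseDir: string;
  private readonly maxResident: number;
  private readonly createCore: (dataDir: string) => TenantCore;

  constructor(baseDir: string, maxResident: number, createCore: (dataDir: string) => TenantCore) {
    this.baseDir = baseDir;
    this.maxResident = Math.max(1, maxResident);
    this.createCore = createCore;
  }

  getCore(sessionKey: string): TenantCore {
    const existing = this.cores.get(sessionKey);
    if (existing) {
      // Re-insert to mark the core as most recently used.
      this.cores.delete(sessionKey);
      this.cores.set(sessionKey, existing);
      return existing;
    }
    // Lazy creation: a core is built only on an account's first request.
    const core = this.createCore(join(this.baseDir, safeAccountDir(sessionKey)));
    this.cores.set(sessionKey, core);
    this.evictLruIfNeeded();
    return core;
  }

  private evictLruIfNeeded(): void {
    while (this.cores.size > this.maxResident) {
      const oldest = this.cores.keys().next();
      if (oldest.done) break;
      this.cores.get(oldest.value)?.destroy();
      this.cores.delete(oldest.value);
    }
  }
}
```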
Add a local dev-console (scripts/dev-console) that proxies the gateway and renders the per-account memory pyramid for manual recall/search testing, plus scripts/import-psydt.ts to bulk-seed PsyDTCorpus accounts into the multi-tenant store. Wire both up as npm scripts (dev-console, import-psydt). Signed-off-by: Jack <278171810@qq.com>
Add the multi-tenant design doc and upstream issue under docs/, project
guidance in CLAUDE.md, and a tdai-gateway.yaml sample that references
${DASHSCOPE_API_KEY} for embedding config without embedding secrets.
Signed-off-by: Jack <278171810@qq.com>
Structural multi-tenant isolation is physical: each account gets its own dataDir + SQLite file. The TCVDB backend ignores dataDir and routes every per-account core to one shared database/collection set, while the structural route does not push session_key into L1/L0 search — so multiTenant=true + storeBackend=tcvdb silently returns other accounts' memories from /recall and /search/memories. loadGatewayConfig now throws on that exact combination so the leak surfaces as a startup failure instead of cross-tenant data exposure. Single-tenant + TCVDB (one shared core, one database) and multi-tenant + SQLite are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>
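The startup guard amounts to a single check on the loaded config. A minimal sketch follows, assuming the config exposes `multiTenant` and `storeBackend` as flat fields; the "sqlite" literal, the type names, and `assertTenantIsolation` are hypothetical.

```typescript
// Hypothetical shapes: only multiTenant, storeBackend, and "tcvdb" appear in the commit message.
type StoreBackend = "sqlite" | "tcvdb";

interface GatewayConfigSketch {
  multiTenant: boolean;
  storeBackend: StoreBackend;
}

function assertTenantIsolation(config: GatewayConfigSketch): void {
  if (config.multiTenant && config.storeBackend === "tcvdb") {
    // Fail fast at startup instead of leaking memories across accounts at request time.
    throw new Error(
      "multiTenant=true is not supported with storeBackend=tcvdb: " +
        "the TCVDB backend shares one database across all accounts",
    );
  }
}
```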
/seed writes to a shared baseDir/seed-<ts> snapshot dir, not the per-account
store under baseDir/{account} that recall/search read. In multi-tenant mode a
"successful" seed (200, l0_recorded > 0) is therefore invisible to every core —
a silent no-op that misleads backfill callers (scripts/import-psydt.ts already
documents this and works around it).
Return 400 in multi-tenant mode, pointing operators at the per-account seeding
path (executeSeed into registry.resolveDataDir(session_key)). Single-tenant
/seed is unchanged. Covered by the multi-tenant HTTP e2e test.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jack <278171810@qq.com>
…live request
The multi-tenant LRU evictor (maxResidentCores) could pick a core that another in-flight request was still using: getCore() returned the core, then a different account's request triggered evictLruIfNeeded → core.destroy(), closing the SQLite handle underneath the first handler. Only triggered when maxResidentCores was set below peak concurrent accounts, but then it surfaced as a capture/recall/search hitting a closed store.
Add a lease/refcount: registry.acquire() pins a core (pins++) and returns a release(); request handlers hold the lease for the whole call and release it in a finally. Eviction skips any core with pins > 0 (allowing a transient over-limit rather than killing a live request), and manual evict()/wipe() now defer teardown until in-flight leases drain. release() is idempotent. getCore() stays pin-free for health/eager-warmup/tests. Health reports resident.pinned.
Covered by three new registry tests (no eviction under lease, wipe defers to drain, idempotent double-release); existing LRU + e2e suites still pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jack <278171810@qq.com>
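The lease behaviour described in this commit can be sketched as below. The names acquire, release, evict, pins, and maxResidentCores come from the commit message; the class name, the `doomed` flag, and the synchronous `destroy()` are simplifying assumptions.

```typescript
// Sketch under assumptions: the real registry is async and also tracks LRU touch order.
interface LeasedCore {
  destroy(): void;
}

interface Entry {
  core: LeasedCore;
  pins: number;
  doomed: boolean; // evicted while pinned; destroy once the last lease is released
}

class LeasingRegistry {
  private readonly entries = new Map<string, Entry>();
  private readonly maxResidentCores: number;
  private readonly createCore: (key: string) => LeasedCore;

  constructor(maxResidentCores: number, createCore: (key: string) => LeasedCore) {
    this.maxResidentCores = maxResidentCores;
    this.createCore = createCore;
  }

  acquire(key: string): { core: LeasedCore; release: () => void } {
    const entry: Entry =
      this.entries.get(key) ?? { core: this.createCore(key), pins: 0, doomed: false };
    this.entries.set(key, entry);
    entry.pins++; // pin before evicting so this core is never its own victim
    this.evictLruIfNeeded();
    let released = false;
    const release = (): void => {
      if (released) return; // idempotent: a double release must not unpin twice
      released = true;
      entry.pins--;
      if (entry.doomed && entry.pins === 0) entry.core.destroy();
    };
    return { core: entry.core, release };
  }

  evict(key: string): void {
    const entry = this.entries.get(key);
    if (!entry) return;
    this.entries.delete(key);
    if (entry.pins === 0) entry.core.destroy();
    else entry.doomed = true; // defer teardown until in-flight leases drain
  }

  private evictLruIfNeeded(): void {
    for (const [key, entry] of this.entries) {
      if (this.entries.size <= this.maxResidentCores) break;
      if (entry.pins > 0) continue; // tolerate a transient over-limit
      this.entries.delete(key);
      entry.core.destroy();
    }
  }
}
```

A handler would call `acquire()` at the top of the request and invoke `release()` in a `finally` block, which is the usage the commit message describes.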
baseOverrides only set server/data, so loadGatewayConfig() still loaded the
repo-root tdai-gateway.yaml from CWD — enabling DashScope embedding. On a
machine with DASHSCOPE_API_KEY set the suite would silently exercise real
embedding/network instead of the keyword/FTS path it asserts on (the comment
already claimed "provider none" but nothing enforced it).
Pass memory: parseConfig({ extraction:{enabled:false}, embedding:{provider:
"none"}, recall:{strategy:"keyword"} }) so the tests are self-contained.
Also drops e2e wall-time (no embedding init attempts).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jack <278171810@qq.com>
Multi-tenant /health hardcoded stores.{vectorStore,embeddingService}=false
(cores are lazy/per-account, so there's no single store to probe), leaving
operators no way to tell whether vector recall is even configured — the exact
blind spot that made "is embedding wired?" un-answerable from the API.
Add an `embedding` block to /health (both modes) derived from config, not a
network probe (health must stay a cheap liveness check): `configured` is true
only when embedding is enabled, the provider isn't the "none" sentinel, and the
config has no error, plus provider/model/dimensions and the recall strategy.
The live "did vectors actually fire" signal remains the `strategy` field on
/search/memories. Also surfaces resident.pinned (active leases).
e2e asserts the field under the hermetic provider=none/keyword boot; integration
guide updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jack <278171810@qq.com>
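A sketch of how the `embedding` block could be derived purely from config, as the commit describes. The field names on the config side (`enabled`, `configError`) and the function name are assumptions; only `configured`, provider/model/dimensions, the "none" sentinel, and the recall strategy come from the commit message.

```typescript
// Hypothetical config shape; the gateway's real config types are not shown in this PR text.
interface EmbeddingConfigSketch {
  enabled: boolean;
  provider: string;
  model?: string;
  dimensions?: number;
  configError?: string;
}

interface EmbeddingHealth {
  configured: boolean;
  provider: string;
  model?: string;
  dimensions?: number;
  strategy: string;
}

function describeEmbedding(embedding: EmbeddingConfigSketch, recallStrategy: string): EmbeddingHealth {
  return {
    // Derived from config only: no network probe, so /health stays a cheap liveness check.
    configured: embedding.enabled && embedding.provider !== "none" && !embedding.configError,
    provider: embedding.provider,
    model: embedding.model,
    dimensions: embedding.dimensions,
    strategy: recallStrategy,
  };
}
```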
maxResidentCores is a count bound: a long tail of accounts that each go quiet still linger until an LRU push. Add coreIdleTtlMs (env TDAI_CORE_IDLE_TTL_MS / yaml data.coreIdleTtlMs) — a periodic sweep evicts any unpinned core idle longer than the TTL, reclaiming memory during quiet periods. Complements the count bound with a time bound. Default 0 = disabled (no behaviour change); multi-tenant only. The sweep timer is unref()'d so it never holds the process open, and is cleared on destroyAll. Pinned (in-flight) cores are always spared, consistent with the lease refcount. Covered by four registry tests (disabled no-op, idle evicted / fresh spared, pinned spared then reclaimed, background timer reclaims). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>
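The time bound can be illustrated with a sweep function plus an unref()'d interval. This is a sketch under assumptions: `ResidentCore`, `lastUsedAt`, `sweepIdleCores`, and `startIdleSweep` are hypothetical names, and the real registry presumably keeps this state internally.

```typescript
import { clearInterval, setInterval } from "node:timers";

// Hypothetical bookkeeping record for one resident core.
interface ResidentCore {
  lastUsedAt: number; // epoch milliseconds of the last request
  pins: number; // active leases
  destroy(): void;
}

function sweepIdleCores(cores: Map<string, ResidentCore>, idleTtlMs: number, now: number): string[] {
  if (idleTtlMs <= 0) return []; // 0 = disabled
  const evicted: string[] = [];
  for (const [key, core] of cores) {
    if (core.pins > 0) continue; // in-flight cores are always spared
    if (now - core.lastUsedAt <= idleTtlMs) continue; // still fresh
    cores.delete(key);
    core.destroy();
    evicted.push(key);
  }
  return evicted;
}

function startIdleSweep(cores: Map<string, ResidentCore>, idleTtlMs: number, intervalMs: number): () => void {
  if (idleTtlMs <= 0) return () => undefined;
  const timer = setInterval(() => sweepIdleCores(cores, idleTtlMs, Date.now()), intervalMs);
  timer.unref(); // the sweep alone never keeps the process alive
  return () => clearInterval(timer); // call this from the registry's teardown
}
```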
Two known-issue annotations, no behaviour change:
- persona.md is written by multiple stages on different SerialQueues (L3 persona-generator, L2 scene-nav, tcvdb profile-sync). atomicWriteFile guards torn reads but NOT lost updates, and the L3 path mutates the file via a ~180s LLM run, so a correct fix needs LLM-to-staging + a per-account lock, which is too big for a one-line guard. Annotate both local write sites so the next change knows the shape of the proper fix; tcvdb+multiTenant (the extra writer) is already rejected at config load.
- conversation-search applies session_key as a POST-filter over topK. Clarify that it's a structural no-op in multi-tenant (each core's store is single account) but REQUIRED in single-tenant shared-store mode, where topK dilution can drop results; the real fix is session pushdown into the L0 query.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jack <278171810@qq.com>
…f scope Spell out in the integration guide that embedding (DashScope text-embedding-v3) and storage (local SQLite + sqlite-vec + FTS5) are independent layers and the only supported multi-tenant stack. Note that the tcvdb backend is a separate design with its own server-side embedding and shared cloud collections, that multiTenant + tcvdb is rejected at startup by design, and therefore the "tcvdb breaks isolation" issue does not apply to a DashScope+SQLite deployment. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>
persona.md is written by two background stages on separate SerialQueues — L3 PersonaGenerator.generateLocalPersona (read → ~180s LLM tool-write → final write) and L2 SceneExtractor.updateSceneNavigation (read → strip → append → write). Both are read-modify-write; atomicWriteFile prevents a torn read but not a lost update, so an interleaving L2 nav write can clobber a freshly regenerated L3 body (or vice-versa). This race is live in the supported DashScope+SQLite multi-tenant stack (the profile-sync writer is tcvdb-only, and tcvdb+multiTenant is rejected at config load). Add KeyedAsyncMutex (per-key FIFO async mutex, error-isolated, in-process) and route both writers' whole RMW through the shared fileWriteMutex keyed by persona.md's absolute path. Method B: the L3 critical section spans the full LLM run because the LLM writes the file mid-run. Per-account isolation is automatic (distinct paths never contend); recall reads take no lock, so the read hot path is unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>
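A per-key FIFO async mutex of the kind described here can be built by chaining promises per key. The sketch below is an assumption about its shape, not the PR's KeyedAsyncMutex: the `runExclusive` method name and the internal `tails` map are hypothetical.

```typescript
// Sketch only: in-process, per-key FIFO, and a failing task does not block later ones.
class KeyedAsyncMutex {
  // For each key, a promise that settles when the latest queued task has finished.
  private readonly tails = new Map<string, Promise<void>>();

  async runExclusive<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    let unlock: () => void = () => undefined;
    const gate = new Promise<void>((resolve) => {
      unlock = resolve;
    });
    // Later callers for this key wait on our gate; the chain itself never rejects.
    const tail = previous.then(() => gate);
    this.tails.set(key, tail);
    await previous;
    try {
      return await task();
    } finally {
      unlock(); // always release, even when the task throws
      if (this.tails.get(key) === tail) this.tails.delete(key); // drop idle keys
    }
  }
}
```

Keying by the absolute path of persona.md means two accounts never contend, while the L2 and L3 writers of the same account run their whole read-modify-write one after the other.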
scripts/smoke-recall.mjs validates a running gateway end-to-end against its real embedding provider (DashScope): GET /health (gates fast on embedding.configured=false), POST /capture a distinctive fact, poll POST /search/memories with a no-shared-keyword paraphrase until an L1 atom is recalled, and assert strategy is hybrid/embedding (vectors fired) rather than fts/none. Cleans up via /namespace/wipe in multi-tenant mode. Dependency-free Node ESM (global fetch) — no build/tsx needed. Exit 0 PASS / 1 FAIL / 2 setup-error, with actionable diagnostics that distinguish "L1 never formed" from "formed but embedding not contributing". Probe text overridable via SMOKE_* env for non-Chinese or domain-specific deployments. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>
Collaborator
We appreciate your contribution. Our team will conduct an internal evaluation of this PR and get back to you shortly.
Point integration teams at scripts/smoke-recall.mjs (commit 6b0a573) as the first post-deploy check: it automates and asserts the §6 vector-recall test and is wired into the §10 quickstart checklist. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>
Author
Thanks. I hope this can be adopted, or that we can jointly deliver a version with working multi-tenant support.
Lock the multi-tenant fork's deploy state for AI4ALL production:
- config.ts: default `multiTenant` to true so the fork is multi-tenant by default even if TDAI_MULTI_TENANT is unset (prod still sets it explicitly).
- dev-console: surface recall `strategy` in search results (hybrid/embedding/fts/none) to distinguish vector recall from keyword at a glance.
- .gitignore: ignore throwaway `scripts/tmp_*.py` analysis scripts.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Description
Adds multi-tenant support. Previously the system could only process memories for a single uid; this PR upgrades it to store and process memories for multiple uids while keeping them isolated.
Related Issue
Change Type
Self-test Checklist
Additional Notes