Custom multitenant by huanshanxiaoyao · Pull Request #273 · TencentCloud/TencentDB-Agent-Memory

huanshanxiaoyao · 2026-06-27T12:58:13Z

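Description

Adds multi-tenant support. Previously the system could process memory for only a single uid; this PR extends it to store and process memory for multiple uids while keeping each uid isolated.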

Related Issue

Change Type

Bug fix
New feature
Documentation update
Code optimization

Self-test Checklist

Verified locally
No existing features affected

Additional Notes

Seeding bulk-imports historical turns, so the live 600s L1 idle timeout and pipeline warmup add needless latency. Cap l1IdleTimeoutSeconds at 5s, disable enableWarmup, and pass an explicit 30s destroy timeout so the seed runtime drains and exits promptly. Signed-off-by: Jack <278171810@qq.com>

DeepSeek V4 hybrid models (deepseek-v4-flash) ignore the V3 top-level enable_thinking flag and only suppress reasoning for the object form thinking: { type: "disabled" }. Add a dedicated "deepseek-v4" strategy mapping to that body transform, register it in the valid-strategy list, and document it. Verified against the live endpoint 2026-06. Signed-off-by: Jack <278171810@qq.com>
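A minimal sketch of the body transform this commit describes. The function name, the strategy type, and the `"deepseek-v3"` label are assumptions for illustration; only the two request-body shapes (`enable_thinking` at the top level, and `thinking: { type: "disabled" }`) and the `"deepseek-v4"` strategy name come from the commit message.

```typescript
// Hypothetical strategy type; "deepseek-v3" is an assumed label for the older flag.
type NoThinkingStrategy = "deepseek-v3" | "deepseek-v4";

type ChatRequestBody = Record<string, unknown>;

// Returns a copy of the request body with reasoning suppressed for the given strategy.
function disableThinking(body: ChatRequestBody, strategy: NoThinkingStrategy): ChatRequestBody {
  switch (strategy) {
    case "deepseek-v3":
      // Top-level boolean flag, which the commit says V4 hybrid models ignore.
      return { ...body, enable_thinking: false };
    case "deepseek-v4":
      // Object form that the commit says V4 hybrid models honor.
      return { ...body, thinking: { type: "disabled" } };
  }
}
```

The transform copies the body rather than mutating it, so one base request can be reused across strategies.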

A weak / no-thinking model copied the English template headings verbatim even when the scene content was Chinese, producing English L3 personas over Chinese L0/L1/L2. Tighten the output-language contract and, when the changed scenes are Chinese-dominant, inject the exact mandatory Chinese heading set data-adjacent in the user prompt so the model cannot fall back to the English template. Signed-off-by: Jack <278171810@qq.com>

The multi-tenant registry runs one TdaiCore per account, each with its own L1/L2/L3 timers and SerialQueue, so background extraction would fan out LLM calls by N accounts. Add a shared ConcurrencyLimiter (async semaphore) that the registry hands to every core via TdaiCoreOptions.extractionLimiter; the pipeline manager/factory acquire it around each extraction run so total concurrent background work stays bounded regardless of tenant count. Signed-off-by: Jack <278171810@qq.com>
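The shared limiter can be sketched as a small async semaphore. This is an illustration under assumptions, not the PR's code: the `run` method and the internal field names are hypothetical, and only the class name and the idea of acquiring around each extraction run come from the commit message.

```typescript
// Sketch of an async semaphore; API shape is assumed, not taken from the PR.
class ConcurrencyLimiter {
  private readonly maxConcurrent: number;
  private active = 0;
  private readonly waiters: Array<() => void> = [];

  constructor(maxConcurrent: number) {
    if (!Number.isInteger(maxConcurrent) || maxConcurrent < 1) {
      throw new RangeError("maxConcurrent must be a positive integer");
    }
    this.maxConcurrent = maxConcurrent;
  }

  // Runs the task once a slot is free and always frees the slot afterwards.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }

  private acquire(): Promise<void> {
    if (this.active < this.maxConcurrent) {
      this.active++;
      return Promise.resolve();
    }
    // No free slot: queue in FIFO order until a running task releases.
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // Hand the slot straight to the next waiter; active stays unchanged.
    } else {
      this.active--;
    }
  }
}
```

Because every per-account core would receive the same instance, the bound applies to the whole process rather than to each tenant.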

L3 persona is an unlocked read-modify-write; a crash or concurrent write mid-flush leaves a truncated persona.md. Add atomicWriteFile (temp + fsync + rename) and route the persona generator, profile sync, scene extractor, and scene index through it so readers never observe a partial file. Signed-off-by: Jack <278171810@qq.com>
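The temp + fsync + rename pattern can be sketched as follows for Node.js. The temp-file naming scheme and the cleanup behavior are assumptions; the PR's actual `atomicWriteFile` may differ.

```typescript
import { open, rename, unlink } from "node:fs/promises";
import { basename, dirname, join } from "node:path";
import { randomBytes } from "node:crypto";

// Sketch: write to a temp file in the same directory, flush it, then rename over the target.
async function atomicWriteFile(targetPath: string, data: string): Promise<void> {
  // Same directory as the target so the rename stays on one filesystem.
  const suffix = randomBytes(4).toString("hex");
  const tmpPath = join(dirname(targetPath), `.${basename(targetPath)}.${suffix}.tmp`);
  try {
    const handle = await open(tmpPath, "w");
    try {
      await handle.writeFile(data, "utf8");
      await handle.sync(); // Flush file contents to disk before the rename.
    } finally {
      await handle.close();
    }
    // Readers see either the old file or the complete new one, never a partial write.
    await rename(tmpPath, targetPath);
  } catch (err) {
    await unlink(tmpPath).catch(() => undefined); // Best-effort cleanup of the temp file.
    throw err;
  }
}
```

This sketch does not fsync the parent directory, so the rename itself may not be durable across a power loss on every filesystem. It guards against torn reads only, not against lost updates between concurrent read-modify-write cycles.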

One gateway process must safely serve multiple end-user accounts with hard isolation. Add CoreRegistry, holding a lazy, LRU Map<session_key, TdaiCore> with one dataDir per account (baseDir/{safeAccountDir(key)}), so L1/L2/L3 recall, search, and persona are physically isolated per tenant. Thread session_key through the search/recall request types, return prepend_context from /recall, and add account listing plus a session-scoped namespace wipe for hard-delete. Gated by TDAI_MULTI_TENANT; single-core behavior is unchanged when off. Signed-off-by: Jack <278171810@qq.com>
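A rough sketch of the lazy, LRU-ordered registry with one data directory per account. Everything here is illustrative: the body of `safeAccountDir` is not shown in the PR, so the sanitize-plus-hash scheme below is an assumption, as are the class name `LazyLruRegistry` and its constructor parameters.

```typescript
import { createHash } from "node:crypto";
import { join } from "node:path";

// Hypothetical stand-in for safeAccountDir: readable prefix plus a hash to avoid collisions.
function safeAccountDir(sessionKey: string): string {
  const readable = sessionKey.replace(/[^A-Za-z0-9_-]/g, "_").slice(0, 48);
  const digest = createHash("sha256").update(sessionKey).digest("hex").slice(0, 12);
  return `${readable}-${digest}`;
}

interface Destroyable {
  destroy(): void;
}

class LazyLruRegistry<Core extends Destroyable> {
  // A Map iterates in insertion order, so the first entry is the least recently used.
  private readonly cores = new Map<string, Core>();
  private readonly baseDir: string;
  private readonly maxResidentCores: number;
  private readonly createCore: (dataDir: string) => Core;

  constructor(baseDir: string, maxResidentCores: number, createCore: (dataDir: string) => Core) {
    this.baseDir = baseDir;
    this.maxResidentCores = Math.max(1, maxResidentCores);
    this.createCore = createCore;
  }

  resolveDataDir(sessionKey: string): string {
    return join(this.baseDir, safeAccountDir(sessionKey));
  }

  getCore(sessionKey: string): Core {
    const existing = this.cores.get(sessionKey);
    if (existing) {
      // Re-insert to mark this account as most recently used.
      this.cores.delete(sessionKey);
      this.cores.set(sessionKey, existing);
      return existing;
    }
    const core = this.createCore(this.resolveDataDir(sessionKey));
    this.cores.set(sessionKey, core);
    while (this.cores.size > this.maxResidentCores) {
      const oldest = this.cores.entries().next();
      if (oldest.done) break;
      const [key, victim] = oldest.value;
      this.cores.delete(key);
      victim.destroy();
    }
    return core;
  }

  residentKeys(): string[] {
    return Array.from(this.cores.keys());
  }
}
```

Hashing the full key means two keys that sanitize to the same readable prefix, such as `a/b` and `a_b`, still land in different directories.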

Add a local dev-console (scripts/dev-console) that proxies the gateway and renders the per-account memory pyramid for manual recall/search testing, plus scripts/import-psydt.ts to bulk-seed PsyDTCorpus accounts into the multi-tenant store. Wire both up as npm scripts (dev-console, import-psydt). Signed-off-by: Jack <278171810@qq.com>

Add the multi-tenant design doc and upstream issue under docs/, project guidance in CLAUDE.md, and a tdai-gateway.yaml sample that references ${DASHSCOPE_API_KEY} for embedding config without embedding secrets. Signed-off-by: Jack <278171810@qq.com>

Structural multi-tenant isolation is physical: each account gets its own dataDir + SQLite file. The TCVDB backend ignores dataDir and routes every per-account core to one shared database/collection set, while the structural route does not push session_key into L1/L0 search — so multiTenant=true + storeBackend=tcvdb silently returns other accounts' memories from /recall and /search/memories. loadGatewayConfig now throws on that exact combination so the leak surfaces as a startup failure instead of cross-tenant data exposure. Single-tenant + TCVDB (one shared core, one database) and multi-tenant + SQLite are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>
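The startup guard can be sketched as a single check on the loaded config. The field names `multiTenant` and `storeBackend` follow the commit message; the interface, the function name, and the error text are assumptions.

```typescript
// Assumed config shape; only the two field names come from the commit message.
interface GatewayConfigLike {
  multiTenant: boolean;
  storeBackend: "sqlite" | "tcvdb";
}

// Fails fast on the one combination that would share a store across tenants.
function assertIsolationSupported(config: GatewayConfigLike): void {
  if (config.multiTenant && config.storeBackend === "tcvdb") {
    throw new Error(
      "multiTenant=true is not supported with storeBackend=tcvdb: " +
        "per-account cores would share one database and could return other accounts' memories",
    );
  }
}
```

Throwing at config load turns a silent data leak into a visible startup failure, which is the trade the commit message describes.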

/seed writes to a shared baseDir/seed-<ts> snapshot dir, not the per-account store under baseDir/{account} that recall/search read. In multi-tenant mode a "successful" seed (200, l0_recorded > 0) is therefore invisible to every core — a silent no-op that misleads backfill callers (scripts/import-psydt.ts already documents this and works around it). Return 400 in multi-tenant mode, pointing operators at the per-account seeding path (executeSeed into registry.resolveDataDir(session_key)). Single-tenant /seed is unchanged. Covered by the multi-tenant HTTP e2e test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>

…live request

The multi-tenant LRU evictor (maxResidentCores) could pick a core that another in-flight request was still using: getCore() returned the core, then a different account's request triggered evictLruIfNeeded → core.destroy(), closing the SQLite handle underneath the first handler. Only triggered when maxResidentCores was set below peak concurrent accounts, but then it surfaced as a capture/recall/search hitting a closed store. Add a lease/refcount: registry.acquire() pins a core (pins++) and returns a release(); request handlers hold the lease for the whole call and release it in a finally. Eviction skips any core with pins > 0 (allowing a transient over-limit rather than killing a live request), and manual evict()/wipe() now defer teardown until in-flight leases drain. release() is idempotent. getCore() stays pin-free for health/eager-warmup/tests. Health reports resident.pinned. Covered by three new registry tests (no eviction under lease, wipe defers to drain, idempotent double-release); existing LRU + e2e suites still pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>
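The lease/refcount idea can be sketched like this. The class name `PinnedRegistry`, the `Lease` shape, and the synchronous `destroy()` are simplifying assumptions; the PR's registry tears cores down asynchronously and also handles LRU eviction.

```typescript
interface Destroyable {
  destroy(): void;
}

interface Entry<Core> {
  core: Core;
  pins: number;
  evictWhenIdle: boolean;
}

interface Lease<Core> {
  core: Core;
  release(): void;
}

// Sketch: a core with pins > 0 is never destroyed; teardown waits for the last release.
class PinnedRegistry<Core extends Destroyable> {
  private readonly entries = new Map<string, Entry<Core>>();
  private readonly createCore: (key: string) => Core;

  constructor(createCore: (key: string) => Core) {
    this.createCore = createCore;
  }

  acquire(key: string): Lease<Core> {
    let entry = this.entries.get(key);
    if (!entry) {
      entry = { core: this.createCore(key), pins: 0, evictWhenIdle: false };
      this.entries.set(key, entry);
    }
    const pinned = entry;
    pinned.pins++;
    let released = false;
    return {
      core: pinned.core,
      release: () => {
        if (released) return; // Idempotent: a double release must not unpin twice.
        released = true;
        pinned.pins--;
        if (pinned.pins === 0 && pinned.evictWhenIdle) this.teardown(key, pinned);
      },
    };
  }

  evict(key: string): void {
    const entry = this.entries.get(key);
    if (!entry) return;
    if (entry.pins > 0) {
      entry.evictWhenIdle = true; // Defer until in-flight leases drain.
    } else {
      this.teardown(key, entry);
    }
  }

  private teardown(key: string, entry: Entry<Core>): void {
    if (this.entries.get(key) === entry) this.entries.delete(key);
    entry.core.destroy();
  }
}
```

A request handler would call `acquire()`, do its work inside `try`, and call `release()` in `finally`. In this simplified version a new `acquire()` on an entry that is waiting for eviction pins the same core again, which a production registry would likely want to handle differently.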

baseOverrides only set server/data, so loadGatewayConfig() still loaded the repo-root tdai-gateway.yaml from CWD — enabling DashScope embedding. On a machine with DASHSCOPE_API_KEY set the suite would silently exercise real embedding/network instead of the keyword/FTS path it asserts on (the comment already claimed "provider none" but nothing enforced it). Pass memory: parseConfig({ extraction:{enabled:false}, embedding:{provider: "none"}, recall:{strategy:"keyword"} }) so the tests are self-contained. Also drops e2e wall-time (no embedding init attempts). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>

Multi-tenant /health hardcoded stores.{vectorStore,embeddingService}=false (cores are lazy/per-account, so there's no single store to probe), leaving operators no way to tell whether vector recall is even configured — the exact blind spot that made "is embedding wired?" un-answerable from the API. Add an `embedding` block to /health (both modes) derived from config, not a network probe (health must stay a cheap liveness check): `configured` is true only when embedding is enabled, the provider isn't the "none" sentinel, and the config has no error, plus provider/model/dimensions and the recall strategy. The live "did vectors actually fire" signal remains the `strategy` field on /search/memories. Also surfaces resident.pinned (active leases). e2e asserts the field under the hermetic provider=none/keyword boot; integration guide updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>
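A sketch of deriving the `embedding` health block purely from config, with no network probe. The input field names (`enabled`, `configError`) and the function name are assumptions; the output fields (`configured`, provider, model, dimensions, strategy) follow the commit message.

```typescript
// Assumed config shape for illustration.
interface EmbeddingConfigLike {
  enabled: boolean;
  provider: string;
  model?: string;
  dimensions?: number;
  configError?: string;
}

interface EmbeddingHealth {
  configured: boolean;
  provider: string;
  model: string | null;
  dimensions: number | null;
  strategy: string;
}

// Pure function of config: cheap enough for a liveness endpoint.
function describeEmbedding(embedding: EmbeddingConfigLike, recallStrategy: string): EmbeddingHealth {
  const configured =
    embedding.enabled && embedding.provider !== "none" && !embedding.configError;
  return {
    configured,
    provider: embedding.provider,
    model: embedding.model ?? null,
    dimensions: embedding.dimensions ?? null,
    strategy: recallStrategy,
  };
}
```

Under the hermetic test boot described above (`provider: "none"`, keyword recall) this would report `configured: false`.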

maxResidentCores is a count bound: a long tail of accounts that each go quiet still linger until an LRU push. Add coreIdleTtlMs (env TDAI_CORE_IDLE_TTL_MS / yaml data.coreIdleTtlMs) — a periodic sweep evicts any unpinned core idle longer than the TTL, reclaiming memory during quiet periods. Complements the count bound with a time bound. Default 0 = disabled (no behaviour change); multi-tenant only. The sweep timer is unref()'d so it never holds the process open, and is cleared on destroyAll. Pinned (in-flight) cores are always spared, consistent with the lease refcount. Covered by four registry tests (disabled no-op, idle evicted / fresh spared, pinned spared then reclaimed, background timer reclaims). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>
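The time-bound sweep can be sketched as a pure function plus an unref'd Node.js timer. The `ResidentCore` shape, both function names, and the default sweep interval are assumptions; the rules (0 disables, pinned cores are spared, the timer is unref'd) follow the commit message.

```typescript
// Assumed per-core bookkeeping for illustration.
interface ResidentCore {
  lastUsedAt: number; // Epoch milliseconds of the last request.
  pins: number; // Active leases; a pinned core is never swept.
  destroy(): void;
}

// Evicts every unpinned core idle longer than ttlMs and returns the evicted keys.
function sweepIdleCores(cores: Map<string, ResidentCore>, ttlMs: number, now: number): string[] {
  if (ttlMs <= 0) return []; // 0 means the TTL is disabled.
  const evicted: string[] = [];
  for (const [key, core] of cores) {
    if (core.pins === 0 && now - core.lastUsedAt > ttlMs) {
      cores.delete(key); // Deleting the current entry during Map iteration is safe.
      core.destroy();
      evicted.push(key);
    }
  }
  return evicted;
}

// Starts the periodic sweep and returns a function that stops it.
function startIdleSweep(cores: Map<string, ResidentCore>, ttlMs: number): () => void {
  if (ttlMs <= 0) return () => undefined;
  const intervalMs = Math.max(1000, Math.floor(ttlMs / 2)); // Assumed sweep cadence.
  const timer = setInterval(() => sweepIdleCores(cores, ttlMs, Date.now()), intervalMs);
  timer.unref(); // Never keep the process alive just for the sweep.
  return () => clearInterval(timer);
}
```

Keeping the sweep itself free of timers makes it testable with an injected clock, while `startIdleSweep` carries the scheduling.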

Two known-issue annotations, no behaviour change:

- persona.md is written by multiple stages on different SerialQueues (L3 persona-generator, L2 scene-nav, tcvdb profile-sync). atomicWriteFile guards torn reads but NOT lost updates, and the L3 path mutates the file via a ~180s LLM run, so a correct fix needs LLM-to-staging + a per-account lock, which is too big for a one-line guard. Annotate both local write sites so the next change knows the shape of the proper fix; tcvdb+multiTenant (the extra writer) is already rejected at config load.
- conversation-search applies session_key as a POST-filter over topK. Clarify that it's a structural no-op in multi-tenant (each core's store is single account) but REQUIRED in single-tenant shared-store mode, where topK dilution can drop results; the real fix is session pushdown into the L0 query.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>

…f scope

Spell out in the integration guide that embedding (DashScope text-embedding-v3) and storage (local SQLite + sqlite-vec + FTS5) are independent layers and the only supported multi-tenant stack. Note that the tcvdb backend is a separate design with its own server-side embedding and shared cloud collections, that multiTenant + tcvdb is rejected at startup by design, and therefore the "tcvdb breaks isolation" issue does not apply to a DashScope+SQLite deployment. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>

persona.md is written by two background stages on separate SerialQueues — L3 PersonaGenerator.generateLocalPersona (read → ~180s LLM tool-write → final write) and L2 SceneExtractor.updateSceneNavigation (read → strip → append → write). Both are read-modify-write; atomicWriteFile prevents a torn read but not a lost update, so an interleaving L2 nav write can clobber a freshly regenerated L3 body (or vice-versa). This race is live in the supported DashScope+SQLite multi-tenant stack (the profile-sync writer is tcvdb-only, and tcvdb+multiTenant is rejected at config load). Add KeyedAsyncMutex (per-key FIFO async mutex, error-isolated, in-process) and route both writers' whole RMW through the shared fileWriteMutex keyed by persona.md's absolute path. Method B: the L3 critical section spans the full LLM run because the LLM writes the file mid-run. Per-account isolation is automatic (distinct paths never contend); recall reads take no lock, so the read hot path is unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>
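A per-key FIFO async mutex of the kind described here can be sketched with a promise chain per key. The method name `runExclusive` and the internal `tails` map are assumptions; the properties (FIFO per key, error-isolated, in-process only) follow the commit message.

```typescript
// Sketch: each key keeps the tail of its promise chain; callers queue behind it.
class KeyedAsyncMutex {
  private readonly tails = new Map<string, Promise<void>>();

  async runExclusive<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    let releaseLock!: () => void;
    const current = new Promise<void>((resolve) => {
      releaseLock = resolve;
    });
    // The new tail settles only after this task's critical section ends.
    const tail = previous.then(() => current);
    this.tails.set(key, tail);

    await previous; // Wait for every earlier holder of this key, in FIFO order.
    try {
      return await task();
    } finally {
      // The chain only ever resolves, so a failing task cannot poison later callers.
      releaseLock();
      if (this.tails.get(key) === tail) this.tails.delete(key); // Drop idle keys.
    }
  }
}
```

Keying by the absolute path of persona.md means different accounts never contend, and a reader that takes no lock is unaffected. The lock is in-process only, so it would not coordinate two separate gateway processes writing the same file.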

scripts/smoke-recall.mjs validates a running gateway end-to-end against its real embedding provider (DashScope): GET /health (gates fast on embedding.configured=false), POST /capture a distinctive fact, poll POST /search/memories with a no-shared-keyword paraphrase until an L1 atom is recalled, and assert strategy is hybrid/embedding (vectors fired) rather than fts/none. Cleans up via /namespace/wipe in multi-tenant mode. Dependency-free Node ESM (global fetch) — no build/tsx needed. Exit 0 PASS / 1 FAIL / 2 setup-error, with actionable diagnostics that distinguish "L1 never formed" from "formed but embedding not contributing". Probe text overridable via SMOKE_* env for non-Chinese or domain-specific deployments. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>

Maxwell-Code07 · 2026-06-27T17:24:05Z

We appreciate your contribution. Our team will conduct an internal evaluation of this PR and get back to you shortly.

Point integration teams at scripts/smoke-recall.mjs (commit 6b0a573) as the first post-deploy check: it automates and asserts the §6 vector-recall test and is wired into the §10 quickstart checklist. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jack <278171810@qq.com>

huanshanxiaoyao · 2026-06-28T07:05:10Z

We appreciate your contribution. Our team will conduct an internal evaluation of this PR and get back to you shortly.

Thanks,

I hope this can be adopted, or that we can jointly deliver a version with usable multi-tenant support.

Lock the multi-tenant fork's deploy state for AI4ALL production:

- config.ts: default `multiTenant` to true so the fork is multi-tenant by default even if TDAI_MULTI_TENANT is unset (prod still sets it explicitly).
- dev-console: surface recall `strategy` in search results (hybrid/embedding/fts/none) to distinguish vector recall from keyword at a glance.
- .gitignore: ignore throwaway `scripts/tmp_*.py` analysis scripts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

huanshanxiaoyao and others added 18 commits June 27, 2026 11:49


huanshanxiaoyao wants to merge 20 commits into TencentCloud:main from huanshanxiaoyao:custom-multitenant


2 participants
