Custom multitenant #273

Open
huanshanxiaoyao wants to merge 20 commits into TencentCloud:main from huanshanxiaoyao:custom-multitenant

Conversation

@huanshanxiaoyao huanshanxiaoyao commented Jun 27, 2026

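Description

Adds multi-tenant support. Previously the system could only handle memory for a single uid; this PR upgrades it to store and process memory for multiple uids while keeping them isolated from one another.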

Related Issue

Change Type

  • Bug fix
  • New feature
  • Documentation update
  • Code optimization

Self-test Checklist

  • Verified locally
  • No existing features affected

Additional Notes

huanshanxiaoyao and others added 18 commits June 27, 2026 11:49
Seeding bulk-imports historical turns, so the live 600s L1 idle timeout
and pipeline warmup add needless latency. Cap l1IdleTimeoutSeconds at 5s,
disable enableWarmup, and pass an explicit 30s destroy timeout so the
seed runtime drains and exits promptly.

Signed-off-by: Jack <278171810@qq.com>
DeepSeek V4 hybrid models (deepseek-v4-flash) ignore the V3 top-level
enable_thinking flag and only suppress reasoning for the object form
thinking: { type: "disabled" }. Add a dedicated "deepseek-v4" strategy
mapping to that body transform, register it in the valid-strategy list,
and document it. Verified against the live endpoint 2026-06.

Signed-off-by: Jack <278171810@qq.com>
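
The body transform described in the commit above can be sketched roughly as follows. This is an illustration only, not the PR's code: `ThinkingStrategy`, `isValidStrategy`, `disableThinking`, and the `"deepseek-v3"` strategy name are hypothetical; only the `"deepseek-v4"` strategy name and the two request-body shapes come from the commit message.

```typescript
// Hypothetical names throughout; only the two body shapes and the
// "deepseek-v4" strategy name are taken from the commit message.
type ThinkingStrategy = "deepseek-v3" | "deepseek-v4";
type RequestBody = Record<string, unknown>;

const VALID_STRATEGIES: readonly string[] = ["deepseek-v3", "deepseek-v4"];

// Mirrors the idea of a "valid-strategy list" used to validate config input.
function isValidStrategy(name: string): name is ThinkingStrategy {
  return VALID_STRATEGIES.indexOf(name) !== -1;
}

// Returns a copy of the request body with reasoning suppressed.
function disableThinking(body: RequestBody, strategy: ThinkingStrategy): RequestBody {
  switch (strategy) {
    case "deepseek-v3":
      // V3-style: top-level boolean flag.
      return { ...body, enable_thinking: false };
    case "deepseek-v4":
      // V4 hybrid models: object form.
      return { ...body, thinking: { type: "disabled" } };
  }
}
```

Keeping the transform as a pure function of the body makes it easy to unit-test each strategy without touching the network.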
A weak / no-thinking model copied the English template headings verbatim
even when the scene content was Chinese, producing English L3 personas
over Chinese L0/L1/L2. Tighten the output-language contract and, when the
changed scenes are Chinese-dominant, inject the exact mandatory Chinese
heading set data-adjacent in the user prompt so the model cannot fall
back to the English template.

Signed-off-by: Jack <278171810@qq.com>
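
One way to decide that scene content is "Chinese-dominant" is a simple character-ratio heuristic. The sketch below is an assumption about how such a check might look: `cjkRatio`, `isChineseDominant`, and the 0.3 threshold are hypothetical and are not taken from the PR.

```typescript
// Hypothetical heuristic; the PR's actual detection logic is not shown in the commit message.

// Fraction of non-whitespace characters that fall in the CJK Unified Ideographs block.
function cjkRatio(text: string): number {
  const visible = text.replace(/\s/g, "");
  if (visible.length === 0) return 0;
  const cjk = visible.match(/[\u4e00-\u9fff]/g) ?? [];
  return cjk.length / visible.length;
}

// Treats the changed scenes as Chinese-dominant when the ratio reaches the threshold.
// The default threshold of 0.3 is an arbitrary assumption for illustration.
function isChineseDominant(scenes: string[], threshold = 0.3): boolean {
  return cjkRatio(scenes.join("\n")) >= threshold;
}
```

A caller would use the result to decide whether to inject the Chinese heading set into the user prompt.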
The multi-tenant registry runs one TdaiCore per account, each with its own
L1/L2/L3 timers and SerialQueue, so background extraction would fan out
LLM calls by N accounts. Add a shared ConcurrencyLimiter (async semaphore)
that the registry hands to every core via TdaiCoreOptions.extractionLimiter;
the pipeline manager/factory acquire it around each extraction run so total
concurrent background work stays bounded regardless of tenant count.

Signed-off-by: Jack <278171810@qq.com>
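
The shared limiter described above is an async semaphore. A minimal sketch of that technique follows, assuming a `run()` wrapper API; the method names and constructor shape are hypothetical, and the PR's `ConcurrencyLimiter` may expose acquire/release differently.

```typescript
// Sketch of an async semaphore; API shape is an assumption, not the PR's code.
class ConcurrencyLimiter {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number) {
    if (maxConcurrent < 1) throw new RangeError("maxConcurrent must be >= 1");
    this.maxConcurrent = maxConcurrent;
  }

  // Runs the task once a slot is free and always frees the slot afterwards.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }

  private acquire(): Promise<void> {
    if (this.active < this.maxConcurrent) {
      this.active++;
      return Promise.resolve();
    }
    // The slot is handed over directly by release(), so `active` is not incremented here.
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // transfer the slot to the oldest waiter; `active` is unchanged
    } else {
      this.active--;
    }
  }
}
```

Because every per-account core would receive the same instance, the bound applies to the whole process rather than to each tenant.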
L3 persona is an unlocked read-modify-write; a crash or concurrent write
mid-flush leaves a truncated persona.md. Add atomicWriteFile (temp +
fsync + rename) and route the persona generator, profile sync, scene
extractor, and scene index through it so readers never observe a partial
file.

Signed-off-by: Jack <278171810@qq.com>
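
The temp + fsync + rename pattern named above can be sketched with Node's `fs/promises` API. The signature and temp-file naming below are assumptions; the PR's `atomicWriteFile` may differ.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Sketch only: signature and temp-file naming are assumptions.
async function atomicWriteFile(targetPath: string, data: string): Promise<void> {
  const dir = path.dirname(targetPath);
  const suffix = `${process.pid}.${Date.now()}.${Math.random().toString(36).slice(2)}`;
  // The temp file lives in the same directory so the rename stays on one filesystem.
  const tmpPath = path.join(dir, `.${path.basename(targetPath)}.${suffix}.tmp`);

  const handle = await fs.open(tmpPath, "w");
  try {
    try {
      await handle.writeFile(data, "utf8");
      await handle.sync(); // flush contents to disk before the rename
    } finally {
      await handle.close();
    }
    // rename() replaces the target in one step on POSIX filesystems,
    // so readers see either the old file or the new one.
    await fs.rename(tmpPath, targetPath);
  } catch (err) {
    await fs.rm(tmpPath, { force: true }); // best-effort cleanup of the temp file
    throw err;
  }
}
```

Note that this guards against torn reads only; two concurrent read-modify-write cycles can still overwrite each other's updates.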
One gateway process must safely serve multiple end-user accounts with hard
isolation. Add CoreRegistry, holding a lazy, LRU Map<session_key, TdaiCore>
with one dataDir per account (baseDir/{safeAccountDir(key)}), so L1/L2/L3
recall, search, and persona are physically isolated per tenant. Thread
session_key through the search/recall request types, return prepend_context
from /recall, and add account listing plus a session-scoped namespace wipe
for hard-delete. Gated by TDAI_MULTI_TENANT; single-core behavior is
unchanged when off.

Signed-off-by: Jack <278171810@qq.com>
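
A lazy, LRU-bounded registry of this kind can be sketched as below. This is an illustration under stated assumptions: the `Core` interface stands in for `TdaiCore`, the constructor shape is hypothetical, and the body of `safeAccountDir` is a guess at one safe encoding, not the PR's implementation. Only the names `CoreRegistry`, `safeAccountDir`, `resolveDataDir`, `getCore`, and `maxResidentCores` appear in the commit messages.

```typescript
// Stand-in for TdaiCore; only destroy() is assumed here.
interface Core {
  dataDir: string;
  destroy(): Promise<void>;
}

// Hypothetical sanitiser: percent-encodes the key so it cannot escape baseDir.
function safeAccountDir(sessionKey: string): string {
  if (sessionKey.length === 0) throw new Error("session_key must not be empty");
  return encodeURIComponent(sessionKey).replace(/\./g, "%2E");
}

class CoreRegistry {
  // Map preserves insertion order, so the first entry is the least recently used.
  private readonly cores = new Map<string, Core>();
  private readonly baseDir: string;
  private readonly maxResidentCores: number;
  private readonly createCore: (dataDir: string) => Core;

  constructor(baseDir: string, maxResidentCores: number, createCore: (dataDir: string) => Core) {
    this.baseDir = baseDir;
    this.maxResidentCores = maxResidentCores;
    this.createCore = createCore;
  }

  resolveDataDir(sessionKey: string): string {
    return `${this.baseDir}/${safeAccountDir(sessionKey)}`;
  }

  async getCore(sessionKey: string): Promise<Core> {
    const existing = this.cores.get(sessionKey);
    if (existing) {
      // Re-insert to mark the core as most recently used.
      this.cores.delete(sessionKey);
      this.cores.set(sessionKey, existing);
      return existing;
    }
    // Lazy creation: a core exists only after its account is first seen.
    const core = this.createCore(this.resolveDataDir(sessionKey));
    this.cores.set(sessionKey, core);
    await this.evictLruIfNeeded();
    return core;
  }

  private async evictLruIfNeeded(): Promise<void> {
    while (this.cores.size > this.maxResidentCores) {
      const first = this.cores.entries().next();
      if (first.done) break;
      const [key, core] = first.value;
      this.cores.delete(key);
      await core.destroy();
    }
  }
}
```

This simple version can evict a core that another request is still using, which is the problem a lease or refcount scheme addresses.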
Add a local dev-console (scripts/dev-console) that proxies the gateway and
renders the per-account memory pyramid for manual recall/search testing,
plus scripts/import-psydt.ts to bulk-seed PsyDTCorpus accounts into the
multi-tenant store. Wire both up as npm scripts (dev-console, import-psydt).

Signed-off-by: Jack <278171810@qq.com>
Add the multi-tenant design doc and upstream issue under docs/, project
guidance in CLAUDE.md, and a tdai-gateway.yaml sample that references
${DASHSCOPE_API_KEY} for embedding config without embedding secrets.

Signed-off-by: Jack <278171810@qq.com>
Structural multi-tenant isolation is physical: each account gets its own
dataDir + SQLite file. The TCVDB backend ignores dataDir and routes every
per-account core to one shared database/collection set, while the structural
route does not push session_key into L1/L0 search — so multiTenant=true +
storeBackend=tcvdb silently returns other accounts' memories from /recall and
/search/memories.

loadGatewayConfig now throws on that exact combination so the leak surfaces as
a startup failure instead of cross-tenant data exposure. Single-tenant + TCVDB
(one shared core, one database) and multi-tenant + SQLite are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jack <278171810@qq.com>
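
The fail-fast check described above amounts to rejecting one configuration combination at load time. A minimal sketch, assuming a config object with `multiTenant` and `storeBackend` fields: the `GatewayConfigSketch` type, the `assertIsolationSupported` helper, the `"sqlite"` backend value, and the error wording are hypothetical.

```typescript
// Hypothetical config shape; only multiTenant, storeBackend, and "tcvdb" come from the commit message.
interface GatewayConfigSketch {
  multiTenant: boolean;
  storeBackend: "sqlite" | "tcvdb";
}

// Throws on the one combination that would share a single database across accounts.
function assertIsolationSupported(config: GatewayConfigSketch): void {
  if (config.multiTenant && config.storeBackend === "tcvdb") {
    throw new Error(
      "multiTenant=true is not supported with storeBackend=tcvdb: " +
        "all accounts would share one database/collection set",
    );
  }
}
```

Calling this from the config loader turns a silent cross-tenant leak into a startup failure.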
/seed writes to a shared baseDir/seed-<ts> snapshot dir, not the per-account
store under baseDir/{account} that recall/search read. In multi-tenant mode a
"successful" seed (200, l0_recorded > 0) is therefore invisible to every core —
a silent no-op that misleads backfill callers (scripts/import-psydt.ts already
documents this and works around it).

Return 400 in multi-tenant mode, pointing operators at the per-account seeding
path (executeSeed into registry.resolveDataDir(session_key)). Single-tenant
/seed is unchanged. Covered by the multi-tenant HTTP e2e test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jack <278171810@qq.com>
…live request

The multi-tenant LRU evictor (maxResidentCores) could pick a core that another
in-flight request was still using: getCore() returned the core, then a different
account's request triggered evictLruIfNeeded → core.destroy(), closing the
SQLite handle underneath the first handler. Only triggered when maxResidentCores
was set below peak concurrent accounts, but then it surfaced as a
capture/recall/search hitting a closed store.

Add a lease/refcount: registry.acquire() pins a core (pins++) and returns a
release(); request handlers hold the lease for the whole call and release it in
a finally. Eviction skips any core with pins > 0 (allowing a transient
over-limit rather than killing a live request), and manual evict()/wipe() now
defer teardown until in-flight leases drain. release() is idempotent. getCore()
stays pin-free for health/eager-warmup/tests. Health reports resident.pinned.

Covered by three new registry tests (no eviction under lease, wipe defers to
drain, idempotent double-release); existing LRU + e2e suites still pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jack <278171810@qq.com>
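
The lease/refcount scheme described above can be sketched as follows. This is a simplified, synchronous illustration: `LeasingRegistry`, `PinnableCore`, and the constructor shape are hypothetical, and `destroy()` is synchronous here for brevity, whereas the real core teardown is presumably asynchronous.

```typescript
// Hypothetical stand-ins; only acquire/release, evict, pins, and the
// skip-pinned eviction rule come from the commit message.
interface PinnableCore {
  destroy(): void;
}

interface Entry<C extends PinnableCore> {
  core: C;
  pins: number;
  pendingDestroy: boolean;
}

class LeasingRegistry<C extends PinnableCore> {
  private readonly entries = new Map<string, Entry<C>>();
  private readonly maxResidentCores: number;
  private readonly createCore: (key: string) => C;

  constructor(maxResidentCores: number, createCore: (key: string) => C) {
    this.maxResidentCores = maxResidentCores;
    this.createCore = createCore;
  }

  // Pins the core for the duration of a request; the caller releases in a finally block.
  acquire(key: string): { core: C; release: () => void } {
    let entry = this.entries.get(key);
    if (entry) {
      this.entries.delete(key); // re-inserted below to mark as most recently used
    } else {
      entry = { core: this.createCore(key), pins: 0, pendingDestroy: false };
    }
    this.entries.set(key, entry);
    entry.pins++;
    this.evictLruIfNeeded();

    const held = entry;
    let released = false;
    const release = (): void => {
      if (released) return; // idempotent: a double release must not unpin twice
      released = true;
      held.pins--;
      if (held.pins === 0 && held.pendingDestroy) held.core.destroy();
    };
    return { core: held.core, release };
  }

  // Manual eviction: teardown is deferred until in-flight leases drain.
  evict(key: string): void {
    const entry = this.entries.get(key);
    if (!entry) return;
    this.entries.delete(key);
    if (entry.pins > 0) entry.pendingDestroy = true;
    else entry.core.destroy();
  }

  // Number of cores with active leases, e.g. for a health report.
  pinnedCount(): number {
    return Array.from(this.entries.values()).filter((e) => e.pins > 0).length;
  }

  private evictLruIfNeeded(): void {
    for (const [key, entry] of Array.from(this.entries)) {
      if (this.entries.size <= this.maxResidentCores) break;
      if (entry.pins > 0) continue; // never evict a core that is in use
      this.entries.delete(key);
      entry.core.destroy();
    }
  }
}
```

A handler would call `const lease = registry.acquire(key)` and invoke `lease.release()` in a `finally` block, accepting a transient over-limit state in exchange for never closing a store under a live request.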
baseOverrides only set server/data, so loadGatewayConfig() still loaded the
repo-root tdai-gateway.yaml from CWD — enabling DashScope embedding. On a
machine with DASHSCOPE_API_KEY set the suite would silently exercise real
embedding/network instead of the keyword/FTS path it asserts on (the comment
already claimed "provider none" but nothing enforced it).

Pass memory: parseConfig({ extraction:{enabled:false}, embedding:{provider:
"none"}, recall:{strategy:"keyword"} }) so the tests are self-contained.
Also drops e2e wall-time (no embedding init attempts).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jack <278171810@qq.com>
Multi-tenant /health hardcoded stores.{vectorStore,embeddingService}=false
(cores are lazy/per-account, so there's no single store to probe), leaving
operators no way to tell whether vector recall is even configured — the exact
blind spot that made "is embedding wired?" un-answerable from the API.

Add an `embedding` block to /health (both modes) derived from config, not a
network probe (health must stay a cheap liveness check): `configured` is true
only when embedding is enabled, the provider isn't the "none" sentinel, and the
config has no error, plus provider/model/dimensions and the recall strategy.
The live "did vectors actually fire" signal remains the `strategy` field on
/search/memories. Also surfaces resident.pinned (active leases).

e2e asserts the field under the hermetic provider=none/keyword boot; integration
guide updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jack <278171810@qq.com>
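
The `configured` rule described above is a pure function of the loaded config. A minimal sketch, assuming hypothetical field names (`enabled`, `provider`, `model`, `dimensions`, `error`) and a hypothetical `embeddingHealth` helper; the actual /health payload shape may differ.

```typescript
// Hypothetical config and response shapes for illustration.
interface EmbeddingConfigSketch {
  enabled: boolean;
  provider: string;
  model?: string;
  dimensions?: number;
  error?: string; // set when config parsing found a problem
}

interface EmbeddingHealth {
  configured: boolean;
  provider: string;
  model?: string;
  dimensions?: number;
  strategy: string;
}

// Derived from config only: no network probe, so /health stays a cheap liveness check.
function embeddingHealth(cfg: EmbeddingConfigSketch, recallStrategy: string): EmbeddingHealth {
  return {
    configured: cfg.enabled && cfg.provider !== "none" && cfg.error === undefined,
    provider: cfg.provider,
    model: cfg.model,
    dimensions: cfg.dimensions,
    strategy: recallStrategy,
  };
}
```

This answers "is embedding wired?" from configuration alone; whether vectors actually contributed to a query is still reported per request.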
maxResidentCores is a count bound: a long tail of accounts that each go quiet
still linger until an LRU push. Add coreIdleTtlMs (env TDAI_CORE_IDLE_TTL_MS /
yaml data.coreIdleTtlMs) — a periodic sweep evicts any unpinned core idle longer
than the TTL, reclaiming memory during quiet periods. Complements the count
bound with a time bound.

Default 0 = disabled (no behaviour change); multi-tenant only. The sweep timer
is unref()'d so it never holds the process open, and is cleared on destroyAll.
Pinned (in-flight) cores are always spared, consistent with the lease refcount.
Covered by four registry tests (disabled no-op, idle evicted / fresh spared,
pinned spared then reclaimed, background timer reclaims).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jack <278171810@qq.com>
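
The time-bound sweep described above can be sketched as below. `IdleCoreSweeper`, `IdleEntry`, and the `track`/`sweep`/`start`/`stop` method names are hypothetical; only the TTL semantics (0 disables, pinned cores are spared, the timer is `unref()`'d and cleared on teardown) come from the commit message.

```typescript
// Hypothetical entry shape: last-use timestamp, active lease count, teardown hook.
interface IdleEntry {
  lastUsedAt: number;
  pins: number;
  destroy(): void;
}

class IdleCoreSweeper {
  private readonly entries = new Map<string, IdleEntry>();
  private readonly ttlMs: number;
  private timer: ReturnType<typeof setInterval> | undefined;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  track(key: string, entry: IdleEntry): void {
    this.entries.set(key, entry);
  }

  // Evicts every unpinned entry idle longer than the TTL and returns the evicted keys.
  sweep(now: number = Date.now()): string[] {
    const evicted: string[] = [];
    if (this.ttlMs <= 0) return evicted; // 0 = disabled
    for (const [key, entry] of Array.from(this.entries)) {
      if (entry.pins > 0) continue; // in-flight cores are always spared
      if (now - entry.lastUsedAt <= this.ttlMs) continue;
      this.entries.delete(key);
      entry.destroy();
      evicted.push(key);
    }
    return evicted;
  }

  start(intervalMs: number): void {
    if (this.ttlMs <= 0 || this.timer !== undefined) return;
    const timer = setInterval(() => this.sweep(), intervalMs);
    // In Node.js the timer object has unref(), so the sweep never keeps the process alive.
    (timer as unknown as { unref?: () => void }).unref?.();
    this.timer = timer;
  }

  stop(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
  }
}
```

Passing `now` explicitly keeps the sweep deterministic in tests, while the background timer simply calls it with the current time.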
Two known-issue annotations, no behaviour change:

- persona.md is written by multiple stages on different SerialQueues (L3
  persona-generator, L2 scene-nav, tcvdb profile-sync). atomicWriteFile guards
  torn reads but NOT lost updates, and the L3 path mutates the file via a ~180s
  LLM run, so a correct fix needs LLM-to-staging + a per-account lock — too big
  for a one-line guard. Annotate both local write sites so the next change knows
  the shape of the proper fix; tcvdb+multiTenant (the extra writer) is already
  rejected at config load.

- conversation-search applies session_key as a POST-filter over topK. Clarify
  that it's a structural no-op in multi-tenant (each core's store is single
  account) but REQUIRED in single-tenant shared-store mode, where topK dilution
  can drop results — the real fix being session pushdown into the L0 query.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jack <278171810@qq.com>
…f scope

Spell out in the integration guide that embedding (DashScope text-embedding-v3)
and storage (local SQLite + sqlite-vec + FTS5) are independent layers and the
only supported multi-tenant stack. Note that the tcvdb backend is a separate
design with its own server-side embedding and shared cloud collections, that
multiTenant + tcvdb is rejected at startup by design, and therefore the
"tcvdb breaks isolation" issue does not apply to a DashScope+SQLite deployment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jack <278171810@qq.com>
persona.md is written by two background stages on separate SerialQueues —
L3 PersonaGenerator.generateLocalPersona (read → ~180s LLM tool-write →
final write) and L2 SceneExtractor.updateSceneNavigation (read → strip →
append → write). Both are read-modify-write; atomicWriteFile prevents a
torn read but not a lost update, so an interleaving L2 nav write can
clobber a freshly regenerated L3 body (or vice-versa). This race is live
in the supported DashScope+SQLite multi-tenant stack (the profile-sync
writer is tcvdb-only, and tcvdb+multiTenant is rejected at config load).

Add KeyedAsyncMutex (per-key FIFO async mutex, error-isolated, in-process)
and route both writers' whole RMW through the shared fileWriteMutex keyed
by persona.md's absolute path. Method B: the L3 critical section spans the
full LLM run because the LLM writes the file mid-run. Per-account
isolation is automatic (distinct paths never contend); recall reads take
no lock, so the read hot path is unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jack <278171810@qq.com>
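
A per-key FIFO async mutex of the kind described above can be sketched by chaining promises per key. The `runExclusive` method name and internal layout are assumptions; the PR's `KeyedAsyncMutex` may expose a different API.

```typescript
// Sketch of a per-key FIFO async mutex; in-process only, API shape is an assumption.
class KeyedAsyncMutex {
  // For each key, a promise that resolves when the most recently queued holder is done.
  private readonly tails = new Map<string, Promise<void>>();

  async runExclusive<T>(key: string, fn: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    let releaseCurrent!: () => void;
    const current = new Promise<void>((resolve) => {
      releaseCurrent = resolve;
    });
    // Queue behind the previous holder; later callers queue behind this tail (FIFO).
    const tail = previous.then(() => current);
    this.tails.set(key, tail);

    await previous; // never rejects: tails are built from resolve-only promises
    try {
      return await fn();
    } finally {
      // Release even if fn threw, so one failure does not block later holders.
      releaseCurrent();
      if (this.tails.get(key) === tail) this.tails.delete(key); // avoid unbounded growth
    }
  }
}
```

Keying by the absolute path of `persona.md` means different accounts never contend, while both writers for one account serialize their whole read-modify-write cycle.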
scripts/smoke-recall.mjs validates a running gateway end-to-end against
its real embedding provider (DashScope): GET /health (gates fast on
embedding.configured=false), POST /capture a distinctive fact, poll
POST /search/memories with a no-shared-keyword paraphrase until an L1
atom is recalled, and assert strategy is hybrid/embedding (vectors
fired) rather than fts/none. Cleans up via /namespace/wipe in
multi-tenant mode.

Dependency-free Node ESM (global fetch) — no build/tsx needed. Exit 0
PASS / 1 FAIL / 2 setup-error, with actionable diagnostics that
distinguish "L1 never formed" from "formed but embedding not
contributing". Probe text overridable via SMOKE_* env for non-Chinese
or domain-specific deployments.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jack <278171810@qq.com>
@Maxwell-Code07

Collaborator

We appreciate your contribution. Our team will conduct an internal evaluation of this PR and get back to you shortly.

Point integration teams at scripts/smoke-recall.mjs (commit 6b0a573) as the
first post-deploy check: it automates and asserts the §6 vector-recall test
and is wired into the §10 quickstart checklist.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Jack <278171810@qq.com>
@huanshanxiaoyao

Author

> We appreciate your contribution. Our team will conduct an internal evaluation of this PR and get back to you shortly.

Thanks,

I hope this can be adopted, or that we can jointly deliver a version with working multi-tenant support.

Lock the multi-tenant fork's deploy state for AI4ALL production:

- config.ts: default `multiTenant` to true so the fork is multi-tenant by
  default even if TDAI_MULTI_TENANT is unset (prod still sets it explicitly).
- dev-console: surface recall `strategy` in search results (hybrid/embedding/
  fts/none) to distinguish vector recall from keyword at a glance.
- .gitignore: ignore throwaway `scripts/tmp_*.py` analysis scripts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2 participants