SQLite v0 terminology providers with unified filter pipeline #136

Open
jmandel wants to merge 2 commits into HealthIntersections:2026-02-gg-cs-api-proposal from jmandel:sqlite-terminology-providers

jmandel commented Feb 22, 2026

Summary

Generic SQLite-based CodeSystem providers for RxNorm, LOINC, and SNOMED CT using a unified v0 schema. Builds on the decomposed CS filter API from PR #133 — single commit on top of c4d4590.

Supersedes #135 (which targeted main). Now rebased onto the 2026-02-gg-cs-api-proposal branch, adopting Grahame's API naming/signatures where they overlap with ours.

What's new vs #135

  • Unified include/exclude SQL: a single #buildV0FilterSql code path handles all filter types for both includes and excludes (was duplicated before)
  • SNOMED hierarchy exclude fix: concept is a virtual property that is not in property_def, so exclude validation now uses a SQL trial instead of a property lookup
  • RxNorm STY registered as filterable: cross-system STY exclude tests now pass
  • RxNorm archived concept import: reads RXNATOMARCHIVE for merged/retired RXCUIs and imports them as active=0
  • Method name fixes: fixed broken getExtensions → extensions, getCodeStatus → getStatus, getProperties → properties on the legacy filter path
  • Consolidated _propsIfRequested: removed duplicate wrapper
  • Removed try-catch / _v0Excludes leak: if the provider says handlesExcludes()=true, trust it

CS Provider Interface: What We Add Beyond PR #133

PR #133 introduces the decomposed filter API (filter() → filterExcludeFilters() → executeFilters() → filterMore()/filterConcept()). We use all of those as-is. Here is what we add or change:

New method: includeConcepts(filterContext, codes)

Added to cs-api.js — no-op default, backward-compatible.

Why: PR #133 has no way for the provider to see explicit concept codes from compose.include[].concept. The worker handles those in a separate per-code locate() loop, completely outside the filter pipeline. This means the provider can't build one optimal SQL query that covers both concept enumeration and filters together.

What it does: Records intent to include specific codes. No SQL execution — executeFilters() incorporates these as WHERE code IN (...) in the combined query.

Worker side: The worker checks typeof cs.includeConcepts === 'function' before calling. If absent, falls back to the original per-code locate() loop.
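
To make the "record intent, execute later" idea concrete, here is a minimal sketch of how a provider might collect explicit codes and later turn them into a parameterized `code IN (...)` clause. It is not the PR's implementation: `SketchProvider`, `buildConceptClause`, and the `includedCodes` field on the context are hypothetical names, and the sample codes are arbitrary. Only `includeConcepts(filterContext, codes)` and the `WHERE code IN (...)` shape come from the description above.

```javascript
// Hypothetical sketch, not the PR's code. Only the includeConcepts()
// signature and the "WHERE code IN (...)" idea come from the description;
// the class, helper, and context shape are assumed for illustration.
class SketchProvider {
  // Record intent only; no SQL runs here.
  includeConcepts(filterContext, codes) {
    filterContext.includedCodes.push(...codes);
  }

  // Build the concept-enumeration clause with bound parameters so that
  // codes are never spliced into the SQL text.
  buildConceptClause(filterContext) {
    const codes = [...new Set(filterContext.includedCodes)]; // de-duplicate, keep order
    if (codes.length === 0) return { sql: '', params: [] };
    const placeholders = codes.map(() => '?').join(', ');
    return { sql: `code IN (${placeholders})`, params: codes };
  }
}

const filterContext = { includedCodes: [] };
const provider = new SketchProvider();

// Worker-side feature check, as described above.
if (typeof provider.includeConcepts === 'function') {
  provider.includeConcepts(filterContext, ['1191', '161']);
  provider.includeConcepts(filterContext, ['161', '5640']);
}

const clause = provider.buildConceptClause(filterContext);
console.log(clause.sql);    // code IN (?, ?, ?)
console.log(clause.params); // [ '1191', '161', '5640' ]
```

Keeping the clause as a `{ sql, params }` pair lets a later step AND it together with filter clauses before anything is executed.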

Changed: handlesOffset() return value

In PR #133: handlesOffset() body is empty (returns undefined/falsy).
Our fix: Returns false explicitly. Our v0 provider returns true.

Worker side: We changed the LIMIT passdown gate from vsInfo.csDoOffset (only true for simple single-CS ValueSets) to cs.handlesOffset() (true for any provider that supports paging). This is safe because excludes are system-scoped — an exclude on system B can't drain results from system A. Result: cross-system ValueSet expansion dropped from ~4s to ~12ms with count=10.
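
The safety argument can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not the worker's code: systems are plain arrays, `expandSystem` and `expand` are hypothetical helpers, and only the first page (offset 0) is considered. It shows that when excludes only touch their own system, pushing `count` down into each system yields the same first page as expanding everything and slicing afterwards.

```javascript
// Hypothetical toy model; none of these names exist in the PR.
// Each system carries its own codes and its own (system-scoped) excludes.
function expandSystem(system, limit) {
  const excluded = new Set(system.excludes);
  const out = [];
  for (const code of system.codes) {
    if (excluded.has(code)) continue; // excludes never affect another system
    out.push(`${system.url}|${code}`);
    if (limit !== undefined && out.length >= limit) break; // LIMIT pushed down
  }
  return out;
}

// First page only (offset 0): concatenate per-system results, keep `count`.
function expand(systems, count, pushDownLimit) {
  const all = systems.flatMap((s) =>
    expandSystem(s, pushDownLimit ? count : undefined)
  );
  return all.slice(0, count);
}

const systems = [
  { url: 'rx', codes: ['a', 'b', 'c'], excludes: ['b'] },
  {
    url: 'loinc',
    codes: Array.from({ length: 20 }, (_, i) => `l${i}`),
    excludes: ['l0'],
  },
];

const withPushdown = expand(systems, 5, true);
const withoutPushdown = expand(systems, 5, false);
console.log(withPushdown);
// [ 'rx|a', 'rx|c', 'loinc|l1', 'loinc|l2', 'loinc|l3' ]
console.log(JSON.stringify(withPushdown) === JSON.stringify(withoutPushdown)); // true
```

Paging with a non-zero offset across several systems needs more bookkeeping than this model shows.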

Changed: unified intent block in worker

In PR #133: Separate if (cset.concept) and if (cset.filter) blocks — the provider never sees the complete picture.
Our change: When the provider supports includeConcepts, the worker creates one prep context and registers all intent (concepts + filters + excludes) before calling executeFilters() once. Falls back to the original separate-block behavior when includeConcepts is absent.
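
A rough sketch of the unified intent block, under stated assumptions: the method names are the ones listed in this PR, but the argument lists, the shape of the include/exclude objects, and the `registerIntent` helper itself are simplified guesses for illustration. The recording object stands in for a real provider so the call order is visible.

```javascript
// Hypothetical worker-side sketch. Method names come from the description;
// argument lists and the include/exclude shapes are simplified assumptions.
function registerIntent(cs, include, excludes, prepContext) {
  if (typeof cs.includeConcepts !== 'function') {
    return 'legacy'; // caller falls back to the separate concept/filter blocks
  }
  if (include.concept) {
    cs.includeConcepts(prepContext, include.concept.map((c) => c.code));
  }
  for (const f of include.filter || []) {
    cs.filter(prepContext, f.property, f.op, f.value);
  }
  for (const ex of excludes) {
    if (ex.concept) {
      cs.filterExcludeConcepts(prepContext, ex.concept.map((c) => c.code));
    }
    if (ex.filter) {
      cs.filterExcludeFilters(prepContext, ex.filter);
    }
  }
  cs.executeFilters(prepContext); // single call once all intent is registered
  return 'unified';
}

// Stand-in provider that only records which methods were called.
const calls = [];
const recorder = {};
for (const name of [
  'includeConcepts',
  'filter',
  'filterExcludeConcepts',
  'filterExcludeFilters',
  'executeFilters',
]) {
  recorder[name] = () => {
    calls.push(name);
  };
}

const mode = registerIntent(
  recorder,
  { concept: [{ code: 'A' }], filter: [{ property: 'TTY', op: 'in', value: 'SBD,SCD' }] },
  [{ filter: [{ property: 'STY', op: '=', value: 'X' }] }],
  {}
);
console.log(mode, calls.join(' -> '));
// unified includeConcepts -> filter -> filterExcludeFilters -> executeFilters

const legacyMode = registerIntent({}, {}, [], {});
console.log(legacyMode); // legacy
```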

Changed: skip excludeCodes() iteration when provider handles excludes

In PR #133: handleCompose() always iterates all excluded codes via excludeCodes() → per-code isExcluded(), even when the provider's handlesExcludes() returned true and filterExcludeFilters() already registered them in SQL.
Our change: When csDoExcludes is true, skip the excludeCodes() iteration entirely. The provider handles excludes in its SQL.

Bug fix: method names on legacy filter path

Lines 778-779 called cs.getExtensions(c), cs.getCodeStatus(c), cs.getProperties(c) — none of which exist on any provider. The base class has extensions(), getStatus(), properties(). Fixed to use correct names.

Worker: listDisplaysFromProvider fast path

Added display fast path: when workingLanguages is set and designations aren't requested, use cs.display(context) directly instead of cs.designations(context, displays). Avoids per-code DB queries when only the primary display is needed.
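
The branch can be sketched as follows. This is an illustration only: `listDisplays`, the `opts` shape, and the synchronous fake provider are assumptions, and the real methods may well be asynchronous. The two provider calls, `cs.display(context)` and `cs.designations(context, displays)`, are the ones named above.

```javascript
// Hypothetical sketch of the fast-path decision; parameter shapes are assumed.
function listDisplays(cs, context, opts) {
  const wantsDesignations = Boolean(opts.includeDesignations);
  if (opts.workingLanguages && !wantsDesignations) {
    return [cs.display(context)]; // fast path: primary display only
  }
  const displays = [];
  cs.designations(context, displays); // slow path: provider fills the list
  return displays;
}

// Fake provider that counts how often each method is hit.
const stats = { display: 0, designations: 0 };
const fakeCs = {
  display(ctx) {
    stats.display++;
    return ctx.display;
  },
  designations(ctx, out) {
    stats.designations++;
    out.push(ctx.display, ...ctx.synonyms);
  },
};

const concept = { display: 'Aspirin', synonyms: ['ASA'] };
const fast = listDisplays(fakeCs, concept, { workingLanguages: ['en'], includeDesignations: false });
const slow = listDisplays(fakeCs, concept, { workingLanguages: ['en'], includeDesignations: true });
console.log(fast);  // [ 'Aspirin' ]
console.log(slow);  // [ 'Aspirin', 'ASA' ]
console.log(stats); // { display: 1, designations: 1 }
```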

Worker: designation batch pre-fetch via getPrepContext()

Grahame's updated getPrepContext(iterate, params, excludeInactive, offset, count) passes the full TxParameters. Our v0 provider reads params.includeDesignations, params.workingLanguages(), and params.designations to determine designation needs. executeFilters() then batch-fetches all designations in one query instead of per-code queries.

Architecture

Unified filter pipeline

Worker registers intent:
  includeConcepts() → filter() → filterExcludeConcepts() → filterExcludeFilters()

Provider records intent without executing SQL

executeFilters() builds one combined SQL query:
  SELECT ... FROM concept t
  JOIN concept_closure ...     -- hierarchy includes
  JOIN concept_literal ...     -- property includes  
  WHERE NOT EXISTS (...)       -- excludes (reuses same filter SQL)
  ORDER BY ... LIMIT/OFFSET

Iteration reads from pre-fetched rows — zero per-code DB queries
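
As a rough illustration of "excludes reuse the same filter SQL", the sketch below builds one query string from recorded includes and excludes. Several things here are assumptions rather than the PR's schema or code: the column names on concept, concept_closure, and concept_literal, the `filterSql` and `buildQuery` helpers, the placeholder exclude code, and the use of EXISTS for includes (the outline above uses JOINs). The point is only that a single fragment generator serves both directions.

```javascript
// Hypothetical sketch. Table names come from the outline above; column
// names and helper names are assumptions made for illustration.
function filterSql(f) {
  switch (f.op) {
    case 'is-a':
      return {
        sql:
          'SELECT 1 FROM concept_closure cc ' +
          'JOIN concept a ON a.id = cc.ancestor_id ' +
          'WHERE cc.descendant_id = t.id AND a.code = ?',
        params: [f.value],
      };
    case '=':
      return {
        sql:
          'SELECT 1 FROM concept_literal cl ' +
          'WHERE cl.concept_id = t.id AND cl.property = ? AND cl.value = ?',
        params: [f.property, f.value],
      };
    default:
      throw new Error(`unsupported op in sketch: ${f.op}`);
  }
}

function buildQuery({ includes, excludes, offset, count }) {
  const where = [];
  const params = [];
  for (const f of includes) {
    const frag = filterSql(f);
    where.push(`EXISTS (${frag.sql})`); // include: row must match
    params.push(...frag.params);
  }
  for (const f of excludes) {
    const frag = filterSql(f); // same generator as for includes
    where.push(`NOT EXISTS (${frag.sql})`); // exclude: row must not match
    params.push(...frag.params);
  }
  const whereSql = where.length ? ` WHERE ${where.join(' AND ')}` : '';
  return {
    sql: `SELECT t.code FROM concept t${whereSql} ORDER BY t.code LIMIT ? OFFSET ?`,
    params: [...params, count, offset],
  };
}

const query = buildQuery({
  includes: [{ property: 'concept', op: 'is-a', value: '71388002' }],
  excludes: [{ property: 'concept', op: 'is-a', value: 'EXCLUDED-ROOT' }],
  offset: 0,
  count: 100,
});
console.log(query.sql);
console.log(query.params); // [ '71388002', 'EXCLUDED-ROOT', 100, 0 ]
```

Because the parameters are collected in the same order as the fragments are appended, the resulting pair can be handed to a prepared statement as-is.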

Batch designation pre-fetch

getPrepContext() tells the provider what designation data will be needed. executeFilters() batch-fetches all designations in one query. During iteration, designations() reads from a pre-fetched Map.
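
A minimal sketch of the pre-fetched Map, assuming the batched query returns flat rows with `code`, `language`, and `value` fields (that row shape and both helper names are hypothetical). Rows are grouped once, and every later lookup is an in-memory read.

```javascript
// Hypothetical sketch; the row shape and helper names are assumptions.
// Group the rows of one batched designation query by concept code.
function indexDesignations(rows) {
  const byCode = new Map();
  for (const row of rows) {
    let list = byCode.get(row.code);
    if (!list) {
      list = [];
      byCode.set(row.code, list);
    }
    list.push({ language: row.language, value: row.value });
  }
  return byCode;
}

// Lookup used during iteration: no database access, just a Map read.
function designationsFor(prefetched, code) {
  return prefetched.get(code) || [];
}

const rows = [
  { code: 'C1', language: 'en', value: 'Heart disease' },
  { code: 'C1', language: 'es', value: 'Cardiopatía' },
  { code: 'C2', language: 'en', value: 'Myocarditis' },
];
const prefetched = indexDesignations(rows);
console.log(designationsFor(prefetched, 'C1').length); // 2
console.log(designationsFor(prefetched, 'C3'));        // []
```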

Performance

Full expansion (no count limit)

| ValueSet | Codes | Time |
|---|---|---|
| SNOMED Clinical Findings (descendent-of 404684003) | 124,411 | 2.4s |
| SNOMED All Procedures (descendent-of 71388002) | 59,440 | 0.9s |
| SNOMED Body Structures (descendent-of 123037004) | 37,543 | 0.8s |
| RxNorm SBD+SCD (TTY in SBD,SCD) | 27,361 | 0.5s |

IPS/FHIR R4 ValueSets (count=100)

| ValueSet | Pattern | Time |
|---|---|---|
| IPS Procedures (8 is-a excludes) | descendent-of Procedure minus 8 subtrees | 251ms |
| IPS Lab Results | LOINC CLASSTYPE=1 AND STATUS=ACTIVE minus 4 CLASS values | 337ms |
| IPS Allergy Reactions (19-root union) | 19 separate is-a includes | 49ms |
| IPS Problems (132K total codes) | 3 descendent-of + 1 is-a | 71ms |
| IPS Medications minus vaccines | SNOMED medicinal products minus vaccines | 30ms |
| Cross-system RxNorm drugs + LOINC labs | Two systems, filters on each | 119ms |
| Heart conditions (text search + hierarchy) | descendent-of 56265001 + filter text heart | 44ms |

Cross-system LIMIT optimization

| Query | Before | After |
|---|---|---|
| RxNorm + LOINC include (count=10) | 4,068ms | 12ms (340x) |

Test results

  • 40/40 standard expansion tests pass
  • 20/20 real-world IPS/FHIR R4 tests pass
  • 3,575/3,669 HL7 test cases pass (42 failures pre-existing upstream — timeouts and missing data)

Files

  • tx/cs/cs-api.js: includeConcepts() + handlesOffset() fix
  • tx/cs/cs-sqlite-runtime-v0.js: Core v0 provider (~3,400 lines)
  • tx/cs/cs-sqlite-snomed-v0.js: SNOMED specialization (expressions, ECL, hierarchy)
  • tx/cs/cs-sqlite-expression-adapter.js: SNOMED expression → v0 adapter
  • tx/importers/sqlite-v2/: v0 schema + importers for RxNorm, LOINC, SNOMED CT
  • tx/workers/expand.js: Unified intent pipeline, LIMIT passdown, bug fixes
  • scripts/test-expand-cross-system.js: 60-test expansion test suite
  • docs/open-questions.md: Open questions and resolved items

jmandel force-pushed the sqlite-terminology-providers branch from fc82680 to fb01ee0 on February 22, 2026 00:24
Add SQLite-backed code system providers for RxNorm, LOINC, and SNOMED CT
that use a shared v0 schema with closure tables, FTS5 search indexes,
and a unified SQL filter pipeline for both includes and excludes.

Key features:
- Single #buildV0FilterSql code path handles all filter types (concept
  hierarchy, property filters, code regex, value set membership)
- Excludes reuse the same filter SQL wrapped in NOT EXISTS
- Streaming pagination for large expansions (124K+ SNOMED codes)
- Batch designation fetching for efficient display/property loading
- SNOMED expression constraint language support via adapter
- RxNorm archived concept import from RXNATOMARCHIVE
- STY registered as filterable property for RxNorm
- Opt-in perf counters (no-op when disabled)

Integrates with Grahame's CS provider API (PR HealthIntersections#133):
- getPrepContext, filterExcludeFilters, filterExcludeConcepts
- scanValueSet, handlesExcludes, handlesOffset
- Unified intent path with includeConcepts + filter + exclude

Also fixes method name bugs on legacy filter path (getExtensions ->
extensions, getCodeStatus -> getStatus, getProperties -> properties).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jmandel force-pushed the sqlite-terminology-providers branch from fb01ee0 to 02d11bf on February 22, 2026 00:34
Add time-based query effort limiting for v0 SQLite terminology providers.
Uses sqlite3_progress_handler to interrupt queries exceeding a configurable
wall-clock time limit (default 5s, configurable via tx.effortLimitMs).

Uses a fork of better-sqlite3 (jmandel/better-sqlite3#progress-handler) that
exposes db.progressHandler(interval, callback). The fork is an optional
dependency — if native compilation fails (no build tools), falls back to
stock better-sqlite3 and queries run without effort limits.

The progress handler callback checks performance.now() every 10,000 VM
instructions (~0.2ms granularity). Queries exceeding the limit throw
SQLITE_INTERRUPT, which propagates as a standard error.

Config: set modules.tx.effortLimitMs in config.json (default: 5000ms).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
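
The time check described in the commit message above can be sketched in isolation. This is a hedged illustration, not the PR's code: `makeDeadlineCheck` is a hypothetical helper, and because the commit message does not say how the fork's db.progressHandler(interval, callback) expects an interrupt to be signalled, the sketch only returns a boolean and leaves the wiring to the caller. The clock is injectable so the behaviour can be shown without real waiting.

```javascript
// Hypothetical sketch of a wall-clock deadline check. The default clock is
// performance.now(), as mentioned in the commit message; the helper name
// and the boolean return convention are assumptions.
function makeDeadlineCheck(limitMs, now = () => performance.now()) {
  const startedAt = now();
  // Intended to be invoked periodically (for example every N VM
  // instructions); returns true once the limit has been exceeded.
  return function shouldInterrupt() {
    return now() - startedAt > limitMs;
  };
}

// Demonstration with a fake clock so the result is deterministic.
let fakeTime = 0;
const shouldInterrupt = makeDeadlineCheck(5000, () => fakeTime);

fakeTime = 4999;
console.log(shouldInterrupt()); // false
fakeTime = 5001;
console.log(shouldInterrupt()); // true
```

Creating one check per query keeps the start time tied to that query rather than to the connection.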
jmandel force-pushed the sqlite-terminology-providers branch from f3ae920 to b04fa69 on February 22, 2026 01:48