Generic SQLite v0 terminology providers with batch-optimized expansion #135

Closed
jmandel wants to merge 5 commits into HealthIntersections:main from jmandel:sqlite-terminology-providers

jmandel commented Feb 21, 2026

Summary

Generic SQLite-based CodeSystem providers that can serve any terminology imported into a unified v0 schema — replacing per-terminology custom providers for RxNorm, LOINC, and SNOMED CT. Builds on the decomposed CS filter API from PR #133, adding new backward-compatible provider methods and worker optimizations.

Real-World ValueSet Expansion Performance

Tested against real ValueSets from FHIR R4 Core and IPS specifications. All queries use the v0 SQLite providers.

Full expansion (no count limit)

| ValueSet | Codes | Time |
|---|---|---|
| SNOMED Clinical Findings (descendent-of 404684003) | 124,411 | 2.4s |
| SNOMED All Procedures (descendent-of 71388002) | 59,440 | 0.9s |
| SNOMED Body Structures (descendent-of 123037004) | 37,543 | 0.8s |
| RxNorm SBD+SCD (TTY in SBD,SCD) | 27,361 | 0.5s |
| RxNorm SBD (TTY = SBD) | 9,766 | 0.24s |

IPS/FHIR R4 ValueSets (count=100)

| ValueSet | Pattern | Time |
|---|---|---|
| IPS Procedures (8 is-a excludes) | descendent-of Procedure minus admin, bloodbank, community health, etc. | 53ms |
| IPS Lab Results | LOINC CLASSTYPE=1 AND STATUS=ACTIVE minus 4 CLASS values | 331ms |
| IPS Allergy Reactions (19-root union) | 19 separate is-a includes | 43ms |
| IPS Problems (132K total codes) | 3 descendent-of + 1 is-a | 79ms |
| IPS Medications minus vaccines | SNOMED medicinal products minus vaccines | 23ms |
| Cross-system RxNorm drugs + LOINC labs | Two systems, filters on each | 133ms |
| Heart conditions (text search + hierarchy) | descendent-of 56265001 + filter text heart | 15ms |

Cross-system LIMIT optimization

| Query | Before | After |
|---|---|---|
| RxNorm + LOINC include (count=10) | 1,800ms | 21ms (85x) |

Previously, cross-system ValueSets never got LIMIT passed to providers. Now cs.handlesOffset() gates LIMIT passdown per-CS, which is safe because excludes are system-scoped.
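
The per-CS gating described above might look roughly like the sketch below. Only `handlesOffset()` is taken from this PR; `planLimitPassdown`, the shape of the include entries, and the example provider objects are hypothetical stand-ins, not the code in tx/workers/expand.js.

```javascript
// Hypothetical helper: decide, per code system, whether the requested count
// can be passed down to the provider as a LIMIT.
// Assumption: each include entry carries its system URL and a provider object `cs`.
function planLimitPassdown(includes, count) {
  return includes.map(({ system, cs }) => {
    // Providers that do not implement handlesOffset() keep the old behavior (no LIMIT).
    const canLimit = typeof cs.handlesOffset === 'function' && cs.handlesOffset() === true;
    return { system, limit: canLimit ? count : null };
  });
}

// Example: two providers that opt in and one legacy provider that does not.
const includes = [
  { system: 'http://www.nlm.nih.gov/research/umls/rxnorm', cs: { handlesOffset: () => true } },
  { system: 'http://loinc.org', cs: { handlesOffset: () => true } },
  { system: 'http://example.org/legacy-cs', cs: {} },
];
const plan = planLimitPassdown(includes, 10);
console.log(plan.map((p) => `${p.system}: ${p.limit === null ? 'no LIMIT' : 'LIMIT ' + p.limit}`));
```

Because an exclude only removes codes from its own system, each provider can apply its excludes before the LIMIT and still return a correct first page for that system.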

Architecture

v0 Schema

A single normalized SQLite schema (tx/importers/sqlite-v2/schema-v0.sql) stores any code system:

concept (concept_id, code, display, definition, active)
concept_closure (ancestor_id, descendant_id, depth)
concept_link (concept_id, property_code, target_concept_id)   -- concept-type properties
concept_literal (concept_id, property_code, value_text)        -- literal properties
designation (concept_id, language_code, use_code, term, preferred, active)

Importers for RxNorm, LOINC, and SNOMED CT transform source data into this schema.
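
To illustrate what the concept_closure table holds, here is a small in-memory sketch that derives (ancestor, descendant, depth) rows from parent/child edges and answers a descendent-of lookup. The function names, the sample hierarchy, and the choice to include depth-0 self rows are assumptions made for illustration; the real importers may populate the table differently. The SQL string uses the column names listed above, but is likewise only a guess at the query shape.

```javascript
// Build closure rows from [parent, child] edges using a BFS from every node.
// Assumption: self rows (depth 0) are included so that is-a can reuse the same table.
function buildClosure(edges) {
  const children = new Map();
  for (const [parent, child] of edges) {
    if (!children.has(parent)) children.set(parent, []);
    children.get(parent).push(child);
  }
  const rows = [];
  for (const ancestor of new Set(edges.flat())) {
    const depthOf = new Map([[ancestor, 0]]);
    const queue = [ancestor];
    while (queue.length > 0) {
      const node = queue.shift();
      for (const child of children.get(node) ?? []) {
        if (!depthOf.has(child)) {
          depthOf.set(child, depthOf.get(node) + 1); // shortest path depth
          queue.push(child);
        }
      }
    }
    for (const [descendant, depth] of depthOf) rows.push({ ancestor, descendant, depth });
  }
  return rows;
}

// descendent-of excludes the root itself (depth > 0); is-a would use depth >= 0.
const descendantsOf = (rows, ancestor) =>
  rows.filter((r) => r.ancestor === ancestor && r.depth > 0).map((r) => r.descendant);

// Plausible SQL equivalent against the v0 schema (not taken from the PR's code).
const DESCENDENT_OF_SQL =
  'SELECT c.code FROM concept c ' +
  'JOIN concept_closure cc ON cc.descendant_id = c.concept_id ' +
  'WHERE cc.ancestor_id = ? AND cc.depth > 0';

// Hypothetical four-node hierarchy: root -> a -> a1, root -> b.
const closure = buildClosure([['root', 'a'], ['root', 'b'], ['a', 'a1']]);
console.log(descendantsOf(closure, 'root').sort());
```

The point of precomputing these rows is that a hierarchy filter becomes a single indexed lookup on ancestor_id instead of a recursive walk at query time.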

Query Pipeline

The v0 provider implements a declarative intent pipeline:

  1. Worker registers intent: includeConcepts(), filter(), excludeConcepts(), filterExclude(), prepareDesignations()
  2. Provider records all intent without executing SQL
  3. executeFilters() builds one combined SQL query from all registered intent
  4. Iteration reads from pre-fetched rows — zero per-code DB queries
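
The four steps above can be sketched as a small recorder that collects intent and only assembles SQL when executeFilters() is called. This is a simplified illustration, not the provider in tx/cs/cs-sqlite-runtime-v0.js: the class name, the predicate helpers, and the simplified method signatures (no ctx argument, predicates passed in directly) are all assumptions.

```javascript
// Hypothetical predicate helpers: each returns a SQL fragment plus its bound parameters.
const codeListPredicate = (codes) => ({
  sql: `c.code IN (${codes.map(() => '?').join(', ')})`,
  params: [...codes],
});
const descendentOfPredicate = (ancestorId) => ({
  sql:
    'EXISTS (SELECT 1 FROM concept_closure cc WHERE cc.descendant_id = c.concept_id ' +
    'AND cc.ancestor_id = ? AND cc.depth > 0)',
  params: [ancestorId],
});

class IntentSketch {
  constructor() {
    this.includes = [];
    this.excludes = [];
  }
  // Steps 1 and 2: record intent only; nothing touches the database here.
  includeConcepts(codes) { this.includes.push(codeListPredicate(codes)); return this; }
  filter(predicate) { this.includes.push(predicate); return this; }
  excludeConcepts(codes) { this.excludes.push(codeListPredicate(codes)); return this; }
  filterExclude(predicate) { this.excludes.push(predicate); return this; }

  // Step 3: fold all recorded intent into one statement.
  // Includes are OR-ed together; each exclude reuses the same fragment wrapped in NOT.
  executeFilters() {
    const include = this.includes.length > 0
      ? this.includes.map((p) => `(${p.sql})`).join(' OR ')
      : '1 = 1';
    const exclude = this.excludes.map((p) => ` AND NOT (${p.sql})`).join('');
    const sql =
      `SELECT c.concept_id, c.code, c.display FROM concept c WHERE (${include})${exclude} ORDER BY c.code`;
    // Parameter order matches placeholder order: includes first, then excludes.
    const params = [...this.includes, ...this.excludes].flatMap((p) => p.params);
    return { sql, params };
  }
}

// Example with made-up concept ids and codes.
const query = new IntentSketch()
  .filter(descendentOfPredicate(101))
  .includeConcepts(['X1', 'X2'])
  .filterExclude(descendentOfPredicate(205))
  .excludeConcepts(['X9'])
  .executeFilters();
console.log(query.sql);
console.log(query.params);
```

Step 4 then amounts to iterating over the rows this one statement returns, rather than issuing a lookup per code.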

Batch Designation Pre-fetch

The biggest performance win. Before: designations() ran one SELECT FROM designation WHERE concept_id=? per code during iteration — 27K queries for RxNorm SBD+SCD, 124K for Clinical Findings.

After: prepareDesignations() tells the provider what designation data will be needed. executeFilters() then batch-fetches all designations up front, chunked into batches of 500, instead of querying per code. During iteration, designations() reads from the pre-fetched Map.
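
A minimal sketch of the chunked pre-fetch follows, assuming the chunks are taken over concept ids and that the result is keyed by concept_id. `prefetchDesignations`, `chunk`, and the `runQuery` callback are hypothetical; `runQuery` stands in for whatever SQLite binding the provider uses, and a fake implementation is used here so the example runs without a database.

```javascript
// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Fetch designations for many concepts with one IN (...) query per chunk,
// then group the rows by concept_id for lookup during iteration.
function prefetchDesignations(conceptIds, runQuery, chunkSize = 500) {
  const byConcept = new Map();
  for (const ids of chunk(conceptIds, chunkSize)) {
    const sql =
      'SELECT concept_id, language_code, use_code, term FROM designation ' +
      `WHERE concept_id IN (${ids.map(() => '?').join(', ')})`;
    for (const row of runQuery(sql, ids)) {
      if (!byConcept.has(row.concept_id)) byConcept.set(row.concept_id, []);
      byConcept.get(row.concept_id).push(row);
    }
  }
  return byConcept;
}

// Fake query runner: returns one made-up designation per requested id and counts calls.
let queryCount = 0;
const fakeRunQuery = (sql, ids) => {
  queryCount += 1;
  return ids.map((id) => ({ concept_id: id, language_code: 'en', use_code: null, term: `term-${id}` }));
};

const conceptIds = Array.from({ length: 1200 }, (_, i) => i + 1);
const designations = prefetchDesignations(conceptIds, fakeRunQuery);
console.log(`${queryCount} queries for ${designations.size} concepts`);
```

With 1,200 concept ids this issues three queries (500 + 500 + 200) instead of 1,200, which is the same trade the PR makes at the 27K and 124K scale.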

Query Before (per-code DB) After (batch) Speedup
RxNorm SBD+SCD (27K) 7s 0.5s 14x
SNOMED Clinical Findings (124K) 7s 2.4s 3x

New Provider API Methods

All backward-compatible — providers that don't implement them get original behavior.

| Method | Purpose |
|---|---|
| includeConcepts(ctx, codes) | Register explicit concept list for combined SQL |
| excludeConcepts(ctx, codes) | Clean concept exclusion (vs pseudo-filter smuggling) |
| locateBatch(codes, filterSet) | Batch context lookup from pre-fetched rows |
| prepareDesignations(ctx, options) | Inform provider about designation needs |
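
One way a worker could keep these methods optional is plain feature detection, sketched below. The `registerIncludes` helper, the per-code `locate(ctx, code)` fallback, and the two example providers are assumptions for illustration; the actual dispatch in tx/workers/expand.js may differ.

```javascript
// Hypothetical worker-side helper: use includeConcepts() when the provider
// implements it, otherwise fall back to the original per-code path.
function registerIncludes(provider, ctx, codes) {
  if (typeof provider.includeConcepts === 'function') {
    provider.includeConcepts(ctx, codes);
    return 'batched';
  }
  // Assumed legacy behavior: look each code up individually.
  for (const code of codes) provider.locate(ctx, code);
  return 'per-code';
}

// A provider that only knows the old API.
const legacyCalls = [];
const legacyProvider = { locate: (ctx, code) => legacyCalls.push(code) };

// A provider that implements the new method.
const batchedCalls = [];
const v0Provider = {
  locate: () => { throw new Error('should not be called when includeConcepts exists'); },
  includeConcepts: (ctx, codes) => batchedCalls.push(codes),
};

console.log(registerIncludes(legacyProvider, {}, ['A', 'B', 'C']));
console.log(registerIncludes(v0Provider, {}, ['A', 'B', 'C']));
```

Existing providers need no changes under this scheme; they simply never take the batched branch.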

Test Results

  • 80/80 custom expansion tests pass (40 standard + 20 real-world IG + 20 fuzz-generated)
  • 3,577/3,669 HL7 test cases pass (40 failures: 18 pre-existing version tests, 2 from intentional LIMIT raise to 1M, ~20 pre-existing LOINC/SNOMED)
  • Baseline on upstream PR #133 (draft changes to cs-api for DB based provider): 350/1288 failures; our changes fix 310 of those

Files

  • tx/cs/cs-sqlite-runtime-v0.js — Core v0 provider (~3,500 lines)
  • tx/cs/cs-sqlite-snomed-v0.js — SNOMED specialization (expressions, ECL, hierarchy)
  • tx/cs/cs-sqlite-expression-adapter.js — SNOMED expression → v0 adapter
  • tx/cs/cs-sqlite-v0-specializers.js — Per-terminology specialization registry
  • tx/importers/sqlite-v2/ — v0 schema + importers for RxNorm, LOINC, SNOMED
  • tx/cs/cs-api.js — New API method declarations
  • tx/workers/expand.js — Unified intent pipeline, LIMIT passdown, prepareDesignations
  • scripts/test-expand-cross-system.js — 80-test expansion test suite
  • docs/open-questions.md — API changes documentation and open questions

jmandel force-pushed the sqlite-terminology-providers branch from 98b50b1 to a444c0d on February 21, 2026 20:35
Grahame Grieve and others added 3 commits February 22, 2026 08:37
Add SQLite-backed code system providers for RxNorm, LOINC, and SNOMED CT
that use a shared v0 schema with closure tables, FTS5 search indexes,
and a unified SQL filter pipeline for both includes and excludes.

Key features:
- Single #buildV0FilterSql code path handles all filter types (concept
  hierarchy, property filters, code regex, value set membership)
- Excludes reuse the same filter SQL wrapped in NOT EXISTS
- Streaming pagination for large expansions (124K+ SNOMED codes)
- Batch designation fetching for efficient display/property loading
- SNOMED expression constraint language support via adapter
- RxNorm archived concept import from RXNATOMARCHIVE
- STY registered as filterable property for RxNorm
- Opt-in perf counters (no-op when disabled)

Integrates with Grahame's CS provider API (PR HealthIntersections#133):
- getPrepContext, filterExcludeFilters, filterExcludeConcepts
- scanValueSet, handlesExcludes, handlesOffset
- Unified intent path with includeConcepts + filter + exclude

Also fixes method name bugs on legacy filter path (getExtensions ->
extensions, getCodeStatus -> getStatus, getProperties -> properties).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jmandel force-pushed the sqlite-terminology-providers branch from 7cf7f4a to fc82680 on February 22, 2026 00:19

jmandel commented Feb 22, 2026

Superseded by new PR targeting the decomposed CS filter API branch (PR #133).
