# Open Questions

## Batch Designation Fetching

**Status**: Implemented via `getPrepContext(iterate, params, ...)` — designation hints extracted from `params`
**Discovered**: 2026-02-21, during v0 query profiling
**Resolved**: 2026-02-21, batch pre-fetch in `executeFilters()`

### Problem

The worker iterates each included code individually, calling `designations(context)` per code, which runs a separate `SELECT FROM designation WHERE concept_id=?` query for each. For RxNorm SBD+SCD (27K codes), that's 27K individual queries; for SNOMED Clinical Findings (124K codes), 124K queries. This was the dominant cost in large expansions.

The root cause: the worker's `listDisplaysFromProvider()` has a fast path (use `display` from context, no DB) and a full path (call `designations()`, per-code DB query). The fast path requires `workingLanguages` to be truthy. When no language header is sent (the default), every code takes the full path.

### Solution

Added designation hint extraction to `getPrepContext()` — when the worker passes `params` (TxParameters), the v0 provider reads `params.includeDesignations`, `params.workingLanguages()`, and `params.designations` to determine what designation data will be needed during iteration. The provider batch-fetches all designations in one query (batched in chunks of 500) and attaches them to the filter set. During iteration, `designations()` reads from the pre-fetched data instead of hitting the DB.
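
A minimal sketch of the chunked pre-fetch, assuming better-sqlite3 and a `designation(concept_id, use_code, language, value)` table; the helper name and the `designationsByConcept` field are illustrative, not the actual provider code:

```js
const CHUNK_SIZE = 500;

// Hypothetical helper: fetch designations for all included concepts in
// batches of 500 and index them by concept_id on the filter set.
function prefetchDesignations(db, filterSet, conceptIds) {
  filterSet.designationsByConcept = new Map();
  for (let i = 0; i < conceptIds.length; i += CHUNK_SIZE) {
    const chunk = conceptIds.slice(i, i + CHUNK_SIZE);
    const placeholders = chunk.map(() => '?').join(',');
    const rows = db.prepare(
      `SELECT concept_id, use_code, language, value
         FROM designation
        WHERE concept_id IN (${placeholders})`
    ).all(...chunk);
    for (const row of rows) {
      const list = filterSet.designationsByConcept.get(row.concept_id) || [];
      list.push(row);
      filterSet.designationsByConcept.set(row.concept_id, list);
    }
  }
}

// designations() then becomes a Map lookup instead of a per-code query:
//   return filterSet.designationsByConcept.get(context.conceptId) || [];
```

Chunks of 500 also stay comfortably under SQLite's traditional default of 999 bound parameters per statement.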

### Results

| Query | Before (per-code DB) | After (batch) | Speedup |
|-------|---------------------|---------------|---------|
| RxNorm SBD (9.7K codes) | ~2.2s | 0.24s | **9x** |
| RxNorm SBD+SCD (27K codes) | ~7s | 0.53s | **14x** |
| SNOMED Clinical Findings (124K codes) | ~7s | 2.4s | **3x** |

## Worker-to-Provider Limit Passdown

**Status**: Implemented
**Discovered**: 2026-02-21, during v0 query profiling
**Resolved**: 2026-02-21, expanded LIMIT passdown to all providers

### Problem

The worker's `getPrepContext(iterate, offset, count)` only passed offset/count when `vsInfo.csDoOffset` was true — which required the ValueSet to be "simple" (single code system). Cross-system ValueSets never got LIMIT, so providers fetched all matching rows even when the client asked for 10.

### Solution

Changed the gate from `vsInfo.csDoOffset` (only true for simple single-CS ValueSets) to `cs.handlesOffset()` (true for any provider that supports offset/count). Per-CS LIMIT is safe for cross-system ValueSets because excludes are system-scoped — an exclude on system B can't drain results from system A. The worker's overall count management handles cross-system totals.
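
A compressed sketch of the gate change on the worker side (variable names follow the prose above, not the worker source):

```js
// Before (PR #133): LIMIT passed down only for "simple" single-CS ValueSets.
//   const passLimit = vsInfo.csDoOffset;
// After: any provider that supports offset/count gets the LIMIT.
const passLimit = cs.handlesOffset();

const prep = cs.getPrepContext(
  iterate,
  params,
  excludeInactive,
  passLimit ? offset : 0,
  passLimit ? count : 0   // how "no limit" is signalled here is a guess
);
```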

### Result

Cross-system ValueSet expansion (e.g., RxNorm + LOINC include) dropped from ~1800ms to ~21ms (85x faster) with `count=10`.

## Active vs Inactive Concepts in v0 Databases

**Status**: Partially resolved — query-time filtering works; RxNorm re-import needed for STY + archived concepts
**Discovered**: 2026-02-21, during v0 vs upstream baseline comparison

### What works now

All three importers import inactive concepts with `active=0`:
- **RxNorm**: Suppressed atoms (SUPPRESS='O'|'E') → `active=0`. Non-suppressed → `active=1`.
- **SNOMED**: RF2 `active` column mapped directly. Both active and inactive concepts imported.
- **LOINC**: STATUS column mapped via `isActiveLoincStatus()`. Deprecated/Discouraged → `active=0`.

The v0 provider combines `excludeInactive` (from `ValueSet.compose.inactive=false`) and `params.activeOnly` (from `$expand?activeOnly=true`) in `getPrepContext()`. "Exclude" wins — client can narrow but not widen. Applied as `AND c.active = 1` in SQL.

### Remaining issues

1. **RxNorm archived/merged concepts not imported**: Concepts that were merged (e.g., `197385→1668032`) only exist in `RXNATOMARCHIVE`. The v0 importer only reads `RXNCONSO`. An explicit code enumeration of a retired RXCUI won't find it. Fix: import from RXNATOMARCHIVE as `active=0` concepts.

2. **RxNorm STY not registered as filterable**: The `runtime.filters.properties.byCode` only lists TTY. STY exists in `property_def` and has data in `concept_literal`, but the provider rejects it as a filter. Fix: add STY to `byCode` in the importer and re-import.

3. **SNOMED skips inactive relationships** (import-snomed-v0.js line 556): `if (!active) continue` skips inactive RF2 relationships. This means hierarchy edges that are only present as inactive won't appear in the closure table. Whether this matters depends on whether FHIR `is-a` filters should traverse inactive edges.

## SQLite Effort-Based Query Breaker

**Status**: Open — needs upstream contribution or workaround
**Discovered**: 2026-02-21, during regex filter implementation

### Problem

A single malicious or pathological query (e.g., a catastrophic-backtracking regex, or a huge cross-join) can block the Node.js event loop indefinitely because better-sqlite3 executes synchronously. There's no way to interrupt a running query from JavaScript once it starts.

### Root Cause

SQLite's C API provides `sqlite3_progress_handler(db, N, callback, context)` which invokes a callback every N virtual machine instructions. If the callback returns non-zero, the statement is interrupted with `SQLITE_INTERRUPT`. This is the standard way to implement query timeouts or effort limits in SQLite.

However, better-sqlite3 does not expose this API. The native C++ addon (`database.cpp`) never calls `sqlite3_progress_handler`.

### Current Mitigation

- The REGEXP function has an effort counter (500k evaluation limit) that throws when the limit is exceeded (see the sketch after this list)
- Row iteration in `executeFilters()` could add a row-count cap
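
A sketch of the REGEXP effort counter, assuming the function is registered through better-sqlite3's `db.function()`; the 500k figure matches the text above, while the names, reset scoping, and per-pattern caching are illustrative:

```js
const REGEXP_EFFORT_LIMIT = 500_000;
let regexpEvaluations = 0;

// SQLite's REGEXP operator calls a user-defined function named "regexp",
// with the pattern as the first argument. Throwing here aborts the running
// statement, which is what bounds the work.
db.function('regexp', (pattern, value) => {
  if (pattern == null || value == null) return null;
  if (++regexpEvaluations > REGEXP_EFFORT_LIMIT) {
    throw new Error(`REGEXP effort limit (${REGEXP_EFFORT_LIMIT}) exceeded`);
  }
  return new RegExp(pattern).test(String(value)) ? 1 : 0;
});

// A real implementation would cache the compiled RegExp per pattern and
// reset regexpEvaluations at the start of each request/statement.
```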

### Proper Solution

Contribute a PR to [better-sqlite3](https://github.com/WiseLibs/better-sqlite3) exposing `sqlite3_progress_handler` — e.g., `db.setProgressHandler(stepInterval, callback)`. This would allow:

```js
const MAX_STEPS = 10_000_000; // ~10M VM instructions
let steps = 0;
syncDb.setProgressHandler(1000, () => {
  if (++steps > MAX_STEPS) return 1; // interrupt
  return 0;
});
```

### Alternatives

- Use a worker_thread for SQLite queries with a timeout on the main thread
- Fork better-sqlite3 and add `sqlite3_progress_handler` binding
- Switch to a different SQLite binding that exposes it (e.g., `sql.js` via Wasm, but that has its own perf tradeoffs)

## CS Provider Interface Changes Beyond PR #133

**Status**: Documenting — these changes should be proposed upstream
**Discovered**: 2026-02-21, during v0 provider implementation

### Context

Grahame's PR #133 decomposed the CS provider API into a filter pipeline: `filter()` → `filterExcludeFilters()`/`filterExcludeConcepts()` → `executeFilters()` → `filterMore()`/`filterConcept()`. His updated branch also passes `params` (TxParameters) to `getPrepContext()`, giving the provider access to operation parameters. Our v0 provider builds on this but required additional interface methods. All changes are backward-compatible.

### New Provider Methods

#### 1. `includeConcepts(filterContext, codes: string[])`

**Why**: PR #133 has no way for the provider to know about explicit concept lists from `compose.include[].concept`. The worker handles these in a separate per-code `locate()` loop, completely outside the filter pipeline. This means the provider can't batch-optimize concept lookups (one SQL query vs N).

**What it does**: Records intent to include specific codes. No SQL execution — `executeFilters()` incorporates these as `WHERE code IN (...)` in the combined query.

**Backward compatibility**: The worker checks `typeof cs.includeConcepts === 'function'` before calling. If absent, falls back to the original per-code `locate()` loop.
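
A sketch of that feature detection on the worker side (loop body abbreviated; whether `locate()` is sync or async is not shown):

```js
if (typeof cs.includeConcepts === 'function') {
  // Batch path: record the whole concept list as intent; executeFilters()
  // later folds it into one WHERE code IN (...) clause.
  cs.includeConcepts(filterContext, cset.concept.map(c => c.code));
} else {
  // Legacy path: one locate() per listed concept, as in PR #133.
  for (const c of cset.concept) {
    const located = cs.locate(c.code); // sync/async handling elided
    // ...emit the located concept as before
  }
}
```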

### Worker Changes

#### Unified concept + filter block

**Why**: PR #133's worker has separate `if (cset.concept)` and `if (cset.filter)` blocks. The provider never sees the complete picture. This prevents the provider from building one optimal query.

**What changed**: When the provider supports `includeConcepts`, the worker creates one prep context and registers all intent (concepts + filters + excludes) before calling `executeFilters()` once. It then iterates concept-list results first (preserving compose ordering + designation merging), then filter results (deduping codes already emitted from the concept list).

**Backward compatibility**: When the provider doesn't support `includeConcepts`, the worker falls back to the original separate-block behavior.
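
Roughly, the unified path orders its calls like this (argument lists are simplified and error handling is omitted; only the call ordering is the point):

```js
const prep = cs.getPrepContext(iterate, params, excludeInactive, offset, count);

// 1. Register all intent against one prep context.
if (cset.concept) cs.includeConcepts(prep, cset.concept.map(c => c.code));
for (const f of cset.filter || []) cs.filter(prep, f.property, f.op, f.value);
// ...plus filterExcludeFilters()/filterExcludeConcepts() for excludes

// 2. Execute once: the provider builds a single combined query.
const results = cs.executeFilters(prep);

// 3. Iterate concept-list results first (preserves compose ordering and
//    designation merging), then filter results, deduping against the set of
//    codes already emitted from the concept list.
```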

#### Skip `excludeCodes()` when `csDoExcludes` is true

**Why**: The worker's `handleCompose()` iterated all excluded codes individually via `excludeCodes()` → `isExcluded()` per code — even when the provider's `handlesExcludes()` returned true and `filterExcludeFilters()`/`executeFilters()` handled excludes in SQL. For 14k+ excluded codes at ~0.05ms each, this added ~700ms of pure overhead.

**What changed**: When `csDoExcludes` is true, `handleCompose()` skips the `excludeCodes()` iteration. If `filterExcludeFilters()` throws (unsupported exclude), the catch block runs `excludeCodes()` at that point as fallback.
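
A control-flow sketch of that fallback (names follow the prose; argument lists are placeholders and the real `handleCompose()` carries more state than shown):

```js
if (csDoExcludes) {
  try {
    // Exclude expressed in SQL by the provider via filterExcludeFilters()/
    // executeFilters(); no per-code isExcluded() calls.
    cs.filterExcludeFilters(prep, excludeFilter);
  } catch (e) {
    // Provider can't express this exclude in SQL: fall back to the
    // per-code excludeCodes() path at this point.
    excludeCodes(cset);
  }
} else {
  // Legacy behaviour: iterate every excluded code via excludeCodes() -> isExcluded().
  excludeCodes(cset);
}
```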

#### LIMIT passdown for cross-system ValueSets

**Why**: PR #133's worker only passes `offset`/`count` to `getPrepContext()` when `vsInfo.csDoOffset` is true — which requires the ValueSet to be "simple" (single code system). Cross-system ValueSets (multiple includes with different systems) never got LIMIT, so every provider fetched its full result set even when the client requested `count=10`.

**What changed**: The gate was changed from `vsInfo.csDoOffset` to `cs.handlesOffset()`. Per-CS LIMIT is safe because excludes are system-scoped (an exclude on system B can't drain results from system A), and the worker's overall count management handles cross-system totals.

**Result**: Cross-system ValueSet expansion dropped from ~1800ms to ~21ms (85x) with `count=10`.

#### Designation batch pre-fetch via `getPrepContext()`

**Why**: The worker's `listDisplaysFromProvider()` calls `designations(context)` per code. For v0 providers this means one DB query per code. The provider had no way to know upfront whether designations would be needed.

**What changed**: Grahame's updated `getPrepContext(iterate, params, excludeInactive, offset, count)` passes the full `TxParameters`. The v0 provider reads `params.includeDesignations`, `params.workingLanguages()`, and `params.designations` to determine designation needs, then `executeFilters()` batch-fetches them.
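
A short sketch of the hint capture inside `getPrepContext()` (the shape of the stored hints is illustrative):

```js
prepContext.designationHints = {
  includeDesignations: params.includeDesignations,
  languages: params.workingLanguages(), // falsy when no language header was sent
  uses: params.designations             // explicitly requested designation uses, if any
};
// executeFilters() consults these hints to decide whether to batch
// pre-fetch designations alongside the main query.
```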

#### Active-only filtering

The v0 provider combines two sources of "exclude inactive" logic in `getPrepContext()`:
- `excludeInactive` (from `ValueSet.compose.inactive = false`)
- `params.activeOnly` (from `$expand?activeOnly=true`)

Per FHIR semantics, "exclude" wins — the client can narrow but can't widen beyond the ValueSet's rule. The combined flag is applied as `AND c.active = 1` in SQL.
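
A minimal sketch of the combination (the table alias and column follow the `AND c.active = 1` fragment above; the rest is illustrative):

```js
// Either source is enough to restrict the expansion to active concepts;
// neither source can re-admit concepts the other already excludes.
const activeOnly = excludeInactive || !!params.activeOnly;

let sql = `SELECT c.code, c.display FROM concept c WHERE ${whereClause}`;
if (activeOnly) {
  sql += ' AND c.active = 1';
}
```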

### Summary Table

| Method | PR #133 | Our addition | Why |
|--------|---------|--------------|-----|
| `getPrepContext()` | ✅ (updated signature) | Use `params` for designation hints + `excludeInactive` for active filtering | Subsumes `prepareDesignations()` |
| `filter()` | ✅ | Modified: pure intent, no SQL | Defer all execution to `executeFilters()` |
| `filterExcludeFilters()` | ✅ | Unchanged | — |
| `filterExcludeConcepts()` | ✅ | Unchanged | — |
| `executeFilters()` | ✅ | Modified: builds combined SQL from all intent + batch designation pre-fetch | One query instead of separate materialization |
| `includeConcepts()` | ❌ | New | Provider needs to see concept lists for batch SQL |

## Expansion Size Limits (UPPER_LIMIT / INTERNAL_LIMIT)

**Status**: Open — needs production tuning
**Changed**: 2026-02-21, raised from 100K to 1M for development/testing

### Background

The worker has three constants that cap expansion size:
- `UPPER_LIMIT_NO_TEXT` — max codes returned when no text filter is provided
- `UPPER_LIMIT_TEXT` — max codes returned with a text filter
- `INTERNAL_LIMIT` — hard internal cap passed to sub-expansions

Upstream had these at 100,000. This blocks legitimate real-world ValueSets like FHIR R4 Clinical Findings (124K codes) and IPS Problems (132K codes) with a `too-costly` error.

We raised all three to 1,000,000 for development to avoid failing on these cases. With our SQL-backed provider and batch designation pre-fetch, 124K codes expand in ~2.4s (down from ~7s before batch designations).

### Questions

1. **What should production limits be?** Options:
- **Keep 1M**: allows any realistic FHIR ValueSet. Memory concern: 124K codes × ~1KB each ≈ 120MB per response.
- **Make configurable**: server config or per-request parameter. The FHIR `count` parameter already handles client-side limiting.
- **Tiered**: higher limit for SQL-backed providers (fast), lower for legacy iterate-based providers (slow).

2. **Should the limit apply to SQL row fetch or only to response size?** Currently the SQL uses `LIMIT` from the `count` parameter, but the worker's `limitCount` check applies to the accumulated `fullList` after iteration. With SQL providers, we could let SQL return all rows and only cap the response.

3. **Memory pressure**: a 1M-code expansion would be ~1GB of JSON. Should we add a memory-based check instead of (or in addition to) a count-based one?