Skip to content

fix(storage,query): RDF correctness + crypto hardening + Blazegraph CI (2/4)#332

Open
Bojan131 wants to merge 1 commit intomainfrom
fix-storage-query-rdf-correctness
Open

fix(storage,query): RDF correctness + crypto hardening + Blazegraph CI (2/4)#332
Bojan131 wants to merge 1 commit intomainfrom
fix-storage-query-rdf-correctness

Conversation

@Bojan131
Copy link
Copy Markdown
Contributor

Summary

PR 2 of 4 in the split of #327. Each slice is a single architectural layer to keep reviews tractable.

This PR carries the storage/query layer: RDF correctness, AES-GCM-SIV nonce hardening, SPARQL guard fixes, and the CI infrastructure to run real Blazegraph parity tests.

Changes by category

Storage bug fixes

  • ST-1 — `BlazegraphStore` parity gate. The `adapter-parity-extra` test intentionally fails red when `BLAZEGRAPH_URL` is missing so a green pass cannot lie about parity coverage. `blazegraph.ts` hardened to surface engine errors clearly instead of silently falling back to a stub.
  • ST-2 — AES-GCM-SIV deterministic nonce derivation in `private-store.ts`. Previous derivation could collide for repeated plaintexts on the same key, breaking SIV's misuse-resistance guarantee.
  • ST-12 — Typed-literal round-trip preservation in `oxigraph.ts` (and `blazegraph.ts` parity). `xsd:dateTime` / `xsd:integer` / `xsd:decimal` values now round-trip with their explicit datatype IRIs intact; previously they were narrowed to `xsd:string` on read.

Query bug fixes

  • Q-1 — `minTrust` enforcement in `DKGQueryEngine`. The threshold was documented but never actually filtered results below it; queries could return assets from nodes the operator had explicitly configured as untrusted.
  • `sparql-guard.ts` — query-form detection now correctly distinguishes `SELECT` / `CONSTRUCT` / `ASK` / `DESCRIBE` before applying form-specific guards. Previously `CONSTRUCT` and `DESCRIBE` could bypass the read-only guard meant for `SELECT`.

CI — Blazegraph service container

The storage shard now boots a real `lyrasis/blazegraph:2.1.5` container and provisions two quads-mode namespaces:

  • `dkgq` — conformance / general storage suite (DROP-ALL tests)
  • `dkgq-parity` — `adapter-parity-extra` (real Oxigraph ↔ Blazegraph parity)

Two namespaces (not one) because vitest parallelises files but serialises tests within a file: pointing the parity suite at the SAME namespace as the conformance suite let the latter's `DROP ALL` fire mid-parity and wipe the parity fixture.

Quads mode is required (not Blazegraph's default triples mode), otherwise `GRAPH <…> { … }` clauses are silently dropped on insert.

Tests added

  • `private-store-key-resolution.test.ts` (new, +895 lines)
  • `query-extra.test.ts` (+1153 lines) — minTrust + per-cg trust isolation
  • `sparql-form-detection.test.ts` (new, +400 lines)
  • `vitest.setup.ts` (new) — per-file storage adapter setup

Coverage ratchet

  • `tornadoStorageCoverage` 57/52/39/53 → 85/81/63/79.

Sequencing context

This is slice 2 of 4 from the #327 split:

  1. contracts + chain — fix(contracts,chain): V10 dual-emit + ABI regen + hardhat test rewrites (1/4) #331 (open)
  2. This PR — storage + query (~7k lines)
  3. operator: cli + mcp-server + adapter-openclaw + core + epcis (~6.9k lines)
  4. pipeline: publisher + agent + network-sim (~10.2k lines)

Test plan

  • CI green on this branch (Blazegraph service must boot and provision both namespaces)
  • `pnpm --filter @origintrail-official/dkg-storage test` locally with `BLAZEGRAPH_URL` + `BLAZEGRAPH_PARITY_URL` set
  • `pnpm --filter @origintrail-official/dkg-query test` locally
  • Verify ST-2 with a quick fuzz: same plaintext + same key → distinct ciphertexts (no nonce collision)

Made with Cursor

Fixes the storage/query layer of the bug-fix backlog from PR #327.
Slice 2 of 4 (split by architectural layer for reviewability).

Bug fixes — storage
- ST-1: BlazegraphStore parity gate. The adapter-parity-extra.test
  intentionally fails red when BLAZEGRAPH_URL is missing so a green
  pass cannot lie about parity coverage. Production blazegraph.ts
  hardened to surface engine errors clearly instead of silently
  falling back to a stub.
- ST-2: AES-GCM-SIV deterministic nonce derivation in private-store.
  Previous nonce derivation could collide for repeated plaintexts on
  the same key, breaking SIV's misuse-resistance guarantee.
- ST-12: Typed-literal round-trip preservation in oxigraph.ts (and
  blazegraph.ts parity). xsd:dateTime / xsd:integer / xsd:decimal
  values now round-trip with their explicit datatype IRIs intact;
  previously they were narrowed to xsd:string on read.

Bug fixes — query
- Q-1: minTrust enforcement in DKGQueryEngine. The threshold was
  documented but never actually filtered results below it; queries
  could return assets from nodes the operator had explicitly
  configured as untrusted.
- sparql-guard.ts: query-form detection now correctly distinguishes
  SELECT / CONSTRUCT / ASK / DESCRIBE before applying form-specific
  guards. Previously CONSTRUCT and DESCRIBE could bypass the
  read-only guard meant for SELECT.

CI — Blazegraph service container (.github/workflows/ci.yml)
The storage shard now boots a real lyrasis/blazegraph:2.1.5 container
and provisions two quads-mode namespaces:
- dkgq         → conformance / general storage suite (DROP-ALL tests)
- dkgq-parity  → adapter-parity-extra (real Oxigraph ↔ Blazegraph parity)

Two namespaces (not one) because vitest parallelises files but
serialises tests within a file: pointing the parity suite at the
SAME namespace as the conformance suite let the latter's DROP ALL
fire mid-parity and wipe the parity fixture. Isolating per file
gives each suite exclusive ownership.

Quads mode is required (not Blazegraph's default triples mode),
otherwise GRAPH <…> { … } clauses are silently dropped on insert
and deleteByPattern over-deletes by collapsing all triples into
the default graph.

continue-on-error on the namespace-creation step so a failed boot
surfaces as the storage suite's missing-engine error (preserving the
"no false positives" contract — silently passing without a real
engine is impossible).

Tests
- private-store-key-resolution.test.ts (new, +895 lines): exercises
  the full key derivation matrix that ST-2 hardened.
- query-extra.test.ts (+1153 lines): expanded coverage of minTrust
  enforcement and per-cg trust isolation.
- sparql-form-detection.test.ts (new, +400 lines): regression
  fixtures for the form-detection guard.
- adapter-parity.test.ts / adapter-parity-extra.test.ts: now exercise
  both adapters against real engines.
- vitest.setup.ts (new): per-file storage adapter setup so the
  parallel runners don't fight over namespace state.

Coverage ratchet
- tornadoStorageCoverage 57/52/39/53 → 85/81/63/79.

Sequencing: this is PR 2 of 4. Independent off main; PR 1 (#331)
carries contracts+chain, PRs 3-4 will carry operator-surface and
data-pipeline. Original umbrella branch parked at fix_tests-overflow.

Made-with: Cursor
// that the pre-fix scanners misclassified.
const payload = '{"k": 1} # not a comment . not a triple terminator';
await store.insert([
quad('urn:e1', 'http://schema.org/text', `"${payload.replace(/"/g, '\\"')}"`, consensusGraph),
const payload = 'a "lone quote inside" b { } #';
await store.insert([
quad('urn:e1', 'http://schema.org/text',
`"${payload.replace(/"/g, '\\"')}"`, consensusGraph),
// (`adapter-parity-extra.test.ts`).
if (
decoded.startsWith('SELECT') &&
decoded.includes('http://parity.test/s1')
while (i < n) {
const ch = body[i];
if (ch === '<') {
const end = body.indexOf('>', i + 1);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: This tokenizer still treats every <...> span as an IRI. In a valid query like ?s <p> ?n . FILTER(?n < 100) . ?t <q> ?r, the < comparison in the FILTER will jump to the > in the next real IRI, so the splitter merges the later triple into the FILTER statement and injectMinTrustFilter() never adds a trust clause for ?t. That makes minTrust a partial no-op on later subjects. Reuse the same IRI-vs-comparison disambiguation you added to findWhereBraceStart here (and in the sibling scrub/comment helpers using the same raw indexOf('>') pattern).

const engine = new DKGQueryEngine(store);
const consensus = contextGraphVerifiedMemoryUri(CG, 'consensus-verified');
await store.insert([
quad('urn:low', 'http://dkg.io/ontology/trustLevel', `"${TrustLevel.Unverified}"`, consensus),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: TrustLevel.Unverified is not defined by the enum in packages/core/src/memory-model.ts, so this inserts "undefined" rather than a real lower-trust value. The assertion still passes because the cast fails closed, which means the test is no longer exercising a legitimate below-threshold tier. Use SelfAttested or Endorsed here (same issue recurs later in this file) so the regression coverage stays meaningful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants