Client-supplied root TX ID and path hints for data item resolution #627

@djwhitt

Problem

When a client requests a data item, the gateway must resolve it to its root L1 transaction before it can fetch the data. This resolution goes through multiple sources (local DB, CDB64 indexes, Turbo API, GraphQL, gateway HEAD requests) via CompositeRootTxIndex. These sources can fail (item not yet indexed) or return incorrect/stale values (e.g., a re-bundled item pointing to an old root).

In many cases, the client already knows the correct root transaction ID (and possibly the full nesting path) because it created or recently interacted with the data. Allowing the client to supply this as a hint — tried first, before server-side lookups — would provide both a fast path for correctly-hinted requests and a correction mechanism when server-side indexes are wrong.

Context

Current resolution flow

Request for data item ID
  -> RootParentDataSource.getData()
    -> Try attributes traversal (walk parentId chain in local attribute store)
    -> If incomplete, fall back to:
        CompositeRootTxIndex.getRootTx()
          -> DB -> CDB64 -> Turbo -> GraphQL -> Gateways
        Then Ans104OffsetSource to find byte offset in root bundle
    -> Fetch data from root TX at calculated offset
    -> Cache offsets for next time

What the hint provides

The DataItemRootIndex.getRootTx() interface returns:

{
  rootTxId: string;          // The L1 transaction containing the bundle
  path?: string[];           // [root, nestedBundle1, ..., parentBundle]
  rootOffset?: number;       // Byte offset to data item start
  rootDataOffset?: number;   // Byte offset to data item payload
  contentType?: string;
  size?: number;
  dataSize?: number;
}

A client hint would supply rootTxId and optionally path. The gateway would then:

  1. Use Ans104OffsetSource to verify the data item exists at the claimed location (bundle parsing)
  2. If a path is provided, use faster path-guided navigation instead of linear search
  3. Cache the discovered offsets for future requests (same as today)

The hint only tells the gateway where to look — it doesn't bypass any verification. The bundle must actually contain the claimed data item.

Requirements

Must Have

  • Accept a root transaction ID hint via request header (e.g., X-AR-IO-Root-Transaction-Id)
  • Accept an optional nesting path hint via request header (e.g., X-AR-IO-Root-Path, comma-separated TX IDs from root to immediate parent)
  • Hints are tried first, before attributes traversal and composite index lookup — this allows clients to override incorrect server-side values
  • If the hint fails verification (data item not found in the hinted bundle), fall through to normal resolution
  • The gateway still verifies the data item exists in the claimed root bundle (via Ans104OffsetSource bundle parsing)
  • Discovered offsets are cached in the data attributes store for future requests (no hint needed on subsequent requests)
  • Works for /raw/:id and /:id data endpoints
  • Invalid hints (malformed TX IDs, non-existent root TX, data item not found in bundle) never produce a client-facing error: the failure is logged and normal resolution continues
  • CLI tool (tools/fetch-with-hint) that resolves the root TX via GraphQL and sends a request with hint headers
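
As an illustration of the header requirements above, here is a minimal sketch of hint parsing. It assumes that header keys arrive lowercased (as they do in Node's req.headers) and that IDs are 43-character base64url strings. The helper name parseRootHints and the rule that the path must begin with the hinted root are assumptions made for this example, not part of the proposal.

```typescript
// Arweave TX and data item IDs are 43-character base64url strings (32 bytes).
const TX_ID_PATTERN = /^[A-Za-z0-9_-]{43}$/;

// Subset of RequestAttributes relevant to hints (field names from this proposal).
interface RootHints {
  rootTransactionIdHint?: string;
  rootPathHint?: string[];
}

type HeaderValue = string | string[] | undefined;

// Hypothetical helper: extract and validate the two hint headers.
function parseRootHints(headers: Record<string, HeaderValue>): RootHints {
  const first = (v: HeaderValue): string | undefined =>
    Array.isArray(v) ? v[0] : v;

  const rootId = first(headers["x-ar-io-root-transaction-id"])?.trim();
  if (rootId === undefined || !TX_ID_PATTERN.test(rootId)) {
    return {}; // missing or malformed root ID: ignore all hints
  }

  const hints: RootHints = { rootTransactionIdHint: rootId };

  const rawPath = first(headers["x-ar-io-root-path"]);
  if (rawPath !== undefined) {
    const path = rawPath.split(",").map((id) => id.trim());
    // Assumption: a usable path starts at the hinted root TX.
    const valid =
      path.every((id) => TX_ID_PATTERN.test(id)) && path[0] === rootId;
    if (valid) hints.rootPathHint = path; // otherwise keep only the root ID
  }
  return hints;
}
```

A malformed path drops only the path hint here, so the gateway can still try the linear search with the root TX ID alone. Whether a bad path should instead discard the whole hint is a design choice left open.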

Should Have

  • When a path hint is provided, use Ans104OffsetSource.getDataItemOffsetWithPath() for faster lookup
  • When only a root TX ID is provided (no path), use Ans104OffsetSource.getDataItemOffset() (linear search within root bundle)
  • Metrics to track hint usage: root_tx_hint_total{status=used|invalid|not_needed|not_provided} — how often hints are provided, how often they lead to successful resolution, and how often they fail and fall through
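
The choice between the two lookup methods could be sketched as follows. Only the method names getDataItemOffset and getDataItemOffsetWithPath come from this proposal. The parameter lists, the generic result type, and the helper name resolveFromHint are assumptions for illustration, and the real Ans104OffsetSource signatures may differ.

```typescript
// Assumed signatures; only the method names are taken from the proposal.
interface OffsetSourceLike<T> {
  getDataItemOffset(dataItemId: string, rootTxId: string): Promise<T | null>;
  getDataItemOffsetWithPath(dataItemId: string, path: string[]): Promise<T | null>;
}

// Hypothetical helper: returns the offsets found via the hint, or null when
// the hint cannot be verified (item not found, or the lookup threw).
async function resolveFromHint<T>(
  source: OffsetSourceLike<T>,
  dataItemId: string,
  rootTxId: string,
  path?: string[],
): Promise<T | null> {
  try {
    if (path !== undefined && path.length > 0) {
      // Path-guided navigation through the nested bundles.
      return await source.getDataItemOffsetWithPath(dataItemId, path);
    }
    // No path: linear search within the root bundle.
    return await source.getDataItemOffset(dataItemId, rootTxId);
  } catch {
    return null; // a failed hint is not an error; the caller falls through
  }
}
```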

Nice to Have

  • Query parameter alternatives (?rootTxId=...&rootPath=id1,id2,id3) for environments where custom headers are difficult
  • Rate limiting on hint-based lookups to prevent abuse (client could cause the gateway to fetch arbitrary large bundles)
  • Cap on root bundle size for hint-based fetches

Design

Threading the hint through the stack

The hint information flows through RequestAttributes, which already propagates through every ContiguousDataSource.getData() call:

// In src/types.d.ts
export interface RequestAttributes {
  hops: number;
  arnsName?: string;
  arnsBasename?: string;
  // ... existing fields ...
  rootTransactionIdHint?: string;
  rootPathHint?: string[];
}

This avoids changes to the ContiguousDataSource.getData() signature or intermediate data sources (ReadThroughDataCache, SequentialDataSource, etc.) — they pass requestAttributes through unchanged.

Where the hint is consumed

In RootParentDataSource.getData(), the hint is tried first:

1. NEW: Check requestAttributes for hint   -> if present, try to resolve via bundle parsing
   - Valid hint verified? Use it, cache offsets, done
   - Hint failed? Log and continue to step 2
2. Try attributes-based traversal          -> success? done
3. Try CompositeRootTxIndex.getRootTx()    -> success? use offsets/parse bundle, done
4. Throw "not found" error                 -> same as today

Trying the hint first means:

  • Correct hints skip all server-side lookups entirely (fast path)
  • Incorrect server-side data can be overridden by a client that knows the right root TX
  • Failed hints add one extra round of bundle parsing before falling through — acceptable cost since the hint is opt-in

Step 1 reuses the existing Ans104OffsetSource code path that's already used in step 3 when the composite index returns a rootTxId without offsets.
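
The ordering of the four steps can be illustrated with a small sketch. Each step is passed in as a function so the example stays self-contained. The names resolveHintFirst and ResolutionSteps, and the convention that a step returns null when it finds nothing, are assumptions for this example rather than the actual RootParentDataSource code.

```typescript
type Resolver<T> = () => Promise<T | null>;

interface ResolutionSteps<T> {
  tryHint?: Resolver<T>;      // step 1: present only when the client sent a hint
  tryAttributes: Resolver<T>; // step 2: attributes-based traversal
  tryRootIndex: Resolver<T>;  // step 3: CompositeRootTxIndex lookup
}

// Hypothetical sketch of the hint-first fallback order.
async function resolveHintFirst<T>(
  steps: ResolutionSteps<T>,
  log: (message: string) => void,
): Promise<T> {
  if (steps.tryHint !== undefined) {
    // A throwing hint lookup is treated the same as a failed verification.
    const hinted = await steps.tryHint().catch(() => null);
    if (hinted !== null) return hinted; // fast path: skip server-side lookups
    log("root TX hint failed verification, falling back to normal resolution");
  }

  const fromAttributes = await steps.tryAttributes();
  if (fromAttributes !== null) return fromAttributes;

  const fromIndex = await steps.tryRootIndex();
  if (fromIndex !== null) return fromIndex;

  throw new Error("not found"); // step 4: same outcome as without a hint
}
```

Because tryHint is optional, a request without hint headers takes exactly the existing path through steps 2 to 4.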

CLI tool: tools/fetch-with-hint

A shell script that resolves a data item's root TX via GraphQL, then fetches it from a gateway with hint headers attached. This serves as both a debugging tool and a reference implementation for clients.

Usage:

# Fetch a data item with automatic root TX resolution
tools/fetch-with-hint <data-item-id> [options]

# Options:
#   --gateway <url>     Gateway to fetch from (default: http://localhost:4000)
#   --graphql <url>     GraphQL endpoint for root TX resolution (default: https://arweave.net/graphql)
#   --output <file>     Write response body to file (default: stdout)
#   --verbose           Show resolution details and response headers

What it does:

  1. Queries the GraphQL endpoint to walk the bundledIn chain from the data item to its root L1 transaction, collecting the full path
  2. Sends a GET /raw/<id> request to the gateway with:
    • X-AR-IO-Root-Transaction-Id: <rootTxId>
    • X-AR-IO-Root-Path: <root>,<bundle1>,...,<parent> (if nested)
  3. Outputs the response (or writes to file)
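
For clients written in TypeScript rather than shell, step 2 might look like the sketch below. It assumes the resolved path is ordered from root to immediate parent, as in the header description. The helper name buildHintedRequest and the handling of an empty path (no hint headers, for an ID that is itself an L1 transaction) are assumptions for illustration.

```typescript
interface HintedRequest {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper: build the GET /raw/<id> request with hint headers.
// `path` is ordered from the root L1 transaction to the immediate parent.
function buildHintedRequest(
  gateway: string,
  dataItemId: string,
  path: string[],
): HintedRequest {
  const url = `${gateway.replace(/\/+$/, "")}/raw/${dataItemId}`;
  const headers: Record<string, string> = {};

  const [rootTxId] = path;
  if (rootTxId !== undefined) {
    headers["X-AR-IO-Root-Transaction-Id"] = rootTxId;
    // Send the path only when the item is nested below the root bundle.
    if (path.length > 1) {
      headers["X-AR-IO-Root-Path"] = path.join(",");
    }
  }
  return { url, headers };
}
```

The result can be passed to any HTTP client, for example fetch(request.url, { headers: request.headers }).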

Example:

# Resolve and fetch a data item that the local gateway can't find
$ tools/fetch-with-hint abc123... --verbose
Resolving root TX via https://arweave.net/graphql...
  abc123... -> bundledIn -> parent456...
  parent456... -> bundledIn -> root789... (L1 transaction)
  Path: root789...,parent456...
Fetching from http://localhost:4000/raw/abc123...
  X-AR-IO-Root-Transaction-Id: root789...
  X-AR-IO-Root-Path: root789...,parent456...
HTTP/1.1 200 OK
Content-Type: application/octet-stream
...

GraphQL queries used:

# Walk the bundledIn chain
query getBundleParent($id: ID!) {
  transaction(id: $id) {
    id
    bundledIn { id }
  }
}

The tool iterates this query, following bundledIn.id until it reaches a transaction with no bundledIn (the root L1 transaction). This is the same traversal logic used by GraphQLRootTxIndex.
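
The iteration described above can be sketched as follows. The GraphQL call is injected as a function so the traversal itself is self-contained. The names resolveRootPath and QueryParent, and the maxDepth safety limit, are assumptions for this example and are not taken from GraphQLRootTxIndex.

```typescript
// Shape of the `transaction` field returned by the query above.
interface TransactionNode {
  id: string;
  bundledIn: { id: string } | null;
}

// Assumed to run the getBundleParent query; resolves to null when the ID is unknown.
type QueryParent = (id: string) => Promise<TransactionNode | null>;

// Hypothetical sketch: follow bundledIn.id upward and return the path ordered
// from the root L1 transaction to the immediate parent. An ID that is itself
// an L1 transaction yields an empty path.
async function resolveRootPath(
  dataItemId: string,
  queryParent: QueryParent,
  maxDepth = 10, // assumed guard against cycles or unexpectedly deep nesting
): Promise<string[]> {
  const path: string[] = [];
  let current = dataItemId;

  for (let depth = 0; depth <= maxDepth; depth++) {
    const tx = await queryParent(current);
    if (tx === null) throw new Error(`transaction ${current} not found`);
    if (tx.bundledIn === null) return path; // reached the root L1 transaction
    path.unshift(tx.bundledIn.id); // parents are discovered bottom-up
    current = tx.bundledIn.id;
  }
  throw new Error(`nesting deeper than ${maxDepth} levels`);
}
```

For the example above, the returned path would be [root789..., parent456...], which is the order the X-AR-IO-Root-Path header expects.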

Security considerations

  • The hint only directs where to look — the gateway still verifies the data item exists via bundle parsing
  • The main risk is resource waste: a malicious client could supply hints pointing to large bundles that don't contain the item, causing the gateway to download and scan them
  • Mitigation: rate limiting on hint-based lookups, and/or a cap on root bundle size for hinted fetches
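
One possible shape for these mitigations is a guard consulted before a hinted bundle is fetched. The sketch below combines a size cap with a fixed-window rate limit per client. It assumes the root bundle size is known before download, and every name and threshold in it is hypothetical.

```typescript
interface HintGuardOptions {
  maxRootBundleBytes: number;  // cap on root bundle size for hinted fetches
  maxLookupsPerWindow: number; // hinted lookups allowed per client per window
  windowMs: number;            // length of the rate-limit window
}

// Hypothetical sketch: returns a function that decides whether a hint-based
// lookup may proceed. State is kept in memory and never evicted, which a
// real implementation would need to address.
function createHintGuard(opts: HintGuardOptions) {
  const windows = new Map<string, { start: number; count: number }>();

  return function allowHintLookup(
    clientKey: string,
    rootBundleBytes: number,
    nowMs: number,
  ): boolean {
    // Oversized bundles are rejected without consuming rate-limit budget.
    if (rootBundleBytes > opts.maxRootBundleBytes) return false;

    const current = windows.get(clientKey);
    if (current === undefined || nowMs - current.start >= opts.windowMs) {
      windows.set(clientKey, { start: nowMs, count: 1 }); // start a new window
      return true;
    }
    if (current.count >= opts.maxLookupsPerWindow) return false;
    current.count += 1;
    return true;
  };
}
```

When the guard returns false, the gateway would skip the hint and continue with normal resolution, so a rejected hint behaves like an invalid one.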

Files to Modify

| File | Change |
| --- | --- |
| src/types.d.ts | Add rootTransactionIdHint? and rootPathHint? to RequestAttributes |
| src/routes/data/handlers.ts | Parse hint headers from request into RequestAttributes |
| src/data/root-parent-data-source.ts | Try hint first in getData() before attributes traversal and composite index |
| src/metrics.ts | Add hint usage metrics |
| tools/fetch-with-hint | CLI tool for resolving root TX via GraphQL and fetching with hints |
| | Tests for gateway changes |

No changes needed to CompositeRootTxIndex, individual root TX index sources, Ans104OffsetSource, ReadThroughDataCache, or system.ts.

Acceptance Criteria

  • Request without hint headers behaves identically to today
  • Request with valid X-AR-IO-Root-Transaction-Id resolves a data item, skipping server-side lookups
  • Request with valid hint overrides stale/incorrect server-side index data
  • Request with both root TX ID and path hints uses path-guided navigation (verified via logs/metrics)
  • Invalid hint (wrong root TX ID, item not in bundle) silently falls through to normal resolution
  • Malformed hint headers (invalid base64url, wrong length) are silently ignored
  • Successfully resolved hints cache offsets — subsequent requests for the same ID resolve without the hint
  • Metrics track hint usage (provided, used successfully, invalid, not provided)
  • tools/fetch-with-hint resolves root TX via GraphQL and fetches data with correct hint headers
  • tools/fetch-with-hint handles nested bundles (multi-level path)
  • tools/fetch-with-hint handles L1 transactions (no nesting) gracefully
  • Unit tests for RootParentDataSource hint-first fallback path
  • Unit tests for header parsing in route handlers
