feat(data): serve contiguous data by content digest #753
Add GET|HEAD /ar-io/digest/:digest to serve contiguous data addressed by its SHA-256 content digest: the same base64url value emitted as the X-AR-IO-Digest response header and used as the on-disk cache key. The gateway already stores contiguous data content-addressed by this hash; this exposes it directly.

Responses are inherently self-verifying (the bytes provably hash to the requested digest), so X-AR-IO-Verified is always true and Cache-Control is immutable. Local-cache only: Arweave and peers address data by id, so there is no on-demand fetch by content hash and an unknown digest returns 404.

For header parity with /raw, a representative id that resolves to the digest is looked up (cheaply, via the existing contiguous_data_hash index) and run through the same setDataHeaders path, so the full id-scoped header set (X-AR-IO-Data-Id, tags, owner, signature, root offsets) is present and signed by the HTTPSIG middleware. The served digest is pinned onto the attributes so the digest/ETag/Content-Digest headers always describe the bytes streamed.

- DB: selectDataAttributesByHash SQL + getDataAttributesByHash through the StandaloneSqlite worker/circuit-breaker/queue/handler chain (no migration; the contiguous_data_hash index already exists)
- Data source: ReadThroughDataCache.getDataByHash + ByHashDataSource interface
- Attributes: getDataAttributesByHash on DataAttributesSource + composite
- Route: DIGEST_DATA_PATH_REGEX, createDigestDataHandler on arIoRouter; handleRangeRequest generalized with an optional region fetcher
- Tests: handler (GET/HEAD/range/404/451/400/no-id), getDataByHash, DB method
- Docs: openapi path + glossary "Content Digest (ar-io-digest)"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
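As a rough illustration of the self-verifying property, here is a client-side sketch, assuming Node 18+ (for the global `fetch`) and a placeholder gateway base URL. `sha256B64Url` and `fetchByDigest` are hypothetical helpers, not gateway code: the first computes the unpadded base64url SHA-256 (the 43-character form of `X-AR-IO-Digest`), and the second requests `/ar-io/digest/:digest` and rejects any body that does not hash back to the requested digest.

```typescript
import { createHash } from 'node:crypto';

/**
 * Unpadded base64url SHA-256 of `bytes`: the 43-character form
 * the gateway emits as X-AR-IO-Digest.
 */
export function sha256B64Url(bytes: Uint8Array): string {
  return createHash('sha256').update(bytes).digest().toString('base64url');
}

/**
 * Hypothetical client helper: fetch bytes by content digest and verify them.
 * `gatewayUrl` is a placeholder base URL, e.g. "https://gateway.example".
 */
export async function fetchByDigest(
  gatewayUrl: string,
  digest: string,
): Promise<Uint8Array> {
  const res = await fetch(`${gatewayUrl}/ar-io/digest/${digest}`);
  if (res.status === 404) {
    // Local-cache only: the gateway does not fetch unknown digests on demand.
    throw new Error(`digest not cached on this gateway: ${digest}`);
  }
  if (!res.ok) {
    throw new Error(`unexpected status ${res.status} for digest ${digest}`);
  }
  const bytes = new Uint8Array(await res.arrayBuffer());
  // Self-verification: the body must hash back to the requested digest.
  if (sha256B64Url(bytes) !== digest) {
    throw new Error(`digest mismatch for ${digest}`);
  }
  return bytes;
}
```

A caller would pass a digest taken from an earlier response's `X-AR-IO-Digest` header; a 404 only means this gateway has not cached those bytes, not that they are absent from Arweave.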
No actionable comments were generated in the recent review. 🎉
📒 Files selected for processing (7); 🚧 files skipped from review as they are similar to previous changes (7).
📝 Walkthrough
This PR implements content-addressed data serving by SHA-256 digest via a new …
Changes: Content-Addressed Digest Data Serving
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 5
🧹 Nitpick comments (3)
src/routes/data/index.ts (1)
35-45: ⚡ Quick win
Use TSDoc for the exported `digestDataHandler` declaration.
Since this is a new exported handler, prefer a TSDoc block over inline comments to keep export-level documentation consistent.
As per coding guidelines, `src/**/*.ts`: Add or improve TSDoc comments on code you touch.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/routes/data/index.ts` around lines 35 - 45, Add a TSDoc block above the exported digestDataHandler declaration describing its purpose, route mounting (/ar-io/digest/:digest), and the key injected dependencies (createDigestDataHandler, dataSource, dataAttributesSource, dataBlockListValidator, rateLimiter, paymentProcessor, dataItemMetaResolver); replace the inline comment with this TSDoc and ensure it uses the /** ... */ format immediately above the export so documentation tools pick it up and it follows the repo's TSDoc conventions.

src/constants.ts (1)
91-95: ⚡ Quick win
Add TSDoc to the new exported regex constant.
`DIGEST_DATA_PATH_REGEX` is exported and newly introduced; please convert the inline comments to a symbol-level TSDoc block for consistency with the file's documented exports.
As per coding guidelines, `src/**/*.ts`: Add or improve TSDoc comments on code you touch.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/constants.ts` around lines 91 - 95, Add a symbol-level TSDoc block above the exported constant DIGEST_DATA_PATH_REGEX that replaces the inline comments: describe that it matches content-addressed data using a base64url SHA-256 digest (43 chars), that this value is emitted as X-AR-IO-Digest, and that the /ar-io/digest/ prefix is distinct from /raw/:txid because a 43-char digest looks like a txid; leave the exported declaration `export const DIGEST_DATA_PATH_REGEX = /^\/ar-io\/digest\/([a-zA-Z0-9-_]{43})\/?$/i;` unchanged except for adding the TSDoc. Ensure the TSDoc uses proper /** ... */ format and is placed immediately above the constant.

src/routes/data/handlers.ts (1)
1394-1416: ⚡ Quick win
Remove the orphaned TSDoc block above `createDigestDataHandler`.
There are two consecutive doc blocks here; the first one is not attached to a symbol and reads like stale docs, which makes generated/internal docs ambiguous.
As per coding guidelines, `src/**/*.ts`: Add or improve TSDoc comments on code you touch.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/routes/data/handlers.ts` around lines 1394 - 1416, Remove the stray/stale TSDoc block that sits immediately above createDigestDataHandler so only the single relevant doc block (the one attached to createDigestDataHandler) remains; delete the orphaned comment content and, if needed, consolidate any unique information into the remaining TSDoc for createDigestDataHandler (update that function's comment rather than keeping two consecutive blocks) and ensure the file follows the project's TSDoc guidelines.
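For the `src/constants.ts` nitpick, one possible shape for the requested TSDoc is sketched below. The regex is copied unchanged from the review prompt and the doc text paraphrases the points the prompt lists; `matchDigestPath` is a hypothetical helper included only to show that capture group 1 yields the digest.

```typescript
/**
 * Matches `/ar-io/digest/:digest` request paths, where `:digest` is the
 * base64url-encoded SHA-256 of the content (43 characters, unpadded): the
 * same value the gateway emits as `X-AR-IO-Digest`.
 *
 * The route lives under `/ar-io/digest/` rather than beside `/raw/:txid`
 * because a 43-character digest has the same shape as a 43-character txid.
 *
 * Capture group 1 is the digest.
 */
export const DIGEST_DATA_PATH_REGEX =
  /^\/ar-io\/digest\/([a-zA-Z0-9-_]{43})\/?$/i;

/** Hypothetical helper: return the digest from a request path, if it matches. */
export function matchDigestPath(path: string): string | undefined {
  // Non-global regex, so exec() carries no lastIndex state between calls.
  return DIGEST_DATA_PATH_REGEX.exec(path)?.[1];
}
```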
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: c9ecfca2-aac3-4fe1-a6d7-e45cf58943bc
📒 Files selected for processing (18)
- CLAUDE.md
- docker-compose.yaml
- docs/glossary.md
- docs/openapi.yaml
- monitoring/grafana/provisioning/dashboards/dashboards.yaml
- src/constants.ts
- src/data/composite-data-attributes-source.ts
- src/data/read-through-data-cache.test.ts
- src/data/read-through-data-cache.ts
- src/database/sql/data/content-attributes.sql
- src/database/standalone-sqlite.test.ts
- src/database/standalone-sqlite.ts
- src/metrics.ts
- src/routes/ar-io.ts
- src/routes/data/handlers.test.ts
- src/routes/data/handlers.ts
- src/routes/data/index.ts
- src/types.d.ts
```markdown
<a id="content-digest"></a> **Content Digest (ar-io-digest)** - The SHA-256
hash of a piece of contiguous data, base64url-encoded. It is emitted on data
responses as the `X-AR-IO-Digest` header and is the key under which the
[contiguous data store](#contiguous-data-store) addresses bytes on disk
(`data/<h0:2>/<h2:4>/<hash>`). Because the same value identifies content
across the cache, the index, and the response header, it doubles as a stable
content address. The `GET /ar-io/digest/{digest}` endpoint serves bytes
directly by this value; such responses are inherently self-verifying (the
bytes provably hash to the requested digest) and immutable, but local-cache
only — there is no on-demand fetch by content hash, since Arweave addresses
data by [item ID](#item-id), not by content hash.
```
Fix broken link fragment.
The anchor reference at line 229 uses #contiguous-data-store, but the actual anchor at line 99 is <a id="contiguous-data">. Update the link to #contiguous-data to match the existing anchor.
🔗 Proposed fix
```diff
 <a id="content-digest"></a> **Content Digest (ar-io-digest)** - The SHA-256
 hash of a piece of contiguous data, base64url-encoded. It is emitted on data
 responses as the `X-AR-IO-Digest` header and is the key under which the
-[contiguous data store](#contiguous-data-store) addresses bytes on disk
+[contiguous data store](#contiguous-data) addresses bytes on disk
 (`data/<h0:2>/<h2:4>/<hash>`). Because the same value identifies content
```
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 229-229: Link fragments should be valid (MD051, link-fragments)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/glossary.md` around lines 226 - 237, Update the broken anchor in the
"Content Digest (ar-io-digest)" entry: change the link target from
`#contiguous-data-store` to the existing anchor `#contiguous-data` so the
reference to the contiguous data section resolves correctly (look for the
"Content Digest (ar-io-digest)" paragraph and its
`[contiguous data store](#contiguous-data-store)` link).
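To make the glossary's on-disk layout concrete, here is a sketch that maps a digest to its cache path. It assumes that `<h0:2>` and `<h2:4>` in `data/<h0:2>/<h2:4>/<hash>` denote the character slices [0, 2) and [2, 4) of the base64url digest string; `digestToCachePath` is a hypothetical name, not the store's actual API.

```typescript
/**
 * Hypothetical sketch of the layout `data/<h0:2>/<h2:4>/<hash>`, assuming the
 * two directory levels are the first and second character pairs of the
 * base64url digest.
 */
export function digestToCachePath(digest: string, root = 'data'): string {
  // Reject anything that is not a 43-character base64url SHA-256 digest.
  if (!/^[a-zA-Z0-9_-]{43}$/.test(digest)) {
    throw new Error(`not a base64url SHA-256 digest: ${digest}`);
  }
  return `${root}/${digest.slice(0, 2)}/${digest.slice(2, 4)}/${digest}`;
}
```

Under that assumption, the two-level prefix fan-out keeps any single directory from accumulating every cached object.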
```typescript
it('honors a byte region', async () => {
  const result = await readThroughDataCache.getDataByHash('knownHash', {
    offset: 0,
    size: 4,
  });
  assert.equal(result.size, 4);
  assert.equal(result.totalSize, 100);
});
```
Assert that the region is passed to the store call.
This test currently validates return metadata only; it can still pass if region is accidentally dropped before dataStore.get().
🧪 Suggested assertion hardening
```diff
 it('honors a byte region', async () => {
+  let receivedRegion: { offset: number; size: number } | undefined;
+  mock.method(
+    mockContiguousDataStore,
+    'get',
+    async (hash: string, region?: { offset: number; size: number }) => {
+      if (hash === 'knownHash') {
+        receivedRegion = region;
+        const stream = new Readable();
+        stream.push('simulated data');
+        stream.push(null);
+        return stream;
+      }
+      return undefined;
+    },
+  );
+
   const result = await readThroughDataCache.getDataByHash('knownHash', {
     offset: 0,
     size: 4,
   });
   assert.equal(result.size, 4);
   assert.equal(result.totalSize, 100);
+  assert.deepEqual(receivedRegion, { offset: 0, size: 4 });
 });
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/data/read-through-data-cache.test.ts` around lines 208 - 215, The test
for readThroughDataCache.getDataByHash currently only asserts returned metadata;
update the test to also assert that the underlying dataStore.get was called with
the expected region object (offset and size) so the region is not dropped.
Locate the call to readThroughDataCache.getDataByHash in the test and add an
assertion against the mocked/spied dataStore.get (or equivalent spy used in this
spec) verifying its arguments include a region matching { offset: 0, size: 4 }
(or that the second/appropriate parameter contains that region), ensuring the
store call receives the region.
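The region under test is the `{ offset, size }` shape that the generalized `handleRangeRequest` passes down to its region fetcher. As a rough illustration of how a single `Range` header might map onto that shape, here is a simplified parser; it is hypothetical, handles only one range, and is not the gateway's implementation. Unsatisfiable ranges return `undefined`, where an HTTP server would typically answer `416 Range Not Satisfiable`.

```typescript
/** Byte region in the `{ offset, size }` shape used by the getDataByHash test. */
export interface Region {
  offset: number;
  size: number;
}

/**
 * Hypothetical single-range parser. Supports "bytes=start-end",
 * "bytes=start-", and "bytes=-suffix"; returns undefined otherwise.
 */
export function parseSingleRange(
  header: string,
  totalSize: number,
): Region | undefined {
  const m = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  if (m === null || (m[1] === '' && m[2] === '')) return undefined;
  let start: number;
  let end: number;
  if (m[1] === '') {
    // Suffix range: the last N bytes (the whole body if N exceeds its size).
    const suffix = Number(m[2]);
    if (suffix === 0) return undefined;
    start = Math.max(0, totalSize - suffix);
    end = totalSize - 1;
  } else {
    start = Number(m[1]);
    // Open-ended or over-long ranges are clamped to the last byte.
    end = m[2] === '' ? totalSize - 1 : Math.min(Number(m[2]), totalSize - 1);
  }
  if (start >= totalSize || end < start) return undefined;
  return { offset: start, size: end - start + 1 };
}
```

With a 100-byte body, `parseSingleRange('bytes=0-3', 100)` produces the same `{ offset: 0, size: 4 }` region the test asserts on.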
```typescript
const requestType = region !== undefined ? 'range' : 'full';
metrics.getDataStreamSuccessesTotal.inc({
  class: this.constructor.name,
  source: 'cache',
  request_type: requestType,
});

const totalSize = attributes.size;
return {
  hash,
  stream: cacheStream,
  size: region?.size ?? totalSize,
  totalSize,
  sourceContentType: attributes.contentType,
  // Content-addressed: the bytes provably hash to the requested digest.
  verified: true,
  trusted: true,
  cached: true,
};
```
Defer success metrics until the stream actually completes.
getDataByHash() increments getDataStreamSuccessesTotal before bytes are consumed. If the stream errors later, this path will still report success.
📈 Suggested fix
```diff
 const requestType = region !== undefined ? 'range' : 'full';
-metrics.getDataStreamSuccessesTotal.inc({
-  class: this.constructor.name,
-  source: 'cache',
-  request_type: requestType,
-});
+cacheStream.once('error', () => {
+  metrics.getDataStreamErrorsTotal.inc({
+    class: this.constructor.name,
+    source: 'cache',
+    request_type: requestType,
+  });
+});
+cacheStream.once('end', () => {
+  metrics.getDataStreamSuccessesTotal.inc({
+    class: this.constructor.name,
+    source: 'cache',
+    request_type: requestType,
+  });
+});
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/data/read-through-data-cache.ts` around lines 472 - 490, The success
metric is being incremented too early; remove the immediate call to
metrics.getDataStreamSuccessesTotal.inc(...) and instead attach a one-time
listener to cacheStream (use cacheStream.once('end', ...)) to call
metrics.getDataStreamSuccessesTotal.inc with the same labels (class:
this.constructor.name, source: 'cache', request_type: requestType) when the
stream actually finishes; also attach a one-time 'error' listener to cacheStream
to avoid incrementing on failure (or increment a failure metric if available).
Ensure you handle the case the stream has already ended (check
cacheStream.readableEnded or equivalent) and call the metric immediately in that
case. Use the existing symbols cacheStream, requestType,
metrics.getDataStreamSuccessesTotal, attributes, region, and hash to locate and
implement the change.
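The prompt also asks for premature termination and the already-ended case to be considered, which the proposed diff handles only through separate `'error'`/`'end'` listeners. A minimal alternative sketch uses Node's `stream.finished()`, which invokes a single callback when a stream has ended, errored, or closed prematurely. `LabeledCounter` and `recordStreamOutcome` are hypothetical stand-ins for the gateway's metrics counters and wiring, and whether an error counter like the diff's `getDataStreamErrorsTotal` exists in `src/metrics.ts` is an assumption here as it is there.

```typescript
import { finished, Readable } from 'node:stream';

/** Minimal counter shape standing in for the gateway's metrics counters (assumption). */
export interface LabeledCounter {
  inc(labels: Record<string, string>): void;
}

/**
 * Hypothetical helper: record exactly one outcome for a served stream once it
 * settles. Success means the consumer read it to the end; failure means it
 * errored or was destroyed before ending.
 */
export function recordStreamOutcome(
  stream: Readable,
  successes: LabeledCounter,
  errors: LabeledCounter,
  labels: Record<string, string>,
): void {
  finished(stream, (err) => {
    // finished() calls back once; err is set on 'error' or premature close.
    if (err) {
      errors.inc(labels);
    } else {
      successes.inc(labels);
    }
  });
}
```

One design consequence: if the handler destroys the source stream when a client disconnects (as `stream.pipeline` does), `finished()` reports a premature close, so an aborted download counts as an error rather than a success. Because the listener would be attached inside `getDataByHash`, before the stream is handed to the response, the stream should not already have ended at that point.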
```typescript
getDataAttributesByHash(hash: string) {
  const row = this.stmts.data.selectDataAttributesByHash.get({
    hash: fromB64Url(hash),
  });

  if (row === undefined) {
    return undefined;
  }

  return {
    hash: row.hash ? toB64Url(row.hash) : hash,
    size: row.data_size,
    contentType: row.original_source_content_type ?? undefined,
    id: row.id ? toB64Url(row.id) : undefined,
  };
}
```
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add TSDoc for newly added by-hash methods.
Both new getDataAttributesByHash methods are touched public API surface but lack TSDoc comments. Please document contract/return semantics (not-found behavior and hash format assumptions).
As per coding guidelines: "src/**/*.ts: Add or improve TSDoc comments on code you touch".
Also applies to: 3505-3513
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/database/standalone-sqlite.ts` around lines 1227 - 1242, Add a TSDoc
comment to the public method getDataAttributesByHash describing the function's
contract: explain that the input hash is expected to be a base64-url encoded
string, that the implementation decodes it (via fromB64Url) and will return
undefined when no row is found, and that the returned object contains base64-url
encoded hash and id fields (using toB64Url), a numeric size, and an optional
contentType; also add equivalent TSDoc to the other newly added "by-hash"
methods to document not-found behavior, encoding/decoding expectations, and
exact return shape/nullable fields.
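For the TSDoc request, a self-contained sketch of the documented contract follows. It restates the row-to-attributes mapping from the method above as a standalone function so the not-found and encoding behavior can be exercised in isolation. The `ByHashRow` shape is inferred from the column names in that method, `Buffer#toString('base64url')` stands in for the gateway's `toB64Url`, and `toByHashAttributes` is a hypothetical name.

```typescript
/** Row shape assumed for `selectDataAttributesByHash` (column names from the method above). */
export interface ByHashRow {
  hash: Buffer | null;
  id: Buffer | null;
  data_size: number;
  original_source_content_type: string | null;
}

/** Attributes returned for a content digest. */
export interface ByHashAttributes {
  /** Base64url SHA-256 of the content. */
  hash: string;
  /** Total content size in bytes. */
  size: number;
  /** Content type recorded from the original source, if any. */
  contentType?: string;
  /** Base64url id of a representative item with this content, if any. */
  id?: string;
}

/**
 * Map a row found by content hash to by-hash attributes.
 *
 * @param row - The row matching the decoded hash, or `undefined` if none.
 * @param requestedHash - The base64url digest the caller asked for; used when
 *   the row's hash column is empty.
 * @returns `undefined` when no row matched; otherwise base64url `hash`/`id`,
 *   the size, and the optional content type.
 */
export function toByHashAttributes(
  row: ByHashRow | undefined,
  requestedHash: string,
): ByHashAttributes | undefined {
  if (row === undefined) {
    return undefined;
  }
  return {
    hash: row.hash ? row.hash.toString('base64url') : requestedHash,
    size: row.data_size,
    contentType: row.original_source_content_type ?? undefined,
    id: row.id ? row.id.toString('base64url') : undefined,
  };
}
```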
```typescript
// Content-addressed data: serve bytes by their SHA-256 digest (the value
// emitted as X-AR-IO-Digest). GET registration also answers HEAD. Local
// content store only — see createDigestDataHandler.
arIoRouter.get(DIGEST_DATA_PATH_REGEX, digestDataHandler);
```
Normalize /ar-io/digest/:digest in metrics path mapping to prevent label-cardinality blowup.
Adding this route without a normalizePath rule causes per-digest metric labels (includePath: true), which can grow unbounded and degrade metrics stability.
💡 Suggested fix
```diff
 if (path.startsWith('/ar-io/')) {
   if (path === '/ar-io/healthcheck') return path;
   if (path === '/ar-io/info') return path;
   if (path === '/ar-io/peers') return path;
+  if (path.match(/^\/ar-io\/digest\/[a-zA-Z0-9_-]{43}\/?$/))
+    return '/ar-io/digest/:digest';
   if (path.match(/^\/ar-io\/resolver\/[^/]+$/))
     return '/ar-io/resolver/:name';
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/routes/ar-io.ts` around lines 232 - 235, The new route
arIoRouter.get(DIGEST_DATA_PATH_REGEX, digestDataHandler) is exposing per-digest
metric labels; add a normalizePath mapping for the digest route so metrics use a
fixed template (e.g. "/ar-io/digest/:digest") instead of the raw digest value.
Locate the metrics path normalization configuration where includePath: true or
path normalization rules are defined and add an entry mapping the
DIGEST_DATA_PATH_REGEX (or the concrete route "/ar-io/digest/:digest") to the
normalized template so the arIoRouter.get route emits metrics under a stable
label rather than per-digest values.
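A variation on the suggested fix, sketched under the assumption that the metrics path normalizer can import the route constant: reusing `DIGEST_DATA_PATH_REGEX` instead of a second inline regex keeps the route and its metrics label from drifting apart. `normalizeArIoPath` is a hypothetical helper; in the real normalizer, returning `undefined` would mean falling through to the existing `/ar-io/*` rules.

```typescript
/** Route regex as introduced by this PR (copied from the review prompt). */
const DIGEST_DATA_PATH_REGEX = /^\/ar-io\/digest\/([a-zA-Z0-9-_]{43})\/?$/i;

/**
 * Hypothetical normalizer fragment: collapse every digest path to one fixed
 * metrics label so label cardinality stays bounded.
 */
export function normalizeArIoPath(path: string): string | undefined {
  if (DIGEST_DATA_PATH_REGEX.test(path)) {
    return '/ar-io/digest/:digest';
  }
  // Not a digest path: let the existing /ar-io/* rules decide.
  return undefined;
}
```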
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
```diff
@@             Coverage Diff             @@
##           develop     #753      +/-   ##
===========================================
+ Coverage    79.23%   79.28%   +0.05%
===========================================
  Files          126      126
  Lines        46653    46767     +114
  Branches      3556     3571      +15
===========================================
+ Hits         36966    37080     +114
+ Misses        9633     9632       -1
- Partials        54       55       +1
```
- selectDataAttributesByHash: ORDER BY verified DESC, trusted DESC, id ASC, so that when several ids share a hash (byte-identical content, different signed envelopes) the representative is deterministic and prefers the strongest provenance, rather than depending on index iteration order.
- Collapse the per-request double lookup: getDataByHash now returns the representative id (ByHashData), so the handler no longer issues its own getDataAttributesByHash call. Removed the now-unused method from DataAttributesSource / CompositeDataAttributesSource (kept on ContiguousDataIndex, which getDataByHash uses).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
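The representative-selection rule from the first bullet can be mirrored in memory, which makes the tie-breaking explicit. This sketch assumes `verified` and `trusted` behave as booleans, that `id` is stored as raw bytes (the `toB64Url(row.id)` call in the by-hash method suggests this), and that the query takes the first row of the ordering; `Candidate` and `pickRepresentative` are hypothetical names.

```typescript
/** A row sharing one content hash (shape assumed for illustration). */
export interface Candidate {
  id: Buffer; // raw id bytes, as stored
  verified: boolean;
  trusted: boolean;
}

/** Sort key mirroring ORDER BY verified DESC, trusted DESC, id ASC. */
function compareCandidates(a: Candidate, b: Candidate): number {
  if (a.verified !== b.verified) return a.verified ? -1 : 1;
  if (a.trusted !== b.trusted) return a.trusted ? -1 : 1;
  // Compare raw bytes: ordering base64url strings would not match byte order.
  return Buffer.compare(a.id, b.id);
}

/** Pick the first row of that ordering, or undefined if there are none. */
export function pickRepresentative(
  candidates: readonly Candidate[],
): Candidate | undefined {
  let best: Candidate | undefined;
  for (const c of candidates) {
    if (best === undefined || compareCandidates(c, best) < 0) {
      best = c;
    }
  }
  return best;
}
```

Comparing raw bytes matters here: the base64url alphabet places digits after letters, while plain string comparison puts them before, so sorting encoded ids would not reproduce `id ASC` on a BLOB column.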
Branch updated from 5f3411b to 4bf33dd.
What
Adds `GET|HEAD /ar-io/digest/:digest`: serve contiguous data addressed by its SHA-256 content digest, the same base64url value the gateway already emits as the `X-AR-IO-Digest` response header and uses as its on-disk cache key. The content store is already content-addressed by this hash; this PR exposes it directly as a first-class, gateway-namespaced endpoint.
Why
Lets clients fetch (and dedup/verify) bytes by content identity rather than by transaction/data-item id. Because the bytes provably hash to the requested digest, the endpoint is inherently self-verifying in a way `/raw/:txid` can't be for peer-sourced bytes.
Semantics
- `X-AR-IO-Verified: true` always.
- `Cache-Control: public, max-age=…, immutable` (the URL is the hash of the bytes).
- Header parity with `/raw`: a representative id that resolves to the digest (cheap lookup via the existing `contiguous_data_hash` index) is run through the same `setDataHeaders` path, so the response carries the full id-scoped header set (`X-AR-IO-Data-Id`, tags, owner, signature, root offsets), all of which is then covered by the HTTPSIG signature. The served digest is pinned onto the attributes so digest/ETag/Content-Digest always describe the bytes actually streamed.
- … (`451`), and malformed-digest (`400`) all handled.
Design notes
- `/ar-io/digest/:digest` (not `/raw/by-digest/…`). `/raw` and `/{txid}` are the cross-gateway Arweave-standard namespace; content-hash serving is AR.IO-specific, so it lives under the `/ar-io/*` namespace and mirrors the `X-AR-IO-Digest` header name for round-trip symmetry.
- No migration: the `contiguous_data_hash` index already exists; this only adds a read path.
Changes
- DB: `selectDataAttributesByHash` + `getDataAttributesByHash` through the StandaloneSqlite worker / circuit-breaker / queue / message-handler chain
- Data source: `ReadThroughDataCache.getDataByHash` + `ByHashDataSource` interface
- Attributes: `getDataAttributesByHash` on `DataAttributesSource` + `CompositeDataAttributesSource`
- Route: `DIGEST_DATA_PATH_REGEX`, `createDigestDataHandler` on `arIoRouter`; `handleRangeRequest` generalized with an optional region fetcher
- Docs: `openapi.yaml` path + glossary entry
Tests
`getDataAttributesByHash` (DB), `getDataByHash` (data source), and the handler (GET / HEAD / range / 404 / 451 / 400 / no-representative-id): all green.
Note
A second commit (`chore: incidental working-tree tweaks`) bundles three unrelated working-tree changes (docker-compose `TX_FETCHER_WORKER_COUNT` passthrough, grafana `foldersFromFilesStructure`, a CLAUDE.md note) that were present in the checkout. Happy to drop them into their own PR if preferred.
Follow-ups (not in this PR)
- … `digest` field + `transactions(digest:)` filter (cross-DB; harder)

🤖 Generated with Claude Code