Skip to content

feat: content-hash conversation identity (server-only hashing)#64

Merged
basilebong merged 10 commits into
mainfrom
feat/content-hash-conversations
May 22, 2026
Merged

feat: content-hash conversation identity (server-only hashing)#64
basilebong merged 10 commits into
mainfrom
feat/content-hash-conversations

Conversation

@basilebong
Copy link
Copy Markdown
Collaborator

@basilebong basilebong commented May 20, 2026

Summary

Conversation identity moves to a server-computed content hash of the canonical turn JSON (with sha256 of each RecordedAudio's WAV bytes substituted in). The dev sets a free-form name; renaming doesn't change identity, editing a turn or a WAV does. The SDK does zero hashing — no cross-language parity fixture to keep in sync.

This branch also carries main's #65 (server-as-analyzer: stereo WAV upload + VAD + turn derivation + SSE progress). The two surfaces are independent.

API request flow

sequenceDiagram
    autonumber
    participant SDK as Python SDK
    participant Xray as xray server
    participant Agent as Voice agent
    participant Worker as bunqueue worker

    SDK->>Xray: POST /v1/conversations multipart spec + audio parts
    Note over Xray: hash = sha256(canonical turns), upsert by hash
    Xray-->>SDK: 200 hash

    SDK->>Xray: POST /v1/replays JSON conversation_hash
    Xray-->>SDK: 201 id, lifecycle_state pending

    SDK->>Agent: runtime.run(conversation)
    Agent->>Xray: POST /v1/otlp/v1/traces xray.replay.id
    Agent-->>SDK: AgentResponses

    SDK->>Xray: POST /v1/replays/:id/audio stereo WAV
    SDK->>Xray: POST /v1/replays/:id/analyze
    Xray->>Worker: enqueue
    Xray-->>SDK: 202 job_id

    SDK->>Xray: GET /v1/replays/:id/events SSE
    Worker->>Xray: VAD, speech_segments, replay_turns
    Xray-->>SDK: state / progress / completed

    SDK->>Xray: GET /v1/replays/:id
    Xray-->>SDK: turns + tool_calls + model_usage + spans

    SDK->>Xray: PATCH /v1/replays/:id lifecycle_state
    Note over SDK,Xray: 409 tolerated — server-owns-lifecycle
    Xray-->>SDK: 200
Loading

What changed for callers

  • POST /v1/conversationsmultipart/form-data: spec JSON ({name, turns}, ≤256 KB measured in UTF-8 bytes) plus one file part per RecordedAudio turn, keyed by the turn's declared upload_key (any [A-Za-z0-9_.-] string, ≤50 MB each). Server hashes, returns the full conversation row ({hash, name, created_at, last_run_at, turns}).
  • POST /v1/replays — JSON {conversation_hash, run_config?}, returns the replay row at lifecycle_state: "pending".
  • conversations PK is hash; replays.conversation_hash FKs into it.
  • Audio paths: recorded conversation audio at <XRAY_AUDIO_ROOT>/recorded/<sha256>.wav (deduplicated across conversations, written via tmp + rename(2) so concurrent identical uploads never expose a partial file); full-replay stereo mixdown at <XRAY_AUDIO_ROOT>/<replay_id>/replay.wav.
  • Python SDK: Conversation(name=..., turns=...); xray.run(...) returns RunResult.conversation_hash. Existing data wiped (pre-1.0).
  • SDK resilience: a dev-authored judge raising no longer strands the replay row — orchestrator logs and proceeds to PATCH. The final PATCH tolerates 409, accepting the server's lifecycle when SSE wait drops before the terminal event.

Atomicity

Audio files are written content-addressed (recorded/<sha256>.wav) before the conversation upsert — a partial write + failed upsert leaves a harmless orphan that the next submission finds in place. Typed errors at every boundary: RecordedAudioUploadKeyError (missing/unreferenced upload_key), ConversationNotFoundError (404), ReplayLifecycleTransitionError (409 on terminal-state mutations).

Verify

pnpm typecheck && pnpm check && pnpm test     # 262 pass
cd sdk/python && uv run pyright && uv run pytest  # 42 pass

@LukasPoque LukasPoque self-requested a review May 20, 2026 19:18
@LukasPoque LukasPoque self-assigned this May 20, 2026
@basilebong basilebong changed the title feat: content-hash conversation identity feat: content-hash conversation identity (server-only hashing) May 21, 2026
basilebong added a commit that referenced this pull request May 21, 2026
Lifts the sequence diagram from PR #64 into the integration guide as a
top-of-doc "Request flow at a glance" section. Also fixes the residual
`conversation_id`/`conversation_version` references in architecture.md
and integrate.md that survived the content-hash merge.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
basilebong and others added 5 commits May 22, 2026 08:22
Replaces dev-set conversation IDs with a SHA-256 over the canonical-JSON
encoding of the turn array (including sha256 of RecordedAudio WAV bytes).
The dev sets `name` as a free-form display label; identity is the hash.

Wire: POST /v1/replays now carries {name, turns, modality, run_config?}.
The server recomputes the hash, upserts the conversation row by hash
(last-write-wins on name), and creates the replay. POST /v1/conversations
is removed. GET routes use :hash instead of :id; the ?version query is
gone. Replays reference conversations via conversation_hash FK.

SDK: Conversation(name=, turns=). compute_hash is a lazy cached_property.
RecordedAudio bytes are sha256'd at first hash access; the cache keys on
(path, mtime_ns, size). The bind/baggage/JWT-attribute pipeline carries
conversation_hash in place of (conversation_id, conversation_version).

A canonical-JSON parity fixture (tests/fixtures/hash-parity.json) pins
the byte-for-byte contract between the Python SDK and the TS server.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…stale APIs

- replays.service: wrap `ensureConversation` and replay/meta inserts in a
  single transaction so a failed insert can't leave an orphan conversation
  row polluting `last_run_at` ordering. `ensureConversation` now takes
  `StoreDbOrTx` so it composes from both contexts.
- Conversation: freeze the dataclass and drop `functools.cached_property`
  so editing a WAV (mtime changes) or mutating `turns` in place is
  reflected on the next `.hash` access. Disk-heavy work stays memoized in
  `_AUDIO_SHA256_CACHE` keyed by `(path, mtime_ns, size)`.
- `_sha256_file`: wrap `OSError` in `AudioMissingError` so the SDK's
  typed-errors contract holds at the call site.
- orchestrator: wrap `httpx.HTTPStatusError` on `POST /v1/replays` as a
  new typed `XrayServerError` — the failure happens before the replay
  row exists, so it can't flow through the `failure_reason` PATCH path.
- Delete dead `judge` plumbing from the orchestrator (`JudgePatchBody`,
  `_judge_to_wire`, `judge` field on `RunResult`, the unreachable PATCH
  branch). `Conversation.judge` is still accepted and ignored.
- Expand `tests/fixtures/hash-parity.json` from one shape to a 9-case
  vector: ASCII, unicode + emoji surrogate pair, control chars,
  U+2028/U+2029, U+007F DEL, empty text, recorded audio, TTS with/without
  voice_id. Both Python and TS parity tests iterate it.
- Move `shortHash` / `HASH_PREFIX_LEN` from `src/client/lib/format.ts`
  into the existing `src/client/format.ts`; delete the orphan `lib/`
  helper; add co-located tests.
- Drop stale SDK examples folder and clean stale APIs from the SDK
  README (`id=`, `title=`, `expect_agent_turn`, `xray.trace`,
  `POST /v1/conversations`) — a working example will land in a
  follow-up PR alongside a dev LiveKit instance.
- Fix `makeConversationTurn` so an `{role:"agent"}` override no longer
  carries the user-turn default `text:"hello"`.
- Fix stale `(id, version)` PK comment on `ConversationRow`.
- New tests: `XrayServerError` on POST failure, audio cap fires locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…terminal status, numeric guard

Cleans up the open review threads on the content-hash branch:

- LiveKitDriver → LiveKitRuntime everywhere (README, docs, exports,
  tests, error messages). The class name in the quickstart now matches
  the implementation, so a copy-paste import resolves.
- README's wiring section now describes JWT participant attributes
  instead of the stale "room metadata" mechanism, and the OpenAI TTS
  cache description matches the actual (text, voice, model)
  fingerprint layout.
- Drop `path` from the RecordedAudio wire payload. The sha256 is the
  full identity; including the local filesystem path made the
  conversation hash machine-local (same checked-in spec produced
  different hashes on Alice's vs Bob's box). Mirrors in the TS schema,
  parity fixture, and the SDK's _audio_to_wire — and locked in by a
  new "same bytes, different path ⇒ same hash" test.
- Replay terminal-status guard now covers `completed` too, not just
  `failed`. A "rescue" PATCH that flips a completed run back to
  running silently rewrote the outcome; now both raise
  ReplayStatusTransitionError. Sibling test mirrors the existing
  `failed` coverage.
- Bound the orchestrator's XrayServerError message at
  e.response.text[:500] — matches the OTLP exporter's truncation and
  stops a 5xx HTML error page from dumping into the dev's stdout.
- corrupt turns_json in the conversations store now logs a warn line
  with the conversation hash and the underlying error/issues before
  returning []. Two tests pin both the parse-failure and
  schema-failure branches.
- Canonical encoder on both sides now rejects numeric values.
  JSON.stringify(1.0) is "1"; Python's json.dumps(1.0) is "1.0" — the
  hashes would silently diverge across languages the moment a numeric
  field landed in a turn. TS throws in canonicalStringify; the Python
  side scans the encoded output for unquoted numeric tokens (cheaper
  and less type-narrow-fighty than a typed tree walk). Booleans stay
  allowed (they roundtrip identically) and digits inside strings stay
  allowed (regression guard test for the scanner).
- ruff format/check fixes on test_real_server.py, test_orchestrator.py,
  test_conversation.py and runtime/livekit.py — what tripped CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
POST /v1/replays becomes multipart/form-data: a `spec` JSON part
carries the Conversation (`name`, `turns`, `modality`, `run_config`),
and one named file part per `RecordedAudio` turn carries the raw WAV
bytes. The server reads each file part, sha256s the bytes, stores a
content-addressed copy under `<audioRoot>/recorded/<sha256>.wav`
(idempotent — same bytes ⇒ same file, written once via `flag: "wx"`),
substitutes the sha256 into the canonical turn JSON, then hashes the
canonical JSON to derive `conversation_hash`.

Why: the SDK shouldn't be a hash authority. Cross-language canonical
encoder parity (the hash-parity.json vector) was a brittle wire
contract that broke the moment either side's encoding drifted. With
the server as the sole hash authority, drift becomes impossible by
construction. RecordedAudio bytes are also now server-resident, so a
future PR can serve them back to the inspector for per-turn playback
without the dev needing to keep their local WAVs around.

SDK changes (sdk/python/):
- delete `Conversation.hash`, `_canonical_turns_json`,
  `_hash_turns_wire`, `_reject_numeric_tokens`, `_sha256_file`,
  `_AUDIO_SHA256_CACHE`
- rename `to_replay_create_payload` → `to_replay_spec_payload`;
  add `recorded_audio_uploads()` yielding `(upload_key, path)` pairs
- orchestrator POSTs multipart via httpx `files=`, opening file
  handles inside a `contextlib.ExitStack`
- `_ReplayCreateResponse` reads `conversation_hash` from the server

Server changes (src/server/):
- new request-form turn schemas with `{kind: "recorded", upload_key}`;
  canonical/stored schemas keep `{kind: "recorded", sha256}`
- `materializeRequestTurns` walks request turns, hashes bytes,
  substitutes sha256, validates no orphan or missing upload_keys
- one typed error `RecordedAudioUploadKeyError` with
  `reason: "missing" | "unreferenced"`; 400 either way
- `saveRecordedConversationAudio` writes content-addressed copies in
  parallel via `Promise.all`, swallowing `EEXIST` (same-bytes idempotency)
- replays router accepts multipart, caps spec at 256 KB and the whole
  body at 512 MB; audio bytes per part still capped at MAX_AUDIO_BYTES

Tests:
- delete tests/fixtures/hash-parity.json — cross-language parity vector
  has no consumers now
- new `createReplayForTest` helper supplies a session-wide temp audio
  root (cleaned up on process exit) and an empty audio-parts map
- new server tests: audio-bytes sha256 substitution; content-addressed
  on-disk write; 400 on missing / unreferenced upload_key

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@basilebong basilebong force-pushed the feat/content-hash-conversations branch from 96707f2 to d8d2123 Compare May 22, 2026 06:33
@basilebong
Copy link
Copy Markdown
Collaborator Author

A `final=True` transcription that arrives after one agent turn's audio
ends — e.g. a plugin's delayed `conversation_item_added` — would sit
in the shared queue and satisfy the next agent turn's `final_seen` on
entry, ending it before any new audio is captured. Drop pre-turn
segments before installing the drainer.

Regression test:
test_runtime_drains_stale_transcripts_between_agent_turns
…l hashing

- conversations.service: materializeOneTurn dispatch via ts-pattern; pre-hash
  audio bytes via Promise.all and return PendingAudioWrite[] so the router
  iterates pairs directly. ensureConversation switched to options object.
  listConversations projects explicit columns (drops turnsJson read).
- conversations.errors: add MissingSpecPartError (extends MalformedConversationBodyError);
  reuse ConversationHashSchema for the canonical recorded sha256 field.
- replays.service: ConversationHashNotFoundError dropped — throw the canonical
  ConversationNotFoundError from conversations.errors. enqueueAnalysis now
  throws ReplayNotFoundError (404) when the row vanished between claim and
  check, replacing the synthetic "unknown" lifecycle path. buildReplayDetail
  accepts a pre-fetched ReplayRow so create/update/get/compare drop one
  redundant SELECT each.
- replays.errors + audio.errors: ReplayNotReadyForAnalysisError.currentState
  and ReplayUploadStateError.currentState typed as ReplayLifecycleState.
- audio.types: compile-time exhaustiveness check on ALL_CONTENT_TYPES vs
  AudioContentType so a future content-type addition fails to compile
  without the corresponding picklist entry.
- test-utils: replays.test-utils routes conversation seeding through the
  conversations slice's seedConversation; conversations.router.test reuses
  makeTempAudioRoot.
- errors tests: add coverage for MissingSpecPartError, InvalidConversationRequestError,
  MalformedConversationBodyError, ConversationBodyTooLargeError per errors.md §5.
- sdk/python orchestrator: collapse the two _read_*_response helpers into a
  generic _read_response[T]; extract _raise_for_status_typed for the two
  HTTPStatusError → XrayServerError wraps.

Wire-visible changes:
- POST /v1/conversations missing-spec error now returns
  issues[0].type = "multipart_part" (was "json_body") with a clearer message.
- POST /v1/replays/:id/analyze returns 404 replay_not_found (was 409
  replay_not_ready_for_analysis with current_state: "unknown") when the row
  is deleted between the claim and the post-claim check.

Verify: pnpm typecheck && pnpm check && pnpm test (252 pass);
cd sdk/python && uv run pyright && uv run pytest (42 pass).
…9 PATCH, router tests

- scripts/seed.ts: restore the /v1/conversations multipart upsert step
  (broken since the multipart refactor) so /v1/replays has a hash to
  reference. Send /v1/replays as JSON {conversation_hash, run_config},
  not FormData. PATCH uses lifecycle_state + a valid failure_reason
  picklist value.
- orchestrator.py: wrap _evaluate_judge in try/except so a dev-authored
  judge raising can't strand the replay row pre-PATCH. Final PATCH
  tolerates 409 — when SSE wait drops out before the worker emits the
  terminal event, the server has already settled the lifecycle and its
  truth wins. Non-409 errors go through _raise_for_status_typed so the
  dev sees XrayServerError instead of a raw httpx exception.
- audio.service: saveRecordedConversationAudio now writes to a per-call
  .tmp-<uuid> file and rename(2)s atomically. The previous
  writeFile(flag:"wx") + EEXIST-as-success strategy returned success
  to the loser of a concurrent identical upload while the winner was
  still streaming bytes — partial-content window for any reader.
- conversations.router: count spec bytes via Buffer.byteLength(...,"utf8")
  instead of String.length. UTF-16 code units undercount multi-byte text
  by up to 4x against the BYTES-named cap; outer multipart limit kept
  the worst case bounded but the spec cap silently lied.
- conversations.router.test: cover POST /v1/conversations end-to-end
  (happy path text-only, recorded audio, idempotent hash, 400 cases
  missing/malformed/schema-invalid spec + upload_key missing/unreferenced,
  413 oversize, UTF-8 vs UTF-16 byte counting). afterEach cleans up
  temp audio root + store handles across the file.
Stale references from the content-hash rename were still present in the
narrative docs and a few code comments; the new 409-tolerance branch on
the final PATCH lacked test coverage after the SSE tests were removed.

- docs/architecture.md: drop VersionFingerprintMismatchError mentions,
  rewrite the POST /v1/conversations item to reflect the multipart +
  server-hashing flow, update the ER diagram (hash/name/last_run_at),
  update inspector endpoints to use :hash.
- docs/integrate.md: OTEL baggage list uses xray.conversation.hash
  instead of the removed .id/.version keys.
- conversations.service.ts: ensureConversation docstring no longer
  claims last_run_at is denormalized from MAX(replays.started_at) — it's
  set on every POST /v1/conversations, which is what the code does.
- replays.errors.ts: trim ReplayLifecycleTransitionError doc to what the
  error means; SDK-side editorial belongs in the PR description.
- orchestrator.py: rewrite the top-of-file step list to match the actual
  10-step flow, renumber the inline step comments sequentially (was
  1, 2, 2, 2b, 3, 4, 5b, 5c, 5, 6, 7, 8), update the cross-reference in
  the 409-tolerance comment.
- test_orchestrator.py: add tests for the PATCH-409 tolerance branch and
  its non-409-still-raises counterpart.
LukasPoque
LukasPoque previously approved these changes May 22, 2026
Copy link
Copy Markdown
Member

@LukasPoque LukasPoque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed all review findings and committed them, @basilebong maybe you can make a review and then its fine to merge IMHO

@LukasPoque LukasPoque assigned basilebong and unassigned LukasPoque May 22, 2026
@basilebong basilebong merged commit ea5f002 into main May 22, 2026
6 checks passed
@basilebong basilebong deleted the feat/content-hash-conversations branch May 22, 2026 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants