Source-available replay/eval framework for LiveKit voice agents. One Docker image, one SQLite file, one Python SDK.
Alpha. The wire format and the SDK API can break between minor versions. Upgrading from a previous release wipes your data: delete /data/xray.db before starting the new container. Issues and feedback are the most useful contribution right now.
- Author a Conversation in Python — an ordered list of user-side turns, per-turn assertion predicates, and an optional per-replay LLM judge.
- Run it against your LiveKit voice agent. The SDK joins your room as a user-side participant, plays the user audio, and captures the agent's audio and transcript.
- xray records the run as a Replay. Your agent emits OpenTelemetry spans during the run; xray's OTLP receiver routes them by xray.replay.id and surfaces tool calls, model usage, and timings in the inspector. Spans from recognized vocabularies (xray.*, the OTel GenAI semantic conventions' gen_ai.*, and Langfuse) light up automatically.
- Compare runs side by side. Pick 2–8 Replays of one Conversation to grid-compare, or pick two Conversations to align them by per-turn key and see what diverged.
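The routing step can be pictured with a short stdlib-only sketch. It assumes spans arrive as a standard OTLP/JSON ExportTraceServiceRequest (resourceSpans → scopeSpans → spans) and that xray.replay.id is already present as a span or resource attribute at export time. route_spans_by_replay, attrs_to_dict, attr_value and the id "rp_123" are hypothetical, not xray's receiver code.

```python
from collections import defaultdict


def attr_value(any_value: dict):
    """Unwrap an OTLP/JSON AnyValue such as {"stringValue": "abc"}."""
    for kind in ("stringValue", "intValue", "doubleValue", "boolValue"):
        if kind in any_value:
            return any_value[kind]
    return None  # arrays, kvlists and bytes are ignored in this sketch


def attrs_to_dict(attributes: list) -> dict:
    """Turn an OTLP/JSON attribute list into a plain {key: value} dict."""
    return {a["key"]: attr_value(a.get("value", {})) for a in attributes}


def route_spans_by_replay(payload: dict) -> dict:
    """Group span names by xray.replay.id; a span attribute wins over a resource attribute."""
    routed = defaultdict(list)
    for resource_spans in payload.get("resourceSpans", []):
        resource_attrs = attrs_to_dict(resource_spans.get("resource", {}).get("attributes", []))
        for scope_spans in resource_spans.get("scopeSpans", []):
            for span in scope_spans.get("spans", []):
                span_attrs = attrs_to_dict(span.get("attributes", []))
                replay_id = span_attrs.get("xray.replay.id", resource_attrs.get("xray.replay.id"))
                routed[replay_id].append(span["name"])  # the None bucket collects unroutable spans
    return dict(routed)


example = {
    "resourceSpans": [{
        "resource": {"attributes": [{"key": "service.name", "value": {"stringValue": "agent"}}]},
        "scopeSpans": [{"spans": [
            {"name": "llm.generate", "attributes": [
                {"key": "xray.replay.id", "value": {"stringValue": "rp_123"}}]},
            {"name": "startup", "attributes": []},
        ]}],
    }]
}
print(route_spans_by_replay(example))  # {'rp_123': ['llm.generate'], None: ['startup']}
```

A span with no replay id anywhere cannot be attributed to a run, which is why the id has to travel with every span rather than only with the first one.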
The image is published to GHCR:
docker pull ghcr.io/xray-eval/xray:0.0.1-alpha
Tagged releases are signed with cosign keyless (OIDC). To verify:
cosign verify ghcr.io/xray-eval/xray:<tag> \
--certificate-identity-regexp '^https://github\.com/xray-eval/xray/' \
--certificate-oidc-issuer https://token.actions.githubusercontent.com
Or build from source:
git clone https://github.com/xray-eval/xray.git
cd xray
docker build -t xray:local .
The Python SDK:
pip install "xray-py[livekit]"   # quotes keep shells like zsh from globbing the brackets
Drop xray into your existing compose stack alongside your LiveKit agent:
# compose.yaml
services:
  xray:
    image: ghcr.io/xray-eval/xray:0.0.1-alpha
    ports:
      - "127.0.0.1:8080:8080" # bind to localhost only; see the security notes below
    volumes:
      - xray-data:/data # SQLite + audio survive container restarts
  my-voice-agent:
    build: .
    environment:
      OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: http://xray:8080/v1/otlp/v1/traces
      OTEL_EXPORTER_OTLP_PROTOCOL: http/json
    depends_on:
      - xray
volumes:
  xray-data:
Run docker compose up, then open http://localhost:8080. The API reference is at http://localhost:8080/docs.
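The compose snippet points the agent's OTLP exporter at xray's receiver. To poke that endpoint by hand (for example from the host, through the 127.0.0.1:8080 mapping), a minimal stdlib sketch can hand-build one span in OTLP/JSON. It assumes the receiver accepts a standard ExportTraceServiceRequest body sent with Content-Type: application/json. build_span_payload and post_spans are hypothetical helpers, and the replay id you pass should come from a real run (for example result.id from the SDK).

```python
import json
import os
import time
import urllib.request


def build_span_payload(replay_id: str, span_name: str) -> dict:
    """Build a one-span OTLP/JSON ExportTraceServiceRequest tagged with a replay id."""
    end = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": "manual-smoke-test"}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-smoke-test"},
                "spans": [{
                    "traceId": os.urandom(16).hex(),  # 32 hex chars in OTLP/JSON
                    "spanId": os.urandom(8).hex(),    # 16 hex chars
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(end - 1_000_000),  # span lasts 1 ms
                    "endTimeUnixNano": str(end),
                    "attributes": [
                        {"key": "xray.replay.id", "value": {"stringValue": replay_id}},
                    ],
                }],
            }],
        }],
    }


def post_spans(endpoint: str, payload: dict) -> int:
    """POST the payload as JSON and return the status (urlopen raises HTTPError on error codes)."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

For example, post_spans("http://localhost:8080/v1/otlp/v1/traces", build_span_payload(replay_id, "manual.check")) sends one span; where it shows up is governed by the attribute contract in docs/WIRE.md.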
import os

from xray import Conversation, Turn, expect_agent_turn, run
from xray.conversation import AgentResponse
from xray.runtime.livekit import LiveKitRuntime

# One user turn followed by one checked agent turn. The turn keys ("u0", "a0")
# are what the Compare view aligns on.
conv = Conversation(
    id="booking-happy-path",
    turns=[
        Turn.user("Hi, I'd like to book a table for two at 7pm.", key="u0"),
        expect_agent_turn(
            key="a0",
            # Passes if the agent's transcript for this turn mentions a confirmation.
            assertion=lambda agent: "confirmed" in agent.transcript.lower(),
            assertion_name="confirms_booking",
        ),
    ],
)

# LiveKit credentials stay in this process; xray never sees them.
runtime = LiveKitRuntime(
    url=os.environ["LIVEKIT_URL"],
    api_key=os.environ["LIVEKIT_API_KEY"],
    api_secret=os.environ["LIVEKIT_API_SECRET"],
    room="booking-test-room",
)

result = run(
    conversation=conv,
    runtime=runtime,
    xray_url="http://localhost:8080",
    run_config={"model": "gpt-4o", "temperature": 0.5},  # shown as column headers in Compare
)
print(f"replay: http://localhost:8080/replays/{result.id}")
Your agent reads xray.replay.id (plus conversation.id, version, and modality) from the LiveKit room metadata and propagates these values as OpenTelemetry baggage, so every span (xray.*, gen_ai.*, or Langfuse) gets routed to the right Replay. See docs/SDK.md.
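The metadata-to-baggage hop is easiest to see in its W3C wire form (comma-separated key=value members with percent-encoded values). This sketch assumes the room metadata is a JSON object whose keys are literally xray.replay.id, conversation.id, conversation.version and conversation.modality; those key names and the helper baggage_from_room_metadata are hypothetical, and docs/SDK.md has the real contract.

```python
import json
from urllib.parse import quote

# Hypothetical key names; docs/SDK.md defines the actual metadata contract.
BAGGAGE_KEYS = (
    "xray.replay.id",
    "conversation.id",
    "conversation.version",
    "conversation.modality",
)


def baggage_from_room_metadata(metadata: str) -> str:
    """Encode the known metadata keys as a W3C baggage header value ("" if none)."""
    try:
        data = json.loads(metadata) if metadata else {}
    except json.JSONDecodeError:
        return ""  # non-JSON metadata: nothing to propagate
    if not isinstance(data, dict):
        return ""
    members = [
        f"{key}={quote(str(data[key]), safe='')}"  # values must be percent-encoded
        for key in BAGGAGE_KEYS
        if data.get(key) is not None
    ]
    return ",".join(members)


room_metadata = json.dumps({"xray.replay.id": "rp_123", "conversation.id": "booking-happy-path"})
print(baggage_from_room_metadata(room_metadata))
```

In a real agent you would more likely set these entries through the OpenTelemetry SDK's baggage API and add a span processor that copies baggage onto span attributes, because baggage on its own is propagated in context but not exported with spans.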
- Replays of the same Conversation: select 2–8 from the Conversation detail page to get a grid view with per-column run_config headers.
- Two Conversations: pick them from the Conversations index to see them side by side, aligned by per-turn key. Unmatched turns render as labeled "no matching turn" placeholders.
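Alignment by per-turn key can be sketched in a few lines. Here a conversation is simplified to a list of (key, text) pairs with unique keys, and turns that exist only on the right are appended after the left side's order; align_by_key is a hypothetical helper, and xray's own placement of unmatched turns may differ.

```python
PLACEHOLDER = "no matching turn"


def align_by_key(left: list[tuple[str, str]], right: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Return (key, left_text, right_text) rows, filling gaps with a placeholder."""
    right_by_key = dict(right)
    left_keys = {key for key, _ in left}
    rows = [(key, text, right_by_key.get(key, PLACEHOLDER)) for key, text in left]
    # Turns that exist only on the right side go after the left side's turns.
    rows += [(key, PLACEHOLDER, text) for key, text in right if key not in left_keys]
    return rows


baseline = [("u0", "Book a table for two at 7pm."), ("a0", "Confirmed for 7pm."), ("a1", "Anything else?")]
candidate = [("u0", "Book a table for two at 7pm."), ("a0", "Which day?"), ("u1", "Tomorrow.")]
for row in align_by_key(baseline, candidate):
    print(row)
```

Keying on stable per-turn ids rather than list position is what lets an inserted or dropped turn show up as one placeholder row instead of shifting every later row out of line.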
One Bun process serves both the SPA and the API. One SQLite file at /data/xray.db on a mounted volume. No external database, no second container, no managed service. See .claude/rules/single-image-distribution.md.
┌─ xray-py SDK on dev's machine ───────────────────────────────────────┐
│ POST /v1/conversations (idempotent upsert by (id, version)) │
│ POST /v1/replays → returns replay_id │
│ LiveKitRuntime joins room, plays user audio │
│ PATCH /v1/replays/:id (final status + judge result) │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─ dev's agent ─────────────────────────────────────────────────────────┐
│ reads replay.id from room metadata → OTEL baggage │
│ emits xray.* / gen_ai.* / langfuse spans │
└─────────────────────────────────────────────────────────────────────┘
│ OTLP/JSON
▼
┌────────────────────────┐
│ SQLite /data/xray.db │ single file, mounted volume
└────────────────────────┘
│
▼
┌────────────────────────┐
│ UI │ Conversations · Replays · Compare
└────────────────────────┘
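The first SDK call, POST /v1/conversations, is described as an idempotent upsert keyed by (id, version). Because xray keeps everything in one SQLite file, the semantics can be sketched with Python's built-in sqlite3 module. The table layout and the helpers open_db and upsert_conversation are hypothetical, not xray's schema, and this sketch keeps the first body stored for a given (id, version) rather than overwriting it.

```python
import json
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with a conversations table keyed by (id, version)."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS conversations (
               id      TEXT    NOT NULL,
               version INTEGER NOT NULL,
               body    TEXT    NOT NULL,
               PRIMARY KEY (id, version)
           )"""
    )
    return db


def upsert_conversation(db: sqlite3.Connection, conv_id: str, version: int, body: dict) -> None:
    """Insert once per (id, version); repeating the same call leaves the table unchanged."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO conversations (id, version, body) VALUES (?, ?, ?) "
            "ON CONFLICT(id, version) DO NOTHING",
            (conv_id, version, json.dumps(body, sort_keys=True)),
        )


db = open_db()
turns = {"turns": ["u0", "a0"]}
upsert_conversation(db, "booking-happy-path", 1, turns)
upsert_conversation(db, "booking-happy-path", 1, turns)  # a retry does not add a row
upsert_conversation(db, "booking-happy-path", 2, {"turns": ["u0", "a0", "u1"]})
print(db.execute("SELECT id, version FROM conversations ORDER BY version").fetchall())
# [('booking-happy-path', 1), ('booking-happy-path', 2)]
```

Idempotency is what makes it safe for the SDK to re-send the Conversation before every run: retries and repeated runs never create duplicate definitions.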
- The SDK→xray surface has no auth. xray and your agent are expected to live in the same Docker network. Do not expose port 8080 publicly; the default compose snippet above binds to 127.0.0.1.
- Secrets (LiveKit, LLM provider keys) live in the SDK's process, never in xray's. xray's image never holds provider credentials.
- Secrets are runtime-only: pass them at run time (compose environment: or env_file:, or docker run -e), never baked into the image.
- A 7-day cooldown on newly published npm releases, deny-by-default lifecycle scripts, and every GitHub Action pinned to a 40-character commit SHA. See .claude/rules/supply-chain.md.
- Releases are signed with cosign keyless (OIDC) and carry build-provenance attestations.
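Whether a compose port mapping really stays on loopback can be checked mechanically, for example in CI. This sketch handles Compose's short ports syntax ("8080:8080", "127.0.0.1:8080:8080", "[::1]:8080:8080/tcp"). host_ip and binds_loopback_only are hypothetical helpers; a mapping without a host IP counts as public because Docker then publishes the port on all interfaces.

```python
import ipaddress
from typing import Optional


def host_ip(port_spec: str) -> Optional[str]:
    """Extract the host IP from a Compose short-syntax port mapping, if any."""
    spec = port_spec.split("/", 1)[0]  # drop a trailing /tcp or /udp
    if spec.startswith("["):  # bracketed IPv6, e.g. "[::1]:8080:8080"
        return spec[1:].partition("]")[0]
    parts = spec.split(":")
    if len(parts) == 3:  # HOST_IP:HOST_PORT:CONTAINER_PORT (HOST_PORT may be empty)
        return parts[0] or None
    return None  # "8080" or "8080:8080": no host IP given


def binds_loopback_only(port_spec: str) -> bool:
    """True only when the mapping is pinned to a loopback address."""
    ip = host_ip(port_spec)
    if ip is None:
        return False  # no host IP: Docker publishes on all interfaces
    try:
        return ipaddress.ip_address(ip).is_loopback
    except ValueError:
        return False  # not an IP literal; treat as unsafe


for spec in ("127.0.0.1:8080:8080", "8080:8080", "0.0.0.0:8080:8080", "[::1]:8080:8080/tcp"):
    print(spec, binds_loopback_only(spec))
```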
- docs/SDK.md: Python authoring, the runtime, and how to propagate baggage from LiveKit room metadata.
- docs/WIRE.md: the OTLP attribute contract, the recognized vocabularies, and which fields are extracted from each.
- /docs on your running instance: the generated OpenAPI 3.1 reference, rendered by Scalar.
corepack enable     # picks up the pinned pnpm
pnpm install        # frozen-lockfile-safe; respects the 7-day cooldown
pnpm dev            # single Bun process via compose.dev.yaml (HMR for SPA + API)
pnpm docker:smoke   # build image, run it, curl /healthz, kill; same check CI runs
Every CI step runs locally with one pnpm script. See CONTRIBUTING.md and CLAUDE.md.
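pnpm docker:smoke curls /healthz; the same readiness gate is handy in a Python test script before starting a replay. This sketch assumes /healthz answers 200 once xray is ready (the smoke script only says it curls the endpoint). wait_for_healthy and http_status are hypothetical helpers built on urllib, and the fetch parameter exists so the polling loop can be tested without a server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status of a GET, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with an error status
    except OSError:
        return None  # refused, DNS failure or timeout (URLError is an OSError)


def wait_for_healthy(base_url: str, attempts: int = 30, delay: float = 1.0,
                     fetch: Callable[[str], Optional[int]] = http_status) -> bool:
    """Poll <base_url>/healthz until it returns 200 or the attempts run out."""
    url = base_url.rstrip("/") + "/healthz"
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # no sleep after the final attempt
    return False
```

In a test script, call wait_for_healthy("http://localhost:8080") and fail fast if it returns False, before handing control to run(...).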
Elastic License 2.0. Free to use, copy, modify, and self-host, including commercially inside your own organization. You may not offer xray to third parties as a hosted or managed service.