feat(example): add LiveKit voice-agent quickstart #71
Open
LukasPoque wants to merge 7 commits into
Ideally, this PR would include a database snapshot containing an authentic Conversation and Replay, to replace the current |
A self-contained `examples/livekit-voice-agent/` folder runs xray, LiveKit, and a minimal Gemini Live (v2v) voice agent in a four-service docker compose stack. A pytest "driver" container (`profiles: [test]`) drives one Replay end-to-end and asserts the SDK ↔ server wire works across all three OTLP vocabularies xray recognizes (xray.*, gen_ai.*, langfuse.*) plus the audio-ground-truth flow (POST .../analyze, SSE events, server-derived turns + speech_segments). Adds one `.dockerignore` exception (`!sdk/python/README.md`) so the agent + driver images can `pip install -e /workspace/sdk/python` editable — hatchling reads README.md from pyproject.toml at build time.
Cuts ~340 lines of redundant comments + drops the in-PR agent-span assertions (judge/assertion compute lives server-side; SDK-side assertion=lambda lands once the feature exists).

Also drops `logging.basicConfig(...)` from agent/main.py: livekit-agents `cli.run_app` installs its own JsonFormatter handler, and adding a default StreamHandler from basicConfig caused every line to be emitted twice (text + JSON). Verified with full e2e run: agent log dropped from 318 lines to 20.

Drops the redundant `GEMINI_API_KEY:` env entry in compose.yaml; only `GOOGLE_API_KEY` is needed (google-genai picks one when both are set and warns about the duplicate).

Documents the expected 2 Langfuse exporter ERROR entries during startup: Langfuse v3 unconditionally installs an OTLP exporter against LANGFUSE_HOST when given keys, and the example uses fake keys + non-routable host so its export fails fast. xray's own exporter is unaffected; the test passes.
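The duplicate-logging effect described in this commit can be reproduced with the standard library alone. The sketch below is an illustration, not the example's actual code: `count_emissions` is a hypothetical helper, and the second handler is only a stand-in for whatever handler a framework installs on the root logger.

```python
import io
import logging


def count_emissions(call_basic_config: bool) -> int:
    """Return how many output lines a single log call produces."""
    root = logging.getLogger()
    saved_handlers, saved_level = root.handlers[:], root.level
    root.handlers.clear()
    text_out, json_out = io.StringIO(), io.StringIO()
    try:
        if call_basic_config:
            # Application code installs a default text StreamHandler on root.
            logging.basicConfig(stream=text_out, level=logging.INFO)
        # Stand-in for a framework that adds its own JSON-style handler later.
        framework_handler = logging.StreamHandler(json_out)
        framework_handler.setFormatter(logging.Formatter('{"msg": "%(message)s"}'))
        root.addHandler(framework_handler)
        root.setLevel(logging.INFO)

        # One log call; every handler on root receives the record.
        logging.getLogger("agent").info("worker registered")
        text_lines = text_out.getvalue().splitlines()
        json_lines = json_out.getvalue().splitlines()
        return len(text_lines) + len(json_lines)
    finally:
        # Restore the previous logging configuration.
        root.handlers[:] = saved_handlers
        root.setLevel(saved_level)


if __name__ == "__main__":
    print("with basicConfig:", count_emissions(True))
    print("without basicConfig:", count_emissions(False))
```

With both handlers attached to the root logger, each record is written once per handler, which is why removing the `basicConfig` call roughly halves the log volume.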
Drops `scripts/seed.ts` (595 lines of hand-crafted OTLP spans + sine audio) and `pnpm seed` in favor of a checked-in capture of one full end-to-end run of `examples/livekit-voice-agent/`. The inspector now renders authentic data (real livekit-agents + langfuse + Gemini Live emissions, server-derived VAD turns and speech segments, real stereo mixdown WAV) without anyone needing to run the example.

Snapshot contents under `examples/livekit-voice-agent/snapshot/`:
- `xray.db` (104 KB, WAL-checkpointed): 1 conversation, 1 completed replay, 8 spans across all three vocabularies (xray / gen_ai / langfuse), 3 server-derived turns, 6 speech_segments, 1 tool_call, 3 model_usage rows.
- `audio/<replay-id>/replay.wav` (1.9 MB): the stereo mixdown.
- `audio/recorded/<sha256>.wav` (220 KB): the recorded user-turn.

Total 2.2 MB committed. `.gitattributes` marks `.db` + `.wav` binary so git stops trying to text-diff them. The example's `.gitignore` drops SQLite's transient `*.db-shm` / `*.db-wal` so opening the snapshot with a SQLite client doesn't dirty the working tree.

To browse: `docker run --rm -p 8080:8080 -v $(pwd)/snapshot:/data ghcr.io/xray-eval/xray`. Regenerate instructions in the README.

Wiring `pnpm dev` to bootstrap from this snapshot is a follow-up: the server needs a "copy fixture into empty /data on first boot" hook before that can be a single-command experience.
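For readers unfamiliar with what "WAL-checkpointed" means for a committed SQLite file, here is a minimal sketch using Python's built-in `sqlite3` module. The `replays` table and its columns are hypothetical and do not reflect xray's real schema; the point is only that `PRAGMA wal_checkpoint(TRUNCATE)` folds pending writes back into the main `.db` file so no `-wal` content needs to be committed alongside it.

```python
import os
import sqlite3
import tempfile


def make_checkpointed_db(path: str) -> int:
    """Create a WAL-mode database, write one row, checkpoint it.

    Returns the 'busy' flag reported by the checkpoint (0 means it completed).
    """
    conn = sqlite3.connect(path)
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        # Hypothetical table, for illustration only.
        conn.execute("CREATE TABLE replays (id TEXT PRIMARY KEY, status TEXT)")
        conn.execute("INSERT INTO replays VALUES ('demo', 'completed')")
        conn.commit()
        # Move WAL contents into the main file and truncate the WAL.
        busy = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()[0]
        return busy
    finally:
        conn.close()


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "xray.db")
    print("checkpoint busy flag:", make_checkpointed_db(db_path))
    print("files:", sorted(os.listdir(os.path.dirname(db_path))))
```

Opening the file again with any SQLite client may recreate `-wal` and `-shm` siblings, which is why ignoring those patterns in `.gitignore` keeps the working tree clean.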
`examples/` should be a self-contained demo of "how to use xray" — Flutter-package convention. The committed snapshot is a repo-level fixture for the inspector, not part of the example's surface, so it moves to `snapshot/` at the repo root. README in the example drops to the essentials: file tree, quickstart, adapting-to-your-own-agent. The expected-log-noise prose and the fixture/snapshot/regeneration sections all leave — they belong (if anywhere) under top-level docs, not in the example README. `.gitignore`: the SQLite transient `*.db-shm` / `*.db-wal` ignore moves from the example's `.gitignore` to the root one, matching the new snapshot location.
Add surgical `express>qs` + `body-parser>qs` overrides so the transitive qs pulled in via bunqueue>@modelcontextprotocol/sdk>express bumps to the patched 6.15.2. The patched release was published 2026-05-16 (past the 7-day cooldown), so no minimumReleaseAgeExclude is needed. Unblocks the supply-chain CI job.
Co-Authored-By: Claude Opus 4.7 <[email protected]>
basilebong
reviewed
May 24, 2026
- Agent.instructions overrides RealtimeModel.instructions, so move
the prompt from RealtimeModel(instructions=...) to Agent(instructions=...)
and drop the blank kwarg that was silencing it.
- Await the SpeechHandle returned by session.generate_reply so
xray.stage.tts measures real TTS latency, not microseconds.
- Register the room "disconnected" listener before session.start and
wrap the body in try/finally to release disconnect on session-side
failures — otherwise xray.attach's force-flush never runs.
- compose.yaml: switch GEMINI_API_KEY from ${VAR:-} (silent default)
to ${VAR:?msg} so missing-key aborts compose with a clear message.
- README: add `cd examples/livekit-voice-agent`, note that
`compose up` streams logs (wait for worker registration), explain
why the driver lives behind --profile test, and fix the broken
`docs/integrate.md` link.
- Pin PyPI deps exactly in agent/driver pyproject.toml + Dockerfile;
document the PyPI cooldown gap as a new §6 in supply-chain.md.
Co-Authored-By: Claude Opus 4.7 <[email protected]>
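The listener-ordering and try/finally points in the list above can be illustrated with plain asyncio. This is a sketch under stated assumptions, not the example's real agent code: `FakeRoom`, `entrypoint`, `failing_start`, and `early_disconnect_start` are hypothetical names, and the "flushed" log entry stands in for whatever cleanup waits on the disconnect signal.

```python
import asyncio
from typing import Awaitable, Callable


class FakeRoom:
    """Hypothetical stand-in for a room object with an event-listener API."""

    def __init__(self) -> None:
        self._listeners: dict[str, list[Callable[[], None]]] = {}

    def on(self, event: str, callback: Callable[[], None]) -> None:
        self._listeners.setdefault(event, []).append(callback)

    def emit(self, event: str) -> None:
        for callback in self._listeners.get(event, []):
            callback()


async def entrypoint(
    room: FakeRoom,
    start_session: Callable[[FakeRoom], Awaitable[None]],
    log: list[str],
) -> None:
    disconnected = asyncio.Event()
    # Register before starting, so a disconnect during startup is not missed.
    room.on("disconnected", disconnected.set)

    async def flush_when_released() -> None:
        # Stand-in for cleanup that only runs once the disconnect is signalled.
        await disconnected.wait()
        log.append("flushed")

    flusher = asyncio.create_task(flush_when_released())
    try:
        await start_session(room)
        await disconnected.wait()  # normal path: run until the room closes
    finally:
        disconnected.set()  # failure path: release the waiter anyway
        await flusher


async def failing_start(room: FakeRoom) -> None:
    raise RuntimeError("session failed to start")


async def early_disconnect_start(room: FakeRoom) -> None:
    room.emit("disconnected")  # the room closes while the session is starting


if __name__ == "__main__":
    events: list[str] = []
    asyncio.run(entrypoint(FakeRoom(), early_disconnect_start, events))
    print(events)
```

Without the `finally` branch, a failure inside `start_session` would leave `flush_when_released` waiting on an event that nothing ever sets.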
Dependency Review
The following issues were found:
Vulnerabilities: examples/livekit-voice-agent/driver/pyproject.toml
License Issues: examples/livekit-voice-agent/agent/pyproject.toml
Only included vulnerabilities with severity moderate or higher.
All seven review threads resolved in 7a83411. Quick rundown: Agent loop (
Compose / quickstart
Supply chain
Verified locally
Not verified
basilebong
previously approved these changes
May 24, 2026
Self-review pass: all 7 inline findings addressed in 7a83411 (see summary comment). Approving so this can ship once you've done a smoke run with a real Gemini key.
`pytest < 9.0.3` is flagged by github/dependency-review-action — the 8.4.2 pin from the previous commit was vulnerable to the tmpdir symlink issue. 9.0.3 is the first patched release and is well past the 7-day PyPI cooldown (uploaded 2026-04-07). Co-Authored-By: Claude Opus 4.7 <[email protected]>
Summary
- `examples/livekit-voice-agent/`: minimal LiveKit Agents worker driving Gemini Live (v2v), wired to xray via `xray.attach(ctx)`. Four docker compose services: livekit, xray, agent, and a profile-gated pytest driver.
- `.dockerignore` exception (`!sdk/python/README.md`) so the agent + driver images can pip-install the in-tree SDK editable. Verified the main production image still builds unchanged.

What the example demonstrates
- `async with xray.attach(ctx, ...) as session`.
- `xray.stage.tts` (xray vocab)
- `example_langfuse_step` via Langfuse `@observe` (langfuse vocab)
- `execute_tool` via `session.record_tool_call(...)` (gen_ai vocab)
- `POST /v1/replays/:id/analyze` → SSE `events` → server-derived `replay_turns` + `speech_segments`.
- `conversation_item_added` → `rtc.Transcription`, which is the missing link for getting Gemini Live's transcripts into the SDK's `AgentResponse.transcript`.

Repo rules followed
- `.claude/rules/code-layout.md`: `agent/`, `driver/`, `fixtures/` sub-slices.
- `.claude/rules/supply-chain.md` §4 (`livekit/livekit-server`, `python:3.12-slim-bookworm`).
- Loopback-only port bindings (`127.0.0.1:8080`, `127.0.0.1:7890`, `127.0.0.1:7891`) so running this on shared wifi doesn't expose the inspector or LiveKit signaling.
- `.env` ignored at the example root + by root `.gitignore:17`. `.env.example` shows the one required key.

Test plan
- `docker compose --profile test run --rm driver` passes locally (~8–18 s per run).
- `GET /v1/replays/:id` shows 3 server-derived turns from VAD (regression test for the SDK fix on the parent branch).
- `replay["spans"]`.
- `tool_calls` row populated (`get_current_year`).
- `model_usage` rows extracted (langfuse stub + Gemini Live usage).
- `xray` image still builds after the `.dockerignore` change.
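The analyze flow listed under "What the example demonstrates" streams results as server-sent events. As a rough illustration of how a driver might consume such a stream, here is a small parser for the generic SSE line format. It is a sketch only: `parse_sse` is a hypothetical helper, and the `turn` event name and JSON payload in the usage example are made up, since the PR does not show xray's actual event names or payloads.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space after the colon is dropped
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


if __name__ == "__main__":
    sample = [
        "event: turn\n",  # hypothetical event name
        'data: {"index": 0}\n',  # hypothetical payload
        "\n",
    ]
    for name, payload in parse_sse(sample):
        print(name, payload)
```

Feeding it the decoded lines of a streaming HTTP response would let a test wait for a terminal event before fetching the replay, though the exact termination signal depends on the server.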