
feat(example): add LiveKit voice-agent quickstart #71

Open
LukasPoque wants to merge 7 commits into main from example/livekit-voice-agent


@LukasPoque
Member

Summary

  • New examples/livekit-voice-agent/ — minimal LiveKit Agents worker driving Gemini Live (v2v), wired to xray via xray.attach(ctx). Four docker compose services: livekit, xray, agent, and a profile-gated pytest driver.
  • One .dockerignore exception (!sdk/python/README.md) so the agent + driver images can pip-install the in-tree SDK editable. Verified the main production image still builds unchanged.

What the example demonstrates

  • The one-line xray integration (async with xray.attach(ctx, ...) as session).
  • One user-emitted span per recognized vocabulary lands on the replay:
    • xray.stage.tts (xray vocab)
    • example_langfuse_step via Langfuse @observe (langfuse vocab)
    • execute_tool via session.record_tool_call(...) (gen_ai vocab)
  • The audio-ground-truth flow end-to-end: stereo WAV upload → POST /v1/replays/:id/analyze → SSE events → server-derived replay_turns + speech_segments.
  • A transcript-republish bridge from conversation_item_added → rtc.Transcription, which is the missing link for getting Gemini Live's transcripts into the SDK's AgentResponse.transcript.
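The audio-ground-truth flow ends in a server-sent event stream. The following is a minimal stdlib sketch of how a client such as the pytest driver might split a `text/event-stream` body into `(event, data)` pairs, following the WHATWG SSE framing rules: a blank line dispatches the buffered event, multiple `data:` lines are joined with newlines, and lines starting with `:` are comments. The `progress`/`done` event names and JSON payloads are hypothetical, because the PR does not show what the analyze endpoint actually emits.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from an iterable of SSE lines."""
    event: Optional[str] = None
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event; data-less events are dropped.
            if data:
                yield (event or "message", "\n".join(data))
            event, data = None, []
            continue
        if line.startswith(":"):
            continue  # comment line, typically a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip the single optional space after the colon
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # "id" and "retry" fields are ignored in this sketch.
    # An event still buffered at end of stream is discarded, as the spec requires.


# Hypothetical stream: event names and payloads are illustrative only.
stream = [
    "event: progress\n",
    'data: {"stage": "vad"}\n',
    "\n",
    ": keep-alive\n",
    "event: done\n",
    'data: {"turns": 3}\n',
    "\n",
]
events = list(parse_sse(stream))
```

A real client would feed this from a streaming HTTP response, for example by iterating over the response's lines.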

Repo rules followed

  • Slice layout per .claude/rules/code-layout.md: agent/, driver/, fixtures/ sub-slices.
  • Base images SHA-pinned per .claude/rules/supply-chain.md §4 (livekit/livekit-server, python:3.12-slim-bookworm).
  • Host ports bound loopback-only (127.0.0.1:8080, 127.0.0.1:7890, 127.0.0.1:7891) so running this on shared wifi doesn't expose the inspector or LiveKit signaling.
  • No secrets committed; .env ignored at the example root + by root .gitignore:17. .env.example shows the one required key.

Test plan

  • docker compose --profile test run --rm driver passes locally (~8–18 s per run).
  • GET /v1/replays/:id shows 3 server-derived turns from VAD (regression test for the SDK fix on the parent branch).
  • All 3 vocabulary spans appear in replay["spans"].
  • tool_calls row populated (get_current_year).
  • model_usage rows extracted (langfuse stub + Gemini Live usage).
  • Main xray image still builds after the .dockerignore change.
  • SDK pytest still 12/12 green.

@LukasPoque LukasPoque requested review from basilebong and removed request for basilebong May 21, 2026 17:48
@LukasPoque LukasPoque self-assigned this May 21, 2026
@basilebong
Collaborator

Ideally, this PR would include a database snapshot containing an authentic Conversation and Replay, to replace the current pnpm seed script. What do you think?

@basilebong basilebong force-pushed the feat/content-hash-conversations branch from 96707f2 to d8d2123 May 22, 2026 06:33
Base automatically changed from feat/content-hash-conversations to main May 22, 2026 16:18
@LukasPoque LukasPoque force-pushed the example/livekit-voice-agent branch 2 times, most recently from ba66402 to 5bb6b4c May 22, 2026 16:48
@LukasPoque LukasPoque changed the base branch from main to feat/client-content-hash-migration May 22, 2026 16:48
Base automatically changed from feat/client-content-hash-migration to main May 22, 2026 16:48
A self-contained `examples/livekit-voice-agent/` folder runs xray,
LiveKit, and a minimal Gemini Live (v2v) voice agent in a four-service
docker compose stack. A pytest "driver" container (`profiles: [test]`)
drives one Replay end-to-end and asserts the SDK ↔ server wire works
across all three OTLP vocabularies xray recognizes (xray.*, gen_ai.*,
langfuse.*) plus the audio-ground-truth flow (POST .../analyze, SSE
events, server-derived turns + speech_segments).

Adds one `.dockerignore` exception (`!sdk/python/README.md`) so the
agent + driver images can `pip install -e /workspace/sdk/python`
editable — hatchling reads README.md from pyproject.toml at build
time.
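The exception works because `.dockerignore` rules are evaluated in order and the last pattern that matches a path decides whether it is excluded. As a result, a later `!sdk/python/README.md` re-includes a file that an earlier glob dropped. The following is a simplified Python model of that rule. It uses `fnmatch`, whose `*` also crosses `/`, instead of Docker's Go `filepath.Match` semantics and `**` handling. The ignore list is a hypothetical stand-in because the PR does not show the repo's actual `.dockerignore`.

```python
from fnmatch import fnmatchcase
from typing import List


def is_ignored(path: str, patterns: List[str]) -> bool:
    """Last matching pattern wins; a leading '!' re-includes the path."""
    ignored = False
    for pattern in patterns:
        negate = pattern.startswith("!")
        glob = pattern[1:] if negate else pattern
        if fnmatchcase(path, glob):
            ignored = not negate
    return ignored


# Hypothetical ignore list: drop all Markdown, then re-include the one README
# that hatchling reads when building the editable SDK install.
PATTERNS = ["*.md", "!sdk/python/README.md"]
```

Order matters. If the exception is placed before the broad glob, the glob matches last and excludes the README again.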
@LukasPoque LukasPoque force-pushed the example/livekit-voice-agent branch from 5bb6b4c to 3caba43 May 22, 2026 17:02
Cuts ~340 lines of redundant comments + drops the in-PR agent-span
assertions (judge/assertion compute lives server-side; SDK-side
assertion=lambda lands once the feature exists).

Also drops `logging.basicConfig(...)` from agent/main.py — livekit-agents
`cli.run_app` installs its own JsonFormatter handler, and adding a
default StreamHandler from basicConfig caused every line to be emitted
twice (text + JSON). Verified with full e2e run: agent log dropped from
318 lines to 20.
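The doubling comes from two handlers on the root logger. `logging.basicConfig` attaches a `StreamHandler` when the root logger has none yet, so a handler installed later by the framework receives every record in addition to that one. The stdlib-only reproduction below uses a plain-format `StreamHandler` as a stand-in for livekit-agents' JSON handler, whose internals the PR does not show.

```python
import io
import logging


def lines_per_record(call_basic_config: bool) -> int:
    """Log one record and return how many output lines it produced in total."""
    root = logging.getLogger()
    saved_handlers, saved_level = root.handlers[:], root.level
    root.handlers.clear()
    text_out, json_out = io.StringIO(), io.StringIO()
    try:
        if call_basic_config:
            # Runs first (at import time in the agent), so root has no handlers
            # yet and basicConfig installs a plain-text StreamHandler.
            logging.basicConfig(stream=text_out, level=logging.INFO)
        # Stand-in for the handler the framework installs afterwards.
        framework = logging.StreamHandler(json_out)
        framework.setFormatter(logging.Formatter('{"msg": "%(message)s"}'))
        root.addHandler(framework)
        root.setLevel(logging.INFO)
        logging.getLogger("agent").info("worker registered")
        return len(text_out.getvalue().splitlines()) + len(json_out.getvalue().splitlines())
    finally:
        # Restore the caller's logging setup.
        root.handlers[:] = saved_handlers
        root.setLevel(saved_level)
```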

Drops the redundant `GEMINI_API_KEY:` env entry in compose.yaml; only
`GOOGLE_API_KEY` is needed (google-genai picks one when both are set
and warns about the duplicate).

Documents the expected 2 Langfuse exporter ERROR entries during startup:
Langfuse v3 unconditionally installs an OTLP exporter against
LANGFUSE_HOST when given keys, and the example uses fake keys +
non-routable host so its export fails fast. xray's own exporter is
unaffected; the test passes.

Drops `scripts/seed.ts` (595 lines of hand-crafted OTLP spans + sine
audio) and `pnpm seed` in favor of a checked-in capture of one full
run of `examples/livekit-voice-agent/` end-to-end. The inspector now
renders authentic data — real livekit-agents + langfuse + Gemini Live
emissions, server-derived VAD turns and speech segments, real stereo
mixdown WAV — without anyone needing to run the example.

Snapshot contents under `examples/livekit-voice-agent/snapshot/`:
- `xray.db` (104 KB, WAL-checkpointed): 1 conversation, 1 completed
  replay, 8 spans across all three vocabularies (xray / gen_ai /
  langfuse), 3 server-derived turns, 6 speech_segments, 1 tool_call,
  3 model_usage rows.
- `audio/<replay-id>/replay.wav` (1.9 MB): the stereo mixdown.
- `audio/recorded/<sha256>.wav` (220 KB): the recorded user-turn.

Total 2.2 MB committed. `.gitattributes` marks `.db` + `.wav` binary
so git stops trying to text-diff them. The example's `.gitignore`
drops SQLite's transient `*.db-shm` / `*.db-wal` so opening the
snapshot with a SQLite client doesn't dirty the working tree.
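Checkpointing before committing a SQLite file in WAL mode copies the `-wal` journal back into the main database. The checked-in `.db` is then complete on its own, and the transient `*.db-wal` / `*.db-shm` files can safely be git-ignored. The sketch below uses the stdlib `sqlite3` module. The `spans` table is made up, and the PR does not say which command was used to checkpoint `xray.db`.

```python
import os
import sqlite3
import tempfile
from typing import Tuple


def checkpoint_and_report(directory: str) -> Tuple[int, int, int]:
    """Write in WAL mode, checkpoint, and report (busy, wal_before, wal_after)."""
    db_path = os.path.join(directory, "xray.db")
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("PRAGMA journal_mode=WAL").fetchone()
        conn.execute("CREATE TABLE spans (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO spans (name) VALUES ('xray.stage.tts')")
        conn.commit()
        wal_before = os.path.getsize(db_path + "-wal")  # committed frames live here
        # TRUNCATE copies every WAL frame into the main file, then truncates
        # the -wal file to zero bytes. The row is (busy, log_frames, checkpointed).
        busy, _, _ = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
        wal_after = os.path.getsize(db_path + "-wal")
    finally:
        conn.close()
    return busy, wal_before, wal_after


with tempfile.TemporaryDirectory() as tmp:
    busy, wal_before, wal_after = checkpoint_and_report(tmp)
```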

To browse: `docker run --rm -p 8080:8080 -v $(pwd)/snapshot:/data ghcr.io/xray-eval/xray`. Regeneration instructions are in the README.

Wiring `pnpm dev` to bootstrap from this snapshot is a follow-up —
the server needs a "copy fixture into empty /data on first boot"
hook before that can be a single-command experience.
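The missing first-boot hook could be as small as "copy the fixture only if the data directory is empty". This sketch is in Python for illustration only. The xray server is not written in Python, and `seed_data_dir`, the directory names, and the emptiness rule are assumptions, not a description of the planned hook.

```python
import shutil
import tempfile
from pathlib import Path


def seed_data_dir(fixture: Path, data_dir: Path) -> bool:
    """Hypothetical hook: copy fixture into data_dir on first boot only."""
    data_dir.mkdir(parents=True, exist_ok=True)
    if any(data_dir.iterdir()):
        return False  # existing data wins; never overwrite a user's database
    shutil.copytree(fixture, data_dir, dirs_exist_ok=True)
    return True


# Usage: the first call seeds, the second leaves existing data alone.
with tempfile.TemporaryDirectory() as tmp:
    fixture, data = Path(tmp, "snapshot"), Path(tmp, "data")
    (fixture / "audio").mkdir(parents=True)
    (fixture / "xray.db").write_bytes(b"fixture")
    first = seed_data_dir(fixture, data)
    (data / "xray.db").write_bytes(b"user data")
    second = seed_data_dir(fixture, data)
    kept = (data / "xray.db").read_bytes()
```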

`examples/` should be a self-contained demo of "how to use xray",
following the Flutter package convention. The committed snapshot is a
repo-level fixture for the inspector, not part of the example's surface,
so it moves to `snapshot/` at the repo root.

README in the example drops to the essentials: file tree, quickstart,
adapting-to-your-own-agent. The expected-log-noise prose and the
fixture/snapshot/regeneration sections all leave — they belong (if
anywhere) under top-level docs, not in the example README.

`.gitignore`: the SQLite transient `*.db-shm` / `*.db-wal` ignore
moves from the example's `.gitignore` to the root one, matching the
new snapshot location.
@LukasPoque LukasPoque requested a review from basilebong May 22, 2026 21:41
@LukasPoque LukasPoque assigned basilebong and unassigned LukasPoque May 22, 2026
Add surgical `express>qs` + `body-parser>qs` overrides so the transitive
[email protected] pulled in via bunqueue>@modelcontextprotocol/sdk>express bumps
to the patched 6.15.2. [email protected] was published 2026-05-16 — past the
7-day cooldown — so no minimumReleaseAgeExclude needed.

Unblocks the supply-chain CI job.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Review comment threads:
- examples/livekit-voice-agent/compose.yaml (outdated)
- examples/livekit-voice-agent/agent/main.py (outdated)
- examples/livekit-voice-agent/agent/main.py (outdated)
- examples/livekit-voice-agent/agent/main.py (outdated)
- examples/livekit-voice-agent/README.md (outdated)
- examples/livekit-voice-agent/README.md
- examples/livekit-voice-agent/agent/Dockerfile (outdated)

- Agent.instructions overrides RealtimeModel.instructions, so move
  the prompt from RealtimeModel(instructions=...) to Agent(instructions=...)
  and drop the blank kwarg that was silencing it.
- Await the SpeechHandle returned by session.generate_reply so
  xray.stage.tts measures real TTS latency, not microseconds.
- Register the room "disconnected" listener before session.start and
  wrap the body in try/finally to release disconnect on session-side
  failures — otherwise xray.attach's force-flush never runs.
- compose.yaml: switch GEMINI_API_KEY from ${VAR:-} (silent default)
  to ${VAR:?msg} so missing-key aborts compose with a clear message.
- README: add `cd examples/livekit-voice-agent`, note that
  `compose up` streams logs (wait for worker registration), explain
  why the driver lives behind --profile test, and fix the broken
  `docs/integrate.md` link.
- Pin PyPI deps exactly in agent/driver pyproject.toml + Dockerfile;
  document the PyPI cooldown gap as a new §6 in supply-chain.md.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
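The `SpeechHandle` has to be awaited inside the span because a span closes when its `with` block exits. If the block only schedules the speech and returns, the span records scheduling time instead of playout time. The asyncio sketch below illustrates this. `timed_span` and `fake_generate_reply` are hypothetical stand-ins for xray's span helper and livekit-agents' `session.generate_reply`, and the 50 ms sleep plays the role of TTS playout.

```python
import asyncio
import time
from contextlib import contextmanager
from typing import Dict, Iterator

durations: Dict[str, float] = {}


@contextmanager
def timed_span(name: str) -> Iterator[None]:
    """Hypothetical span helper: records wall-clock duration on exit."""
    start = time.perf_counter()
    try:
        yield
    finally:
        durations[name] = time.perf_counter() - start


def fake_generate_reply() -> "asyncio.Task[None]":
    """Stand-in for generate_reply: returns an awaitable handle immediately."""
    return asyncio.create_task(asyncio.sleep(0.05))


async def main() -> None:
    with timed_span("tts.not_awaited"):
        handle = fake_generate_reply()  # returns at once; span closes before playout ends
    await handle
    with timed_span("tts.awaited"):
        handle = fake_generate_reply()
        await handle  # span stays open until playout finishes


asyncio.run(main())
```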
@github-actions

⚠️ Deprecation Warning: The deny-licenses option is deprecated for possible removal in the next major release. For more information, see issue 997.

Dependency Review

The following issues were found:
  • ❌ 1 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 2 package(s) with unknown licenses.
See the Details below.

Vulnerabilities

examples/livekit-voice-agent/driver/pyproject.toml

| Name | Version | Vulnerability | Severity |
| --- | --- | --- | --- |
| pytest | 8.4.2 | pytest has vulnerable tmpdir handling | moderate |
Only included vulnerabilities with severity moderate or higher.

License Issues

examples/livekit-voice-agent/agent/pyproject.toml

| Package | Version | License | Issue Type |
| --- | --- | --- | --- |
| livekit-agents | 1.5.9 | Null | Unknown License |
| livekit-plugins-google | 1.5.9 | Null | Unknown License |
Denied Licenses: GPL-1.0, GPL-1.0-only, GPL-1.0-or-later, GPL-2.0, GPL-2.0-only, GPL-2.0-or-later, GPL-3.0, GPL-3.0-only, GPL-3.0-or-later, LGPL-2.0, LGPL-2.0-only, LGPL-2.0-or-later, LGPL-2.1, LGPL-2.1-only, LGPL-2.1-or-later, LGPL-3.0, LGPL-3.0-only, LGPL-3.0-or-later, AGPL-1.0, AGPL-1.0-only, AGPL-1.0-or-later, AGPL-3.0, AGPL-3.0-only, AGPL-3.0-or-later, SSPL-1.0, MPL-1.0, MPL-1.1, MPL-2.0, EPL-1.0, EPL-2.0, CDDL-1.0, CDDL-1.1, EUPL-1.0, EUPL-1.1, EUPL-1.2, CC-BY-SA-1.0, CC-BY-SA-2.0, CC-BY-SA-2.5, CC-BY-SA-3.0, CC-BY-SA-4.0

OpenSSF Scorecard

| Package | Version | Score | Details |
| --- | --- | --- | --- |
| pip/langfuse | 3.9.3 | Unknown | Unknown |
| pip/livekit-agents | 1.5.9 | Unknown | Unknown |
| pip/livekit-plugins-google | 1.5.9 | Unknown | Unknown |
| pip/pytest | 8.4.2 | Unknown | Unknown |
| pip/httpx | 0.28.1 | Unknown | Unknown |
| pip/pytest-asyncio | 1.3.0 | Unknown | Unknown |
| npm/qs | 6.15.2 | 🟢 5.3 | See below |

Details (npm/qs 6.15.2)

| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | ⚠️ 1 | Found 5/30 approved changesets -- score normalized to 1 |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Security-Policy | 🟢 10 | security policy file detected |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Maintained | 🟢 10 | 13 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 10 |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| CII-Best-Practices | 🟢 5 | badge detected: Passing |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| License | 🟢 10 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |

Scanned Files

  • examples/livekit-voice-agent/agent/pyproject.toml
  • examples/livekit-voice-agent/driver/pyproject.toml
  • pnpm-lock.yaml

@basilebong
Collaborator

All seven review threads resolved in 7a83411. Quick rundown:

Agent loop (agent/main.py)

  • Agent(instructions="") → moved the assistant prompt onto Agent(...) and dropped the instructions= kwarg from RealtimeModel(...) (it would have been overridden anyway).
  • xray.stage.tts zero-duration span → captured the SpeechHandle from session.generate_reply(...) and awaited it inside the span, so the span now measures real TTS latency instead of microseconds.
  • Session leak on non-room aborts → registered the disconnected listener before session.start(...) and wrapped the body in try/finally so disconnect.set() always fires. xray.attach's force-flush now runs on the partial-run path too.
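The session-leak fix can be reduced to a small asyncio model. The listener is registered before start, and `finally` releases the disconnect event even when the body raises. Whatever waits on that event therefore still runs; here it is a stand-in for `xray.attach`'s force-flush. `FakeRoom`, `run_session`, and `flushed` are hypothetical and are not the livekit `rtc.Room` or xray APIs.

```python
import asyncio
from typing import Callable, Dict, List

flushed: List[str] = []


class FakeRoom:
    """Tiny event emitter standing in for a LiveKit room."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[], None]]] = {}

    def on(self, event: str, handler: Callable[[], None]) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event: str) -> None:
        for handler in self._handlers.get(event, []):
            handler()


async def run_session(room: FakeRoom, fail_on_start: bool) -> None:
    disconnect = asyncio.Event()
    # Register before start so an early room disconnect is never missed.
    room.on("disconnected", disconnect.set)

    async def flush_when_disconnected() -> None:
        await disconnect.wait()
        flushed.append("flushed")  # stand-in for xray.attach's force-flush

    flusher = asyncio.create_task(flush_when_disconnected())
    try:
        if fail_on_start:
            raise RuntimeError("session.start failed")
        room.emit("disconnected")  # normal path: the room ends the call
    finally:
        disconnect.set()  # always release, even on session-side failures
        await flusher


async def main() -> None:
    await run_session(FakeRoom(), fail_on_start=False)
    try:
        await run_session(FakeRoom(), fail_on_start=True)
    except RuntimeError:
        pass


asyncio.run(main())
```

Without the `finally`, the failing run would leave `flush_when_disconnected` waiting forever, which is the leak the review thread described.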

Compose / quickstart

  • Empty GEMINI_API_KEY → switched ${VAR:-} to ${VAR:?GEMINI_API_KEY must be set in examples/livekit-voice-agent/.env}. docker compose up now aborts immediately with a clear message instead of silently booting an agent that fails 30s later.
  • README quickstart → added the missing cd examples/livekit-voice-agent step, a note that compose up streams logs (wait for worker registration before running the driver), and a one-liner explaining why the driver lives behind --profile test.
  • Broken docs/integrate.md link → real relative markdown link now.
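Compose's two interpolation forms behave very differently when a variable is missing. `${VAR:-default}` substitutes the default (empty here) and continues, while `${VAR:?message}` aborts with the message. Both forms treat an empty value the same as an unset one. The Python model below covers only those two forms. It is not Compose's parser: it ignores nesting, `$$` escapes, and the colon-less variants, and its error text is made up.

```python
import re
from typing import Mapping

# Matches only the two forms discussed: ${NAME:-default} and ${NAME:?message}.
_VAR = re.compile(r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*):(?P<op>[-?])(?P<arg>[^}]*)\}")


class InterpolationError(ValueError):
    """Raised when a ${NAME:?message} variable is unset or empty."""


def interpolate(template: str, env: Mapping[str, str]) -> str:
    def substitute(match: "re.Match[str]") -> str:
        name, op, arg = match.group("name", "op", "arg")
        value = env.get(name, "")
        if value:
            return value  # set and non-empty: both forms just substitute it
        if op == "-":
            return arg  # ${NAME:-default}: silently fall back (here to "")
        raise InterpolationError(f"{name}: {arg}")  # ${NAME:?message}: abort

    return _VAR.sub(substitute, template)


REQUIRED = "${GEMINI_API_KEY:?GEMINI_API_KEY must be set in examples/livekit-voice-agent/.env}"
with_key = interpolate(REQUIRED, {"GEMINI_API_KEY": "test-key"})
silent_default = interpolate("${GEMINI_API_KEY:-}", {})
```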

Supply chain

  • Pinned livekit-agents, livekit-plugins-google, langfuse, pytest, pytest-asyncio, httpx to exact versions in both pyproject.toml files and both Dockerfiles. Each pin was the latest PyPI release >7 days old at audit time (2026-05-24).
  • Added §6 to .claude/rules/supply-chain.md documenting that PyPI carries the same Shai-Hulud-class threat as npm but has no registry-side cooldown, and that until a hash-pinned requirements.txt is wired into CI, every Python dep here MUST be exact-pinned + manually cooldown-checked.
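Until hash-pinned requirements are wired in, the manual check described above has two parts. First, every requirement must be an exact `==` pin. Second, each pinned release must be more than 7 days old at audit time. The sketch below implements both checks. Upload dates still have to be looked up by hand, for example on each package's PyPI page, and are passed in as plain data. The pytest 9.0.3 date (2026-04-07) and the audit day (2026-05-24) come from this PR; `example-pkg` and its date are made up to show a failing case.

```python
import re
from datetime import date, timedelta
from typing import Dict, List

COOLDOWN = timedelta(days=7)
# name==version with nothing else: no ranges, extras, or markers.
_EXACT_PIN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[A-Za-z0-9.+!-]+$")


def audit(requirements: List[str], uploaded: Dict[str, date], today: date) -> List[str]:
    """Return a list of problems; an empty list means every pin passes."""
    problems: List[str] = []
    for raw in requirements:
        req = raw.strip()
        if not _EXACT_PIN.match(req):
            problems.append(f"{req}: not an exact == pin")
            continue
        released = uploaded.get(req)
        if released is None:
            problems.append(f"{req}: upload date unknown; check PyPI by hand")
        elif today - released <= COOLDOWN:  # must be strictly older than 7 days
            problems.append(f"{req}: only {(today - released).days} day(s) old")
    return problems


AUDIT_DAY = date(2026, 5, 24)
clean = audit(["pytest==9.0.3"], {"pytest==9.0.3": date(2026, 4, 7)}, AUDIT_DAY)
flagged = audit(
    ["httpx>=0.28", "example-pkg==1.0.1"],  # example-pkg is a made-up package
    {"example-pkg==1.0.1": date(2026, 5, 20)},
    AUDIT_DAY,
)
```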

Verified locally

  • python -m ast parses both agent/main.py and driver/test_e2e.py.
  • docker compose config aborts cleanly without GEMINI_API_KEY and substitutes correctly with it set.
  • Bun server tests: 247/247 pass (unchanged — I didn't touch server code).

Not verified

  • Couldn't docker compose up with a real Gemini key from here; worth one manual smoke run before merging to confirm the greeting still fires and the TTS span shows a non-zero duration.

@basilebong basilebong assigned LukasPoque and unassigned basilebong May 24, 2026
basilebong previously approved these changes May 24, 2026
Collaborator

@basilebong basilebong left a comment


Self-review pass: all 7 inline findings addressed in 7a83411 (see summary comment). Approving so this can ship once you've done a smoke run with a real Gemini key.

`pytest < 9.0.3` is flagged by github/dependency-review-action — the
8.4.2 pin from the previous commit was vulnerable to the tmpdir symlink
issue. 9.0.3 is the first patched release and is well past the 7-day
PyPI cooldown (uploaded 2026-04-07).

Co-Authored-By: Claude Opus 4.7 <[email protected]>
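The dependency-review finding is a range test: any pytest release below 9.0.3 is affected. The stdlib sketch below performs that comparison for plain `X.Y.Z` versions. Comparing integer tuples avoids the string-ordering trap where `"10.0.0" < "9.0.3"`. Real tools use PEP 440 ordering (for example `packaging.version.Version`), which also handles pre-, post-, and dev-releases that this naive parse rejects.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Naive X.Y.Z parse; raises ValueError on pre-, post-, or dev-release suffixes."""
    return tuple(int(part) for part in version.split("."))


def is_vulnerable(version: str, first_patched: str) -> bool:
    """True if version sorts before the first patched release."""
    return parse_release(version) < parse_release(first_patched)


flagged_pin = is_vulnerable("8.4.2", "9.0.3")  # the pin dependency-review flagged
fixed_pin = is_vulnerable("9.0.3", "9.0.3")  # the bumped pin
```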

Labels: None yet

Projects: None yet

2 participants