fix: use last streaming entry for output token count by GuyMoses · Pull Request #108 · dash0hq/dash0-agent-plugin

GuyMoses · 2026-06-03T14:24:07Z

Summary

The transcript requestId deduplication was taking the first streaming entry per API call, which carries a partial output_tokens count (1–7 tokens from the streaming start). The last entry has the final count.
Verified against a real transcript: 132 out of 847 requestIds had differing usage across streaming entries. The first entry consistently had output_tokens: 1 while the last had the real value (100–1280).
Impact: ~15% undercount of output tokens (224,818 reported vs 263,699 actual across one session). Input tokens and cache tokens were unaffected.
Fix: changed from skip-if-seen to last-write-wins — store the last entry per requestId in a map, then sum across all requestIds after reading the transcript.

Test plan

TestReadTurnUsageDeduplicatesRequestID updated to use different output_tokens values across streaming entries (1 vs 150) and assert the final value is used
All existing transcript tests pass (single assistant, multi-iteration aggregation, turn reset, tool_result handling, pretty-printed JSON, partial fields)
Full test suite passes (go test ./...)
Deploy and compare output token totals against native claude_code.token.usage counter

The transcript deduplication was taking the first entry per requestId, which carries a partial output_tokens count (1-7 tokens from the streaming start). The final entry has the real count. This caused a ~15% undercount of output tokens compared to native telemetry. Change from skip-if-seen to last-write-wins: store the last entry per requestId, then sum across all requestIds after reading the transcript.

GuyMoses · 2026-06-03T17:11:45Z

Plugin vs Native Token Gap — Investigation Summary

After fixing the streaming dedup bug in this PR, I investigated the remaining token gap between the plugin and native Claude Code telemetry.

Root cause

The transcript file does not record token usage for background API calls. Two types are missing:

| Background call | Frequency | Model | Token impact | Cost impact |
| --- | --- | --- | --- | --- |
| Title generation (`ai-title` entries) | ~35/session | Haiku | ~14K input, ~350 output | ~$0.02 (negligible) |
| Context compaction (`away_summary` entries) | ~10/session | Main model (Opus) | ~1M input (mostly cache reads), ~3K output | ~$0.50–1.50 |

Native tracing captures both as claude_code.llm_request spans with full token counts. The plugin only reads assistant transcript entries, which don't include these calls.

Magnitude

On a real 29-turn Opus session ($19.30 total):

Background gap: ~$0.50–1.50 (~3–8% of session cost)
Compaction dominates; title gen is <0.1%
Gap grows with session length (more compaction events)

SubagentStop events were silently dropped, losing 28-71% of session tokens. Route SubagentStop through sendLLMTrace, reading token usage from the sub-agent transcript file and emitting invoke_agent spans with gen_ai.agent.name, parented under the Agent tool call span.

mosheshaham-dash0 approved these changes Jun 4, 2026


GuyMoses merged commit 1e64ae2 into main Jun 4, 2026
4 checks passed

GuyMoses deleted the fix/output-token-undercount branch June 4, 2026 10:42

GuyMoses merged 2 commits into main from fix/output-token-undercount
