Skip to content

BraintrustMiddleware drops token / cost metrics for AI SDK v3 providers that return nested usage (e.g. @ai-sdk/openai 3.x Responses API) #1909

@Ludusss

Description

@Ludusss

Summary

BraintrustMiddleware (model-level, from wrapLanguageModel) emits a span with empty metrics when the AI SDK provider returns a nested usage shape, which is the format @ai-sdk/[email protected] uses on its Responses API path (gpt-4o, gpt-4.1, gpt-5, etc.).

Result: every span produced this way for OpenAI models has neither prompt_tokens, completion_tokens, nor tokens populated. Cost is therefore not computed in the dashboard. Anthropic spans capture fine because @ai-sdk/anthropic returns the flat shape.

Versions

Package Version
braintrust 3.9.0
ai 6.0.85
@ai-sdk/openai 3.0.52
@ai-sdk/anthropic 3.0.68

Reproduction

import { wrapLanguageModel, generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { BraintrustMiddleware } from "braintrust";

const model = wrapLanguageModel({
  model: openai("gpt-4.1-mini"),
  middleware: BraintrustMiddleware({ spanInfo: { name: "demo" } }),
});

await generateText({ model, prompt: "Say hi" });
// → A "demo" span lands in Braintrust with metrics = { start, end } only.
//   No prompt_tokens / completion_tokens / tokens / cost.

Swap openai("gpt-4.1-mini") for anthropic("claude-sonnet-4-6") and metrics populate correctly.

Expected

prompt_tokens, completion_tokens, tokens (and where applicable prompt_cached_tokens, completion_reasoning_tokens) should be populated for every provider whose AI SDK adapter reports usage, regardless of whether the adapter returns the flat or nested shape.

Actual

For any @ai-sdk/[email protected] call (chat-completions or Responses API path) the resulting span looks like:

{
  "span_attributes": { "name": "demo", "type": "llm" },
  "metadata": {
    "model": "gpt-4.1-mini-2025-04-14",
    "provider": "openai",
    "finish_reason": { "unified": "stop" }
  },
  "metrics": { "start": 1777226818.85, "end": 1777226822.226 }
}

No tokens, no cost.

Root cause

@ai-sdk/[email protected] (Responses API path — used for ALL chat-completions models, including gpt-4o-mini, gpt-4.1-mini, gpt-5-nano, etc.) normalizes usage into a nested shape, see node_modules/@ai-sdk/openai/dist/index.js:2602 convertOpenAIResponsesUsage:

return {
  inputTokens:  { total, noCache, cacheRead, cacheWrite },
  outputTokens: { total, text, reasoning },
  raw,
};

BraintrustMiddleware's wrapGenerate extracts metrics via normalizeUsageMetrics (braintrust/dist/index.js:21622), which reads usage.inputTokens / usage.outputTokens as numbers:

function normalizeUsageMetrics(usage, provider, providerMetadata) {
  const metrics = {};
  const inputTokens = getNumberProperty2(usage, "inputTokens");   // <- gets {total,...}, returns undefined
  if (inputTokens !== void 0) metrics.prompt_tokens = inputTokens;
  // …same for outputTokens, totalTokens, reasoningTokens, cachedInputTokens
  return metrics;
}

getNumberProperty2 returns undefined when the property is an object, so every metric is silently skipped.

Note that extractTokenMetrics in the same file (line 13670) — used by the higher-level wrapAISDK / wrapGenerateText path — does handle the nested shape via _optionalChain([usage, 'access', _ => _.inputTokens, 'optionalAccess', _ => _.total]). Only the model-level normalizeUsageMetrics is missing this case.

Proposed fix

normalizeUsageMetrics should fall back to the nested-shape read when the flat read returns undefined. Patch:

 function normalizeUsageMetrics(usage, provider, providerMetadata) {
   const metrics = {};
-  const inputTokens = getNumberProperty2(usage, "inputTokens");
+  const inputTokensFlat = getNumberProperty2(usage, "inputTokens");
+  const inputTokens = inputTokensFlat !== undefined
+    ? inputTokensFlat
+    : getNumberProperty2(usage?.inputTokens, "total");
   if (inputTokens !== undefined) metrics.prompt_tokens = inputTokens;

-  const outputTokens = getNumberProperty2(usage, "outputTokens");
+  const outputTokensFlat = getNumberProperty2(usage, "outputTokens");
+  const outputTokens = outputTokensFlat !== undefined
+    ? outputTokensFlat
+    : getNumberProperty2(usage?.outputTokens, "total");
   if (outputTokens !== undefined) metrics.completion_tokens = outputTokens;

-  const totalTokens = getNumberProperty2(usage, "totalTokens");
+  const totalTokensFlat = getNumberProperty2(usage, "totalTokens");
+  const totalTokens = totalTokensFlat !== undefined
+    ? totalTokensFlat
+    : (typeof inputTokens === "number" && typeof outputTokens === "number"
+        ? inputTokens + outputTokens : undefined);
   if (totalTokens !== undefined) metrics.tokens = totalTokens;

-  const reasoningTokens = getNumberProperty2(usage, "reasoningTokens");
+  const reasoningFlat = getNumberProperty2(usage, "reasoningTokens");
+  const reasoningTokens = reasoningFlat !== undefined
+    ? reasoningFlat
+    : getNumberProperty2(usage?.outputTokens, "reasoning");
   if (reasoningTokens !== undefined) metrics.completion_reasoning_tokens = reasoningTokens;

-  const cachedInputTokens = getNumberProperty2(usage, "cachedInputTokens");
+  const cachedInputFlat = getNumberProperty2(usage, "cachedInputTokens");
+  const cachedInputTokens = cachedInputFlat !== undefined
+    ? cachedInputFlat
+    : getNumberProperty2(usage?.inputTokens, "cacheRead");
   if (cachedInputTokens !== undefined) metrics.prompt_cached_tokens = cachedInputTokens;

+  const cacheWriteTokens = getNumberProperty2(usage?.inputTokens, "cacheWrite");
+  if (cacheWriteTokens !== undefined) metrics.prompt_cache_creation_tokens = cacheWriteTokens;
+
   if (provider === "anthropic") { /* …unchanged… */ }
   return metrics;
 }

This is the same dual-shape strategy already used by extractTokenMetrics higher in the file, so the two extractors agree.

Workaround (for anyone hitting this before a fix lands)

bun patch braintrust against dist/index.js and dist/index.mjs with the diff above. Confirmed working locally — OpenAI spans now show prompt_tokens, completion_tokens, tokens, and prompt_cached_tokens, and the dashboard computes cost.

Scope

Affects every consumer of BraintrustMiddleware (model-level wrapping, the path the docs recommend for AI SDK integrations) when paired with @ai-sdk/[email protected]. Likely also affects future providers that adopt the nested shape. Not specific to reasoning models — observed across gpt-4o-mini, gpt-4.1-mini, gpt-5*-nano.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions