BraintrustMiddleware drops token / cost metrics for AI SDK v3 providers that return nested usage (e.g. @ai-sdk/openai 3.x Responses API)

## Summary

`BraintrustMiddleware` (model-level, from `wrapLanguageModel`) emits a span with empty `metrics` when the AI SDK provider returns a **nested** `usage` shape, which is the format `@ai-sdk/openai@3.x` uses on its Responses API path (gpt-4o, gpt-4.1, gpt-5, etc.).

Result: every span produced this way for OpenAI models has neither `prompt_tokens`, `completion_tokens`, nor `tokens` populated. Cost is therefore not computed in the dashboard. Anthropic spans capture fine because `@ai-sdk/anthropic` returns the flat shape.

## Versions

| Package | Version |
|---|---|
| `braintrust` | `3.9.0` |
| `ai` | `6.0.85` |
| `@ai-sdk/openai` | `3.0.52` |
| `@ai-sdk/anthropic` | `3.0.68` |

## Reproduction

```ts
import { wrapLanguageModel, generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { BraintrustMiddleware } from "braintrust";

const model = wrapLanguageModel({
  model: openai("gpt-4.1-mini"),
  middleware: BraintrustMiddleware({ spanInfo: { name: "demo" } }),
});

await generateText({ model, prompt: "Say hi" });
// → A "demo" span lands in Braintrust with metrics = { start, end } only.
//   No prompt_tokens / completion_tokens / tokens / cost.
```

Swap `openai("gpt-4.1-mini")` for `anthropic("claude-sonnet-4-6")` and metrics populate correctly.

## Expected

`prompt_tokens`, `completion_tokens`, `tokens` (and where applicable `prompt_cached_tokens`, `completion_reasoning_tokens`) should be populated for every provider whose AI SDK adapter reports usage, regardless of whether the adapter returns the flat or nested shape.

## Actual

For any `@ai-sdk/openai@3.x` call (chat-completions or Responses API path) the resulting span looks like:

```json
{
  "span_attributes": { "name": "demo", "type": "llm" },
  "metadata": {
    "model": "gpt-4.1-mini-2025-04-14",
    "provider": "openai",
    "finish_reason": { "unified": "stop" }
  },
  "metrics": { "start": 1777226818.85, "end": 1777226822.226 }
}
```

No tokens, no cost.

## Root cause

`@ai-sdk/openai@3.x` (Responses API path — used for ALL chat-completions models, including gpt-4o-mini, gpt-4.1-mini, gpt-5-nano, etc.) normalizes usage into a **nested** shape, see `node_modules/@ai-sdk/openai/dist/index.js:2602` `convertOpenAIResponsesUsage`:

```ts
return {
  inputTokens:  { total, noCache, cacheRead, cacheWrite },
  outputTokens: { total, text, reasoning },
  raw,
};
```

`BraintrustMiddleware`'s `wrapGenerate` extracts metrics via `normalizeUsageMetrics` (`braintrust/dist/index.js:21622`), which reads `usage.inputTokens` / `usage.outputTokens` as **numbers**:

```ts
function normalizeUsageMetrics(usage, provider, providerMetadata) {
  const metrics = {};
  const inputTokens = getNumberProperty2(usage, "inputTokens");   // <- gets {total,...}, returns undefined
  if (inputTokens !== void 0) metrics.prompt_tokens = inputTokens;
  // …same for outputTokens, totalTokens, reasoningTokens, cachedInputTokens
  return metrics;
}
```

`getNumberProperty2` returns `undefined` when the property is an object, so every metric is silently skipped.

Note that `extractTokenMetrics` in the same file (line 13670) — used by the higher-level `wrapAISDK` / `wrapGenerateText` path — **does** handle the nested shape via `_optionalChain([usage, 'access', _ => _.inputTokens, 'optionalAccess', _ => _.total])`. Only the model-level `normalizeUsageMetrics` is missing this case.

## Proposed fix

`normalizeUsageMetrics` should fall back to the nested-shape read when the flat read returns `undefined`. Patch:

```diff
 function normalizeUsageMetrics(usage, provider, providerMetadata) {
   const metrics = {};
-  const inputTokens = getNumberProperty2(usage, "inputTokens");
+  const inputTokensFlat = getNumberProperty2(usage, "inputTokens");
+  const inputTokens = inputTokensFlat !== undefined
+    ? inputTokensFlat
+    : getNumberProperty2(usage?.inputTokens, "total");
   if (inputTokens !== undefined) metrics.prompt_tokens = inputTokens;

-  const outputTokens = getNumberProperty2(usage, "outputTokens");
+  const outputTokensFlat = getNumberProperty2(usage, "outputTokens");
+  const outputTokens = outputTokensFlat !== undefined
+    ? outputTokensFlat
+    : getNumberProperty2(usage?.outputTokens, "total");
   if (outputTokens !== undefined) metrics.completion_tokens = outputTokens;

-  const totalTokens = getNumberProperty2(usage, "totalTokens");
+  const totalTokensFlat = getNumberProperty2(usage, "totalTokens");
+  const totalTokens = totalTokensFlat !== undefined
+    ? totalTokensFlat
+    : (typeof inputTokens === "number" && typeof outputTokens === "number"
+        ? inputTokens + outputTokens : undefined);
   if (totalTokens !== undefined) metrics.tokens = totalTokens;

-  const reasoningTokens = getNumberProperty2(usage, "reasoningTokens");
+  const reasoningFlat = getNumberProperty2(usage, "reasoningTokens");
+  const reasoningTokens = reasoningFlat !== undefined
+    ? reasoningFlat
+    : getNumberProperty2(usage?.outputTokens, "reasoning");
   if (reasoningTokens !== undefined) metrics.completion_reasoning_tokens = reasoningTokens;

-  const cachedInputTokens = getNumberProperty2(usage, "cachedInputTokens");
+  const cachedInputFlat = getNumberProperty2(usage, "cachedInputTokens");
+  const cachedInputTokens = cachedInputFlat !== undefined
+    ? cachedInputFlat
+    : getNumberProperty2(usage?.inputTokens, "cacheRead");
   if (cachedInputTokens !== undefined) metrics.prompt_cached_tokens = cachedInputTokens;

+  const cacheWriteTokens = getNumberProperty2(usage?.inputTokens, "cacheWrite");
+  if (cacheWriteTokens !== undefined) metrics.prompt_cache_creation_tokens = cacheWriteTokens;
+
   if (provider === "anthropic") { /* …unchanged… */ }
   return metrics;
 }
```

This is the same dual-shape strategy already used by `extractTokenMetrics` higher in the file, so the two extractors agree.

## Workaround (for anyone hitting this before a fix lands)

`bun patch braintrust` against `dist/index.js` and `dist/index.mjs` with the diff above. Confirmed working locally — OpenAI spans now show `prompt_tokens`, `completion_tokens`, `tokens`, and `prompt_cached_tokens`, and the dashboard computes cost.

## Scope

Affects every consumer of `BraintrustMiddleware` (model-level wrapping, the path the docs recommend for AI SDK integrations) when paired with `@ai-sdk/openai@3.x`. Likely also affects future providers that adopt the nested shape. Not specific to reasoning models — observed across `gpt-4o-mini`, `gpt-4.1-mini`, `gpt-5*-nano`.


Package	Version
`braintrust`	`3.9.0`
`ai`	`6.0.85`
`@ai-sdk/openai`	`3.0.52`
`@ai-sdk/anthropic`	`3.0.68`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BraintrustMiddleware drops token / cost metrics for AI SDK v3 providers that return nested usage (e.g. @ai-sdk/openai 3.x Responses API) #1909

Summary

Versions

Reproduction

Expected

Actual

Root cause

Proposed fix

Workaround (for anyone hitting this before a fix lands)

Scope

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

BraintrustMiddleware drops token / cost metrics for AI SDK v3 providers that return nested usage (e.g. @ai-sdk/openai 3.x Responses API) #1909

Description

Summary

Versions

Reproduction

Expected

Actual

Root cause

Proposed fix

Workaround (for anyone hitting this before a fix lands)

Scope

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions