Calibrate reasoning: add reasoningFrequency, update defaults by jimmytacks · Pull Request #124 · TagloGit/compact-sim

jimmytacks · 2026-04-03T10:09:56Z

Summary

Add reasoningFrequency parameter (0.0–1.0, default 0.47) — fraction of assistant turns that include reasoning output
Calibrate reasoningOutputSize default from 500 → 265 tokens (mean from 127 Models Agent conversations)
Uses Bresenham-style deterministic distribution for even spacing of reasoning across cycles
Frontend support added to ParameterPanel; sweep metadata added to PARAM_META
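
To illustrate the "Bresenham-style deterministic distribution" mentioned above, here is a minimal sketch of one way such a schedule could be built. It is not taken from the compact-sim source: the function name `reasoningSchedule`, the `resolution` parameter, and the integer-accumulator formulation are all assumptions made for illustration. The idea is to add the frequency to an accumulator on every turn and emit reasoning whenever the accumulator crosses a whole unit, so reasoning turns are spread evenly rather than sampled at random.

```typescript
// Hypothetical helper (not from the PR): decides which assistant turns
// include reasoning, using an integer error accumulator in the style of
// Bresenham's line algorithm. Integer arithmetic avoids float drift.
function reasoningSchedule(
  turns: number,
  frequency: number,
  resolution: number = 1000, // assumed granularity: 0.001 steps
): boolean[] {
  // Clamp to the documented 0.0-1.0 range, then scale to an integer step.
  const clamped = Math.min(1, Math.max(0, frequency));
  const step = Math.round(clamped * resolution);

  const schedule: boolean[] = [];
  let acc = 0;
  for (let i = 0; i < turns; i++) {
    acc += step;
    if (acc >= resolution) {
      // Accumulator overflowed one whole unit: this turn gets reasoning.
      acc -= resolution;
      schedule.push(true);
    } else {
      schedule.push(false);
    }
  }
  return schedule;
}

// Usage: count reasoning turns in 100 turns at the new default frequency.
const demoSchedule = reasoningSchedule(100, 0.47);
const demoCount = demoSchedule.filter(Boolean).length;
console.log(`reasoning turns: ${demoCount} of ${demoSchedule.length}`);
```

Because the schedule depends only on the turn index and the frequency, repeated runs with the same parameters produce identical results, which is what makes sweeps over `reasoningFrequency` comparable.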

Engine Change

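The sim previously overcharged reasoning output by ~3-4x (size overstatement × frequency overstatement): it charged 500 reasoning tokens on every turn, whereas the calibration data shows reasoning on only 47% of turns at a mean of 265 tokens. Since reasoning is billed at the output price ($25/M tokens), this inflated all absolute cost numbers. Rankings are unaffected because the overcharge applies to all strategies equally, but absolute costs were misleading.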

Prior findings impact: All absolute cost numbers from prior experiments are overstated. Strategy rankings and relative comparisons remain valid since reasoning affects all strategies equally. FINDINGS.md will be updated after merge.
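
The size of the overstatement can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, the old behaviour is modelled as a frequency of 1.0 (reasoning on every turn), and the 1,000-turn workload is an arbitrary example. Only the token sizes, the 0.47 frequency, and the $25/M output price come from the PR text.

```typescript
// Output price quoted in the PR: $25 per million output tokens.
const OUTPUT_PRICE_PER_MTOK = 25;

// Expected reasoning tokens charged per assistant turn.
function expectedReasoningTokens(size: number, frequency: number): number {
  return size * frequency;
}

// Reasoning cost in USD over a number of turns (hypothetical helper).
function reasoningCostUsd(size: number, frequency: number, turns: number): number {
  const tokens = expectedReasoningTokens(size, frequency) * turns;
  return (tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000;
}

// Old defaults: 500 tokens on every turn. New defaults: 265 tokens on 47% of turns.
const oldPerTurn = expectedReasoningTokens(500, 1.0); // 500 tokens per turn
const newPerTurn = expectedReasoningTokens(265, 0.47); // about 124.55 tokens per turn

const sizeFactor = 500 / 265; // about 1.89
const frequencyFactor = 1.0 / 0.47; // about 2.13
const overstatement = oldPerTurn / newPerTurn; // about 4.01

console.log({ sizeFactor, frequencyFactor, overstatement });
console.log({
  oldCostPer1000Turns: reasoningCostUsd(500, 1.0, 1000),
  newCostPer1000Turns: reasoningCostUsd(265, 0.47, 1000),
});
```

With these defaults the two factors multiply to roughly 4x, the upper end of the ~3-4x range quoted above. The factor is the same for every strategy, which is consistent with the rankings being unaffected.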

Test plan

All 197 tests pass
Lint passes
Production build succeeds
Conversation tests updated for new defaults + 3 new frequency-specific tests
Summary growth tests pinned to explicit reasoning settings to isolate concerns

Closes #94

🤖 Generated with Claude Code

…aults

Calibrate reasoning output based on analysis of 127 Models Agent conversations:
- reasoningOutputSize: 500 → 265 (calibrated mean)
- New reasoningFrequency parameter (default 0.47): fraction of turns with reasoning output. Uses Bresenham-style distribution for deterministic, even spacing across cycles.

Previously the sim charged reasoning on every turn at 500 tokens, a ~3-4x overstatement vs real data (only 47% of turns include thinking, at 265 avg).

Closes #94

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jimmytacks merged commit 9f22d3d into main Apr 3, 2026
1 check passed

jimmytacks deleted the experiment/094-reasoning-calibration branch April 3, 2026 10:10
