design(tdd 0041): estimate-error tolerance (K) + authoring padding by cahenesy · Pull Request #116 · cahenesy/throughline

cahenesy · 2026-06-09T11:15:45Z

What

Folds two systemic robustness fixes into draft 0041 (no PRD change; these are "how" refinements within FR-65/66/67). Both come out of this session's finding that the declared `## Expected diff size` systematically runs ~1.55× low (clean non-survivorship sample of 7 implemented TDDs; 8 upward TDD revisions, 0 downward).

| Component | Fix |
|---|---|
| 4 (FIX 1) | K-tolerance (`THROUGHLINE_STRUCTURAL_DIFF_TOLERANCE`, default 1.6) on the runtime `structural-finding(b)` per-file escalation (`actual > declared × K`). Deliberately reverses 0041's earlier "estimate accuracy is the discipline" stance, on the data. Design-time hard caps (`tdd-lint --bounds`: declared ≤300, ≤8 files) are unchanged, since there is no `actual` at design time. |
| 5 | Advisory, non-gating estimate-padding heuristic in `skills/tdd-author/SKILL.md` (test/eval ≈1.6×, shell-lib ≈1.4×). It reduces the error at authoring time so the runtime tolerance is headroom, not the only guard. |
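As a rough illustration of the Component 5 padding heuristic, the sketch below scales a raw line estimate by the advisory multipliers quoted above. It is not the content of `skills/tdd-author/SKILL.md`: the function name `padded_estimate`, the category keys, the round-up rule, and the 1.0 default for other file kinds are all assumptions made for this example.

```python
import math
from fractions import Fraction

# Advisory multipliers quoted in the PR (test/eval ~1.6x, shell-lib ~1.4x).
# The category keys and the implicit 1.0 default are assumptions of this sketch.
PADDING = {
    "test": Fraction(8, 5),
    "eval": Fraction(8, 5),
    "shell-lib": Fraction(7, 5),
}


def padded_estimate(raw_lines: int, kind: str) -> int:
    """Scale a raw diff-size estimate by the advisory multiplier, rounding up."""
    if raw_lines < 0:
        raise ValueError("raw_lines must be non-negative")
    # Exact rational arithmetic avoids float artifacts such as 100 * 1.6 != 160.
    factor = PADDING.get(kind, Fraction(1))
    return math.ceil(raw_lines * factor)


if __name__ == "__main__":
    print(padded_estimate(100, "test"))       # 160
    print(padded_estimate(50, "shell-lib"))   # 70
    print(padded_estimate(120, "docs"))       # 120 (no padding assumed)
```

Because the heuristic is advisory and non-gating, a helper like this would only suggest a padded number to the author; nothing in it should fail a lint or block a build.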

FIX 2 (reset rework budget on resume) was dropped as redundant: 0041's existing Component 1 (roll back scope-rejected attempts) plus 0039's --recover already cover the budget deadlock, and 0041's own Alternatives already rejected "reset whole counter on resume."

K=1.6 sits just above the measured ~1.55× mean, so a normally-biased estimate passes while a genuine >1.6× under-scope (e.g. the observed 1.87× implement-watch.sh) still escalates.
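The per-file comparison `actual > declared × K` can be sketched as follows. This is a Python illustration only, not the real `_rework_pre_pass` code: the helper names `tolerance_from_env` and `should_escalate` are hypothetical, and falling back to the default on an unset or invalid value is an assumption of this sketch.

```python
import os

DEFAULT_K = 1.6  # default tolerance quoted in the PR


def tolerance_from_env(env=os.environ) -> float:
    """Read K from THROUGHLINE_STRUCTURAL_DIFF_TOLERANCE.

    Falling back to the default when the variable is unset, non-numeric,
    or not positive is an assumption of this sketch.
    """
    raw = env.get("THROUGHLINE_STRUCTURAL_DIFF_TOLERANCE", "")
    try:
        k = float(raw)
    except ValueError:
        return DEFAULT_K
    return k if k > 0 else DEFAULT_K


def should_escalate(actual: int, declared: int, k: float = DEFAULT_K) -> bool:
    """True when a file's actual diff exceeds declared * K (strict inequality)."""
    return actual > declared * k


if __name__ == "__main__":
    # A file declared at 100 lines that lands at the ~1.55x mean passes...
    print(should_escalate(155, 100))  # False
    # ...while a 1.87x overrun, like the one observed for implement-watch.sh, escalates.
    print(should_escalate(187, 100))  # True
```

The strict inequality follows the formula in the PR, so a file landing below `declared × K` never escalates and the tolerance acts as headroom over the typical estimate bias.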

Gates

tdd-lint structural + --bounds: exit 0 (6 files, body 434 lines).
Independent design-reviewer (sonnet): DESIGN_REVIEW: PASS — verified Component 4 against the real _rework_pre_pass comparison, the FR-53/54/55-vs-FR-67(b) distinction (watertight), that Components 1+3+4+5 compose, and that FIX 2's omission is sound. It confirmed Component 3 is a no-op in current code. 3 minor + 1 nit applied inline.

Open assumptions & waivers

- overlap with queued 0041/0039 (resolved): FIX 1 folded into 0041 (Component 4, stance reversed on the data); Component 5 adds authoring padding; FIX 2 dropped as redundant.
- running build pinned to run-start 0041 (resolved): verified implement.sh:553 stacks each TDD build off the single run-start BASE (no per-TDD integration re-merge).

⚠️ Hold the merge — sequencing caveat

The in-flight /implement run is pinned to the run-start 0041, so it will build the old 0041 regardless of this PR. This revision reaches a build only if: (a) the run halts before 0041 → a future resume picks it up, or (b) old-0041 builds + merges → this becomes a supersede (new TDD). Recommend not merging this until we see what the running build does with 0041 — merging prematurely could collide with the build's status flip. I'll advise once the run resolves.

🤖 Generated with Claude Code

…g to bounded-rework convergence

Folds two systemic robustness fixes into draft 0041 (no PRD change; "how" refinements within FR-65/66/67):

- Component 4 (FIX 1): a K-tolerance (THROUGHLINE_STRUCTURAL_DIFF_TOLERANCE, default 1.6) on the RUNTIME structural-finding(b) per-file escalation (actual > declared × K), DELIBERATELY REVERSING 0041's earlier "estimate accuracy is the discipline" stance. Justified by a history study: declared expected-diff runs ~1.55× low systematically (8 upward TDD revisions, 0 downward). Design-time hard caps (tdd-lint --bounds: declared ≤300, ≤8 files) unchanged; only the runtime threshold gains tolerance.
- Component 5: an advisory, non-gating estimate-padding heuristic in skills/tdd-author/SKILL.md (test/eval ≈1.6×, shell-lib ≈1.4×) so estimates are closer to reality at authoring time (belt-and-suspenders to Component 4).

FIX 2 (reset rework budget on resume) was DROPPED as redundant: 0041's existing Component 1 (roll back scope-rejected attempts) + 0039's --recover already cover the budget deadlock, and 0041's Alternatives already rejected "reset whole counter on resume."

Gates: tdd-lint structural + --bounds exit 0 (6 files, body 434 lines). Independent design-reviewer: DESIGN_REVIEW: PASS (verified Component 4 against the real _rework_pre_pass code, the FR-53/54/55-vs-FR-67(b) distinction, and that Components 1+3+4+5 compose; confirmed Component 3 is a no-op in current code).

Open assumptions & waivers:

- overlap with queued 0041/0039 (resolved): FIX 1 folded into 0041 (Component 4, stance reversed on the data); Component 5 adds authoring padding; FIX 2 dropped as redundant with 0041(a) + 0039 --recover.
- running build pinned to run-start 0041 (resolved): verified implement.sh:553 stacks each TDD build off the single run-start BASE, no per-TDD integration re-merge, so this revision will NOT reach the in-flight build; it reaches a build only via a future run or a supersede if old-0041 builds + merges first.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cahenesy merged commit 7899e3a into master Jun 9, 2026
1 check passed

cahenesy deleted the docs/design/0041-estimate-tolerance branch June 9, 2026 12:45

cahenesy merged 1 commit into master from docs/design/0041-estimate-tolerance
