
design(tdd 0041): estimate-error tolerance (K) + authoring padding #116

Merged: cahenesy merged 1 commit into master from docs/design/0041-estimate-tolerance (Jun 9, 2026)


cahenesy (Owner) commented Jun 9, 2026

What

Folds two systemic robustness fixes into draft 0041 (no PRD change; these are "how" refinements within FR-65/66/67). Both follow from this session's finding that the declared `## Expected diff size` is systematically ~1.55× lower than the actual diff (clean, non-survivorship-biased sample of 7 implemented TDDs; 8 upward TDD revisions, 0 downward).

| Component | Fix |
| --- | --- |
| 4 (FIX 1) | K-tolerance (`THROUGHLINE_STRUCTURAL_DIFF_TOLERANCE`, default 1.6) on the runtime structural-finding (b) per-file escalation, which now fires only when `actual > declared × K`. This deliberately reverses 0041's earlier "estimate accuracy is the discipline" stance, based on the data. Design-time hard caps (`tdd-lint --bounds`: declared ≤300, ≤8 files) are unchanged, since no actual diff exists at design time. |
| 5 | Advisory, non-gating estimate-padding heuristic in `skills/tdd-author/SKILL.md` (test/eval ≈1.6×, shell-lib ≈1.4×). It reduces the error at authoring time, so the runtime tolerance serves as headroom rather than the only guard. |
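
A minimal sketch of how the Component 5 padding could be applied to a per-file estimate, assuming the factors act as plain multipliers on a raw line count and the result is rounded up. The function name, the category keys, and leaving other categories unpadded are hypothetical, and the "≈" factors are treated as exact; SKILL.md states them as advisory guidance, not code.

```python
# Advisory padding factors as quoted in this PR (test/eval ~1.6x, shell-lib ~1.4x),
# stored in tenths so the rounding below is exact integer arithmetic.
PADDING_TENTHS = {"test/eval": 16, "shell-lib": 14}


def padded_estimate(raw_lines: int, category: str) -> int:
    """Scale a raw per-file line estimate by its category factor, rounding up.

    Hypothetical helper. Assumption: categories without a quoted factor stay
    unpadded (1.0x). Advisory only: nothing here gates or rejects an estimate.
    """
    tenths = PADDING_TENTHS.get(category, 10)
    return -(-raw_lines * tenths // 10)  # ceiling division on integers


if __name__ == "__main__":
    print(padded_estimate(50, "test/eval"))  # 80
    print(padded_estimate(33, "shell-lib"))  # 47 (46.2 rounded up)
```

For example, a raw 50-line test estimate pads to 80 lines; at K=1.6 the runtime threshold then becomes 128 lines instead of 80, which is the headroom the table describes.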

FIX 2 (reset the rework budget on resume) was dropped as redundant: 0041's existing Component 1 (roll back scope-rejected attempts) plus 0039's --recover already cover the budget deadlock, and 0041's own Alternatives section already rejected "reset whole counter on resume."

K=1.6 sits just above the measured ~1.55× mean, so a typically biased estimate passes, while a genuine under-scope beyond 1.6× (e.g. the observed 1.87× on implement-watch.sh) still escalates.
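
The runtime rule can be sketched as follows, assuming per-file line counts are already available as dictionaries. Only the environment variable name, the 1.6 default, and the strict `actual > declared × K` comparison come from this PR; the function names, the fallback for malformed values, and treating an undeclared file as 0 declared lines are assumptions. The real comparison is the one in `_rework_pre_pass`.

```python
import os

DEFAULT_K = 1.6  # default tolerance stated in the PR


def structural_diff_tolerance() -> float:
    """Return K from THROUGHLINE_STRUCTURAL_DIFF_TOLERANCE, else the 1.6 default."""
    raw = os.environ.get("THROUGHLINE_STRUCTURAL_DIFF_TOLERANCE")
    if raw is None:
        return DEFAULT_K
    try:
        k = float(raw)
    except ValueError:
        # Assumption: a malformed value falls back to the default.
        return DEFAULT_K
    # Assumption: non-positive values (and NaN, since NaN > 0 is False) fall back too.
    return k if k > 0 else DEFAULT_K


def files_to_escalate(declared: dict[str, int], actual: dict[str, int], k: float) -> list[str]:
    """Return, sorted, the files whose actual diff exceeds declared x K.

    The comparison is strict, as in the PR, so actual == declared x K passes.
    Assumption: a file with no declared estimate counts as 0 declared lines,
    so any change to it escalates.
    """
    return sorted(
        path for path, lines in actual.items() if lines > declared.get(path, 0) * k
    )


if __name__ == "__main__":
    # Illustrative numbers scaled to the ratios quoted in the PR (1.55x and 1.87x).
    declared = {"lib/typical.sh": 100, "lib/outlier.sh": 100}
    actual = {"lib/typical.sh": 155, "lib/outlier.sh": 187}
    # With the default K=1.6 this prints ['lib/outlier.sh'].
    print(files_to_escalate(declared, actual, structural_diff_tolerance()))
    # With K=1.0 both files escalate: ['lib/outlier.sh', 'lib/typical.sh'].
    print(files_to_escalate(declared, actual, 1.0))
```

With the default K, the 1.55× file passes and the 1.87× file escalates; setting K to 1.0 escalates both, which is roughly the strict behavior the earlier "estimate accuracy is the discipline" stance implied.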

Gates

  • tdd-lint structural + --bounds: exit 0 (6 files, body 434 lines).
  • Independent design-reviewer (sonnet): DESIGN_REVIEW: PASS. The reviewer verified Component 4 against the real _rework_pre_pass comparison, confirmed the FR-53/54/55-vs-FR-67(b) distinction is watertight, and checked that Components 1+3+4+5 compose and that omitting FIX 2 is sound. It also confirmed Component 3 is a no-op in current code. 3 minor findings and 1 nit were applied inline.
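
For contrast, the design-time `--bounds` caps are plain limits with no K, since no actual diff exists yet. The sketch below is hypothetical: the function names and exit codes are illustrative, and it assumes "declared ≤300" caps the total declared diff across all files, which tdd-lint's real rule may not do.

```python
# Design-time caps quoted in the PR. Assumption: "declared <= 300" is the total
# declared diff across all files; tdd-lint's actual rule may differ.
MAX_DECLARED_LINES = 300
MAX_FILES = 8


def bounds_violations(declared: dict[str, int]) -> list[str]:
    """Hypothetical --bounds check: no tolerance K applies at design time."""
    problems = []
    total = sum(declared.values())
    if total > MAX_DECLARED_LINES:
        problems.append(f"declared {total} lines exceeds cap {MAX_DECLARED_LINES}")
    if len(declared) > MAX_FILES:
        problems.append(f"{len(declared)} files exceeds cap {MAX_FILES}")
    return problems


def exit_code(declared: dict[str, int]) -> int:
    """0 when every cap holds (the 'exit 0' gate result), else 1 (assumed)."""
    return 1 if bounds_violations(declared) else 0


if __name__ == "__main__":
    six_files = {f"file{i}.sh": 40 for i in range(6)}  # 240 declared lines
    print(exit_code(six_files))  # 0
```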

Open assumptions & waivers

  • overlap with queued 0041/0039: resolved. FIX 1 is folded into 0041 (Component 4, stance reversed on the data); Component 5 adds authoring padding; FIX 2 is dropped as redundant.
  • running build pinned to run-start 0041: resolved. Verified that implement.sh:553 stacks each TDD build off the single run-start BASE (no per-TDD integration re-merge).

⚠️ Hold the merge: sequencing caveat

The in-flight /implement run is pinned to the run-start 0041, so it will build the old 0041 regardless of this PR. This revision reaches a build only if (a) the run halts before 0041, so a future resume picks it up, or (b) the old 0041 builds and merges, in which case this revision becomes a supersede (a new TDD). I recommend not merging this until we see what the running build does with 0041, because merging prematurely could collide with the build's status flip. I'll advise once the run resolves.

🤖 Generated with Claude Code

…g to bounded-rework convergence

Folds two systemic robustness fixes into draft 0041 (no PRD change — "how"
refinements within FR-65/66/67):

- Component 4 (FIX 1): a K-tolerance (THROUGHLINE_STRUCTURAL_DIFF_TOLERANCE,
  default 1.6) on the RUNTIME structural-finding(b) per-file escalation
  (actual > declared × K), DELIBERATELY REVERSING 0041's earlier "estimate
  accuracy is the discipline" stance. Justified by a history study: declared
  expected diff size systematically runs ~1.55× below actual (8 upward TDD
  revisions, 0 downward). Design-time hard caps (tdd-lint --bounds: declared
  ≤300, ≤8 files) are unchanged; only the runtime threshold gains tolerance.
- Component 5: an advisory, non-gating estimate-padding heuristic in
  skills/tdd-author/SKILL.md (test/eval ≈1.6×, shell-lib ≈1.4×) so estimates are
  closer to reality at authoring time (belt-and-suspenders to Component 4).

FIX 2 (reset rework budget on resume) was DROPPED as redundant: 0041's existing
Component 1 (roll back scope-rejected attempts) + 0039's --recover already cover
the budget deadlock, and 0041's Alternatives already rejected "reset whole
counter on resume."

Gates: tdd-lint structural + --bounds exit 0 (6 files, body 434 lines).
Independent design-reviewer: DESIGN_REVIEW: PASS (verified Component 4 against the
real _rework_pre_pass code, the FR-53/54/55-vs-FR-67(b) distinction, and that
Components 1+3+4+5 compose; confirmed Component 3 is a no-op in current code).

Open assumptions & waivers:
- overlap with queued 0041/0039 — resolved: FIX 1 folded into 0041 (Component 4,
  stance reversed on the data); Component 5 adds authoring padding; FIX 2 dropped
  as redundant with 0041(a) + 0039 --recover.
- running build pinned to run-start 0041 — resolved: verified implement.sh:553
  stacks each TDD build off the single run-start BASE, no per-TDD integration
  re-merge, so this revision will NOT reach the in-flight build; it reaches a
  build only via a future run or a supersede if old-0041 builds + merges first.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cahenesy merged commit 7899e3a into master on Jun 9, 2026
1 check passed
cahenesy deleted the docs/design/0041-estimate-tolerance branch on Jun 9, 2026 at 12:45