design(tdd 0041): estimate-error tolerance (K) + authoring padding #116
Merged
…g to bounded-rework convergence

Folds two systemic robustness fixes into draft 0041 (no PRD change; these are "how" refinements within FR-65/66/67):

- Component 4 (FIX 1): a K-tolerance (THROUGHLINE_STRUCTURAL_DIFF_TOLERANCE, default 1.6) on the RUNTIME structural-finding(b) per-file escalation (actual > declared × K), DELIBERATELY REVERSING 0041's earlier "estimate accuracy is the discipline" stance. Justified by a history study: declared expected-diff runs ~1.55× low systematically (8 upward TDD revisions, 0 downward). Design-time hard caps (tdd-lint --bounds: declared ≤300, ≤8 files) unchanged; only the runtime threshold gains tolerance.
- Component 5: an advisory, non-gating estimate-padding heuristic in skills/tdd-author/SKILL.md (test/eval ≈1.6×, shell-lib ≈1.4×) so estimates are closer to reality at authoring time (belt-and-suspenders to Component 4).

FIX 2 (reset rework budget on resume) was DROPPED as redundant: 0041's existing Component 1 (roll back scope-rejected attempts) + 0039's --recover already cover the budget deadlock, and 0041's Alternatives already rejected "reset whole counter on resume."

Gates: tdd-lint structural + --bounds exit 0 (6 files, body 434 lines).

Independent design-reviewer: DESIGN_REVIEW: PASS (verified Component 4 against the real _rework_pre_pass code, the FR-53/54/55-vs-FR-67(b) distinction, and that Components 1+3+4+5 compose; confirmed Component 3 is a no-op in current code).

Open assumptions & waivers:

- overlap with queued 0041/0039. Resolved: FIX 1 folded into 0041 (Component 4, stance reversed on the data); Component 5 adds authoring padding; FIX 2 dropped as redundant with 0041(a) + 0039 --recover.
- running build pinned to run-start 0041. Resolved: verified implement.sh:553 stacks each TDD build off the single run-start BASE, no per-TDD integration re-merge, so this revision will NOT reach the in-flight build; it reaches a build only via a future run or a supersede if old-0041 builds + merges first.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
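To make the Component 4 rule concrete, here is a minimal Python sketch of the per-file comparison `actual > declared × K`. The environment variable name and its default of 1.6 come from the commit message. The function names `tolerance` and `should_escalate`, the fallback on an unparsable value, and the sample numbers are illustrative assumptions, not the project's `_rework_pre_pass` code.

```python
import os
from typing import Optional

# Default K as stated in the commit message.
DEFAULT_TOLERANCE = 1.6


def tolerance() -> float:
    """Read K from THROUGHLINE_STRUCTURAL_DIFF_TOLERANCE.

    Assumption: an unset or unparsable value falls back to the default.
    """
    raw = os.environ.get("THROUGHLINE_STRUCTURAL_DIFF_TOLERANCE", "")
    try:
        return float(raw)
    except ValueError:
        return DEFAULT_TOLERANCE


def should_escalate(declared: int, actual: int, k: Optional[float] = None) -> bool:
    """Hypothetical helper: escalate a file only when actual > declared * K."""
    if k is None:
        k = tolerance()
    return actual > declared * k


# A file declared at 100 lines that lands at 155 (the typical ~1.55x bias) passes,
# while one that lands at 187 (like the 1.87x case cited in the PR) escalates.
typical = should_escalate(declared=100, actual=155, k=1.6)
under_scoped = should_escalate(declared=100, actual=187, k=1.6)
```

Under this reading, the design-time caps are a separate check on the declared value alone, so the tolerance never loosens them.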
What
Folds two systemic robustness fixes into draft 0041 (no PRD change; these are "how" refinements within FR-65/66/67). Both come out of this session's finding that the declared "## Expected diff size" runs ~1.55× low systematically (clean non-survivorship sample of 7 implemented TDDs; 8 upward TDD revisions, 0 downward).

- Component 4 (FIX 1): a K-tolerance (THROUGHLINE_STRUCTURAL_DIFF_TOLERANCE, default 1.6) on the runtime structural-finding(b) per-file escalation (actual > declared × K). Deliberately reverses 0041's earlier "estimate accuracy is the discipline" stance, on the data. Design-time hard caps (tdd-lint --bounds: declared ≤300, ≤8 files) are unchanged, since there is no actual at design time.
- Component 5: an advisory, non-gating estimate-padding heuristic in skills/tdd-author/SKILL.md (test/eval ≈1.6×, shell-lib ≈1.4×). It reduces the error at authoring time so the runtime tolerance is headroom, not the only guard.

FIX 2 (reset rework budget on resume) was dropped as redundant: 0041's existing Component 1 (roll back scope-rejected attempts) + 0039's --recover already cover the budget deadlock, and 0041's own Alternatives already rejected "reset whole counter on resume."

K=1.6 sits just above the measured ~1.55× mean, so a normally-biased estimate passes while a genuine >1.6× under-scope (e.g. the observed 1.87× implement-watch.sh) still escalates.

Gates

- tdd-lint structural + --bounds: exit 0 (6 files, body 434 lines).
- Independent design-reviewer: DESIGN_REVIEW: PASS. Verified Component 4 against the real _rework_pre_pass comparison, the FR-53/54/55-vs-FR-67(b) distinction (watertight), that Components 1+3+4+5 compose, and that FIX 2's omission is sound. It confirmed Component 3 is a no-op in current code. 3 minor + 1 nit applied inline.

Open assumptions & waivers

- implement.sh:553 stacks each TDD build off the single run-start BASE (no per-TDD integration re-merge).

The in-flight /implement run is pinned to the run-start 0041, so it will build the old 0041 regardless of this PR. This revision reaches a build only if: (a) the run halts before 0041, in which case a future resume picks it up, or (b) old-0041 builds + merges, in which case this becomes a supersede (new TDD). Recommend not merging this until we see what the running build does with 0041: merging prematurely could collide with the build's status flip. I'll advise once the run resolves.

🤖 Generated with Claude Code
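As a rough illustration of the Component 5 authoring padding, the sketch below applies the approximate multipliers quoted in the description (test/eval ≈1.6×, shell-lib ≈1.4×) to a raw line estimate. The helper name `padded_estimate`, the category keys, and the choice to leave unlisted kinds unpadded are assumptions made for this example. The actual guidance lives in skills/tdd-author/SKILL.md and is advisory rather than enforced.

```python
# Approximate multipliers from the PR description (advisory, non-gating).
PADDING = {"test": 1.6, "eval": 1.6, "shell-lib": 1.4}


def padded_estimate(raw_lines: int, kind: str) -> int:
    """Hypothetical helper: pad a raw diff-size estimate by file kind.

    Assumption: kinds not listed in PADDING are left unpadded.
    """
    factor = PADDING.get(kind, 1.0)
    # round() absorbs floating-point noise from the multiplication.
    return round(raw_lines * factor)


# A 100-line raw guess, padded per kind before it is declared.
declared_test = padded_estimate(100, "test")
declared_shell = padded_estimate(100, "shell-lib")
declared_other = padded_estimate(100, "docs")
```

A padded declaration still has to respect the design-time caps (declared ≤300, ≤8 files), so padding can push a large file toward a split rather than silently growing it.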