build/20260608 195531/0038 mechanical per step test first enforcement #117

Merged: cahenesy merged 19 commits into master from build/20260608-195531/0038-mechanical-per-step-test-first-enforcement on Jun 9, 2026.

cahenesy (owner) commented on Jun 9, 2026:
  • test(failing): per-step mechanical test-first pre-check (TDD 0038 / FR-15a)
  • step(1): mechanical per-step test-first pre-check (TDD 0038 / FR-15a)
  • test(failing): build-prompt self-gate + aggregator wire-in rule (TDD 0038 §2/§3)
  • step(2): preventive self-gate + aggregator wire-in rule in build-prompt (TDD 0038 / FR-15a)
  • test(failing): four per-step-loop fixtures opt out of default-on gate (TDD 0038 §4)
  • step(3): reconcile four per-step-loop fixtures for default-on enforcement (TDD 0038 §4 / FR-15a)
  • test(failing): aggregator wire-in propagates the new eval's failure (TDD 0038 §3 dogfood)
  • step(4): wire test-first-per-step eval into the CI aggregator (TDD 0038 §3 / FR-15a)
  • docs(0038): note per-step test-first enforcement in README + bump plugin 3.20.0 -> 3.21.0
  • fix(0038): commit the streaming interval before the deterministic test-first BLOCK (TDD 0038 §1 / FR-15a)
  • rework: extract sentinel line before TEST_FIRST_SKIPPED: check to prevent prose match
  • rework: add §9 prose-mention test for the anchored skip-token (review:1 blocker)
  • rework: use tail -1 for _tf_sentinel to match step_id/sha extraction
  • rework: anchor _tf_sentinel grep to ^STEP_COMMIT: so prose lines after the sentinel cannot set tf_skip=1
  • test(failing): anchored skip-token residual — same-line + multi-sentinel bypass
  • fix: anchor skip-token to the last STEP_COMMIT sentinel match (gates.sh:979)
  • mark 0038-mechanical-per-step-test-first-enforcement implemented (verified + reviewed)

cahenesy and others added 19 commits June 8, 2026 20:05
…R-15a)

Drives _per_step_review_loop with impl-first / test(failing)-precursor /
TEST_FIRST_SKIPPED / extended-sentinel / knob-off cases. §1 (deterministic
test-first BLOCK with no model spawn) and §3 (skip telemetry) are RED — the
pre-check does not exist yet.

Component 1: factor _test_first_ok_range from test_first_ok (shared git-history
+ skip predicate, ADR 0006) and wire a deterministic per-step pre-check into
_per_step_review_loop's STEP_COMMIT branch. An impl-first step (no test(failing):
precursor in last-cleared..sha, no per-step TEST_FIRST_SKIPPED: token) gets a
fixed STEP_REVIEW: BLOCK with NO model review spawned; pass-through otherwise.
Honors THROUGHLINE_REQUIRE_TEST_FIRST. Rides the existing per-step BLOCK path
(ADR 0007; no new halt type).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
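A minimal sketch of the Component 1 decision, assuming the commit subjects for `last-cleared..sha` are already available one per line (for example from `git log --format=%s`). `tf_range_ok` and `step_verdict` are hypothetical names, not the functions in gates.sh, and the skip token is passed in as a flag rather than parsed from the sentinel.

```shell
# Hypothetical per-step test-first predicate (not the real _test_first_ok_range).
# stdin: commit subjects in last-cleared..sha, one per line.
# $1:    1 if the step's sentinel carried a TEST_FIRST_SKIPPED: token, else 0.
# Returns 0 for pass-through, 1 for the deterministic BLOCK.
tf_range_ok() {
  tf_skip=${1:-0}
  # Knob: enforcement is default-on unless THROUGHLINE_REQUIRE_TEST_FIRST=0.
  if [ "${THROUGHLINE_REQUIRE_TEST_FIRST:-1}" = 0 ]; then
    return 0
  fi
  # An explicit per-step skip token bypasses the ordering check.
  if [ "$tf_skip" = 1 ]; then
    return 0
  fi
  # Pass only if some commit in the range is a test(failing): precursor.
  grep -q '^test(failing):'
}

# Hypothetical driver: what the per-step loop would do for one step.
# $1: newline-separated subjects, $2: skip flag.
step_verdict() {
  if printf '%s\n' "$1" | tf_range_ok "$2"; then
    echo "pass-through"
  else
    echo "STEP_REVIEW: BLOCK"
  fi
}
```

Calling `step_verdict 'step(1): impl only' 0` prints `STEP_REVIEW: BLOCK`; the same step preceded by a `test(failing):` subject, called with a skip flag of 1, or run with `THROUGHLINE_REQUIRE_TEST_FIRST=0` prints `pass-through`.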
…0038 §2/§3)

§6 greps build-prompt.md for the preventive self-gate bullet and the aggregator
wire-in rule; both absent (RED).
…pt (TDD 0038 / FR-15a)

Component 2: a self-verification bullet in the STEP_COMMIT handshake — the build
checks test-first ordering (and the optional STEP_COMMIT TEST_FIRST_SKIPPED:
token) before emitting the sentinel, turning reactive catch-and-revert into
prevention at the source; the §1 mechanical pre-check is the backstop.
Component 3: an aggregator wire-in rule under FAILING TEST FIRST: wiring a new
eval into tests/implement-gate.test.sh is new gating behavior requiring a
failing wire-in test; it is not eligible for TEST_FIRST_SKIPPED:.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
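The preventive self-gate can be pictured as a check the build runs before emitting its sentinel. This sketch is hypothetical (`self_gate` and its output words are illustrative), and it encodes only the one eligibility rule stated above: a step that wires an eval into tests/implement-gate.test.sh never qualifies for TEST_FIRST_SKIPPED:. Any other claimed skip is accepted as-is here.

```shell
# Hypothetical pre-emit self-check, not the build-prompt's wording.
# $1: 1 if a test(failing): commit precedes this step's impl commit, else 0.
# $2: 1 if the build intends to claim TEST_FIRST_SKIPPED:, else 0.
# stdin: paths changed by the step, one per line.
self_gate() {
  sg_has_precursor=$1
  sg_wants_skip=$2
  if [ "$sg_has_precursor" = 1 ]; then
    echo "emit"
    return 0
  fi
  if [ "$sg_wants_skip" = 1 ]; then
    # Wiring an eval into the aggregator is new gating behavior,
    # so it is never skip-eligible.
    if grep -qxF 'tests/implement-gate.test.sh'; then
      echo "write-failing-test-first"
      return 1
    fi
    echo "emit-with-skip"
    return 0
  fi
  echo "write-failing-test-first"
  return 1
}
```

For example, `printf 'tests/implement-gate.test.sh\n' | self_gate 0 1` prints `write-failing-test-first` instead of letting the build emit a skip-token sentinel.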
… (TDD 0038 §4)

§7 greps each of continuous-in-build-review / build-defensive-norms /
step-commit-protocol / coproc-verdict-resilience for the
THROUGHLINE_REQUIRE_TEST_FIRST=0 export; absent (RED).
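A §7-style check could be as small as a loop over the fixture files. The function name and the exact export line are assumptions; requiring a whole-line match with no leading whitespace (`grep -x`) is one rough way to approximate "at file scope".

```shell
# Hypothetical check: every fixture passed as an argument must export
# THROUGHLINE_REQUIRE_TEST_FIRST=0 on an unindented line of its own.
# Prints one line per missing opt-out; returns non-zero if any are missing.
check_fixtures_opt_out() {
  cf_missing=0
  for cf_file in "$@"; do
    if ! grep -qxF 'export THROUGHLINE_REQUIRE_TEST_FIRST=0' "$cf_file"; then
      echo "MISSING opt-out: $cf_file"
      cf_missing=$((cf_missing + 1))
    fi
  done
  [ "$cf_missing" -eq 0 ]
}
```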
…ment (TDD 0038 §4 / FR-15a)

Component 4: each fixture drives _per_step_review_loop with impl-only step(N):
commits to exercise coproc/handshake/protocol/review mechanics, for which
test-first ordering is orthogonal. Under default-on per-step enforcement (§1)
those would hit the deterministic BLOCK before the path under test. Export
THROUGHLINE_REQUIRE_TEST_FIRST=0 once at file scope (with a WHY comment) so the
orthogonal gate is disabled; the dedicated eval is the sole knob-ON exerciser.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
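The file-scope opt-out described above might look like this at the top of a fixture. The WHY comment wording and the `test_first_enforced` helper are illustrative; the helper only assumes the knob is default-on when unset, as the commit message states.

```shell
# WHY: this fixture drives _per_step_review_loop with impl-only step(N):
# commits to exercise coproc/handshake/protocol/review mechanics. Test-first
# ordering is orthogonal here, so the default-on per-step gate is disabled
# once, at file scope, for every case and every child process.
export THROUGHLINE_REQUIRE_TEST_FIRST=0

# Hypothetical stand-in for the gate's knob read: enforced unless set to 0.
test_first_enforced() {
  [ "${THROUGHLINE_REQUIRE_TEST_FIRST:-1}" != 0 ]
}
```

Because the variable is exported, child shells spawned by the fixture also see `0`; unsetting it restores the default-on behavior.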
…TDD 0038 §3 dogfood)

§8 drives the real implement-gate.test.sh final AND-chain with the new eval
(TFP_FAIL) stubbed to fail and asserts the overall exit goes non-zero; also
asserts the eval is registered. Both RED — the eval is not yet wired in.
…38 §3 / FR-15a)

Register tests/test-first-per-step.test.sh as a sub-eval and add its TFP_FAIL
term to the final AND-chain, so the per-step pre-check, the build-prompt edits,
and the four-fixture non-regression are gated by ci-checks. Per the §3 wire-in
rule this is new gating behavior; the §8 dogfood (committed test(failing):)
drove the AND-chain term red→green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
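The effect of the wire-in can be sketched with a toy AND-chain. Only `TFP_FAIL` comes from the commit message; `GATE_FAIL`, `run_sub_eval`, and `aggregate` are hypothetical stand-ins, since the structure of tests/implement-gate.test.sh is not shown here.

```shell
# Hypothetical sub-eval runner: prints 1 if the command failed, 0 otherwise.
run_sub_eval() {
  if sh -c "$1" >/dev/null 2>&1; then echo 0; else echo 1; fi
}

# Hypothetical aggregator with two registered sub-evals.
# $1: an existing eval's command, $2: the new test-first-per-step eval's command.
aggregate() {
  GATE_FAIL=$(run_sub_eval "$1")
  TFP_FAIL=$(run_sub_eval "$2")
  # Final AND-chain: every term must hold for an overall exit of 0.
  [ "$GATE_FAIL" -eq 0 ] && [ "$TFP_FAIL" -eq 0 ]
}
```

`aggregate true false` exits non-zero, which is the property the §8 dogfood asserts by stubbing the new eval to fail; without the `TFP_FAIL` term the same call would exit 0.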
…gin 3.20.0 -> 3.21.0

README gate-1 description now states failing-test-first is enforced per step
(deterministic STEP_REVIEW: BLOCK before any model review) as well as
whole-build, and lists the new test-first-per-step eval. Version bump for the
functional gates.sh + build-prompt.md change (TDD 0038 / FR-15a).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t-first BLOCK (TDD 0038 §1 / FR-15a)

Self-review finding: the deterministic per-step test-first BLOCK reset
interval_start without committing the streaming interval to build_active_seconds
(it mirrored the protocol-error correction path). But the protocol-error path is
bounded by a 2-attempt COUNT budget, whereas the test-first BLOCK has none, so a
build looping on impl-first re-emits would discard every interval and evade the
overall_active watchdog indefinitely. Align with the review-verdict path (a
deterministic BLOCK substitutes for the review): commit the interval first so
repeated BLOCKs accumulate active seconds and the watchdog bounds the loop. The
observable surface (BLOCK written, no model review spawned) is unchanged and
remains covered by tests/test-first-per-step.test.sh §1; this is an internal
active-time accounting fix (no flaky timing test added, per the eval's own
L-001/L-002 robustness guidance).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
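The accounting difference can be modeled with explicit timestamps. `build_active_seconds` and `interval_start` are named in the commit message; the handlers, the `simulate` driver, and the numeric budget are hypothetical, and clock values are passed in rather than read so the model stays deterministic.

```shell
build_active_seconds=0
interval_start=0

# Old behavior on a test-first BLOCK: reset the interval, discarding it.
on_block_discard() { interval_start=$1; }

# Fixed behavior: commit the streaming interval first, then reset.
on_block_commit() {
  build_active_seconds=$((build_active_seconds + $1 - interval_start))
  interval_start=$1
}

# Hypothetical watchdog: trips once accumulated active time exceeds $1 seconds.
watchdog_tripped() { [ "$build_active_seconds" -gt "$1" ]; }

# Replay five BLOCKs 100 s apart through the handler named in $1.
simulate() {
  build_active_seconds=0
  interval_start=0
  for t in 100 200 300 400 500; do "$1" "$t"; done
  echo "$build_active_seconds"
}
```

Five BLOCKs 100 seconds apart leave `build_active_seconds` at 500 under the committing handler but at 0 under the discarding one, so only the former lets a 300-second budget trip.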
…31/0038-mechanical-per-step-test-first-enforcement
…:1 blocker)

Verifies a TEST_FIRST_SKIPPED: mention in prose on a NON-sentinel line of a
multi-line assistant event does NOT bypass the deterministic test-first BLOCK
(the anchored grep -m1 'STEP_COMMIT:' extraction reads the token off the
sentinel line only). Negative-control confirmed: fails against the unanchored
predecessor, passes against the fix. Eval now 23/0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
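The difference the §9 test pins down is where the token is searched for. In this sketch the sentinel layout `STEP_COMMIT: <step> <sha>` and both helper names are assumptions; `grep -m1` stops at the first line containing the sentinel, so prose on other lines of the event is ignored.

```shell
# Hypothetical multi-line assistant event: the token appears only in prose.
event=$(printf '%s\n' \
  'I considered TEST_FIRST_SKIPPED: but a failing test was feasible.' \
  'STEP_COMMIT: step-2 abc1234')

# Unanchored: a mention anywhere in the event counts (bypassable).
skip_unanchored() { printf '%s\n' "$1" | grep -q 'TEST_FIRST_SKIPPED:'; }

# Line-scoped: extract the sentinel line first, then look for the token.
skip_line_scoped() {
  printf '%s\n' "$1" | grep -m1 'STEP_COMMIT:' | grep -q 'TEST_FIRST_SKIPPED:'
}
```

With this event, `skip_unanchored` succeeds (a bypass) while `skip_line_scoped` fails, so the BLOCK stands. Searching the sentinel line as a whole still admits a token in trailing prose on that same line, which later commits in this PR address.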
…31/0038-mechanical-per-step-test-first-enforcement
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nel bypass

§9 (strengthened from the vacuous different-line case) and new §10 drive the two
residual bypasses the post-rework grep '^STEP_COMMIT:' impl still has: a
TEST_FIRST_SKIPPED: token in trailing same-line prose after the sha, and a stale
prior STEP_COMMIT line's token leaking onto a later token-less sentinel (the
grep-all-lines vs step_id/sha tail -1 divergence the review flagged). Both must
still produce the deterministic test-first BLOCK. Fails against the current code
(4 red); the gates.sh fix follows.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
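Both residual bypasses are easy to reproduce against an extraction that keeps every line starting with `STEP_COMMIT:` and then searches those lines for the token. The sentinel layout and the `TEST_FIRST_SKIPPED:docs-only` spelling are assumptions for illustration.

```shell
# Hypothetical model of the post-rework extraction: all ^STEP_COMMIT: lines,
# token matched anywhere on them. Returns 0 when the skip would be honored.
skip_via_anchored_lines() {
  printf '%s\n' "$1" | grep '^STEP_COMMIT:' | grep -q 'TEST_FIRST_SKIPPED:'
}

# Bypass 1: token in trailing prose on the same line, after the sha.
same_line='STEP_COMMIT: step-3 abc1234 (no TEST_FIRST_SKIPPED: needed here)'

# Bypass 2: a stale earlier sentinel carries the token; the last one does not.
multi_sentinel=$(printf '%s\n' \
  'STEP_COMMIT: step-2 abc1234 TEST_FIRST_SKIPPED:docs-only' \
  'STEP_COMMIT: step-3 def5678')
```

`skip_via_anchored_lines` succeeds on both inputs, meaning the skip is wrongly honored in each case, which is what §9 and §10 are written to catch.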
…sh:979)

Replace the divergent `grep '^STEP_COMMIT:'` (which returned ALL matching lines
and matched a TEST_FIRST_SKIPPED: token anywhere on them) with the TDD-specified
extraction: the same `grep -aoE 'STEP_COMMIT:…' | tail -1` the step_id/sha
parse uses, EXTENDED with an optional trailing `( …TEST_FIRST_SKIPPED:…)?` group.
The token is now honored only when it immediately follows the sha on the LAST
sentinel — consistent with step_id/sha (tail -1), closing the same-line-prose and
stale-prior-sentinel bypasses (review:1 majors). Covered by §9 (same-line) and
§10 (multi-sentinel) added test-first in the preceding commit; eval 25/0; the four
reconciled per-step-loop fixtures still pass; shellcheck clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
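A sketch of the fixed extraction under an assumed grammar: `STEP_COMMIT: <step> <sha>`, optionally followed by ` TEST_FIRST_SKIPPED:<reason>`. The real pattern in gates.sh is elided above, so `SENTINEL_RE`, `last_sentinel`, and `tf_skip_of` are hypothetical. Because `grep -o` prints each match separately and `tail -1` keeps the last, the token counts only when it sits directly after the sha on the final sentinel.

```shell
# Hypothetical sentinel grammar; the optional group must follow the sha directly.
SENTINEL_RE='STEP_COMMIT: [A-Za-z0-9_.-]+ [0-9a-f]{7,40}( TEST_FIRST_SKIPPED:[A-Za-z0-9_.-]+)?'

# Last sentinel match in the event, mirroring the step_id/sha tail -1 parse.
last_sentinel() {
  printf '%s\n' "$1" | grep -aoE "$SENTINEL_RE" | tail -1
}

# Prints 1 if the last sentinel carries the skip token, else 0.
tf_skip_of() {
  case $(last_sentinel "$1") in
    *' TEST_FIRST_SKIPPED:'*) echo 1 ;;
    *) echo 0 ;;
  esac
}
```

For example, `tf_skip_of 'STEP_COMMIT: step-3 abc1234 (no TEST_FIRST_SKIPPED: needed here)'` prints `0`, while `tf_skip_of 'STEP_COMMIT: step-3 def5678 TEST_FIRST_SKIPPED:docs-only'` prints `1`.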
cahenesy merged commit f235e41 into master on Jun 9, 2026
1 check passed
cahenesy deleted the build/20260608-195531/0038-mechanical-per-step-test-first-enforcement branch on June 9, 2026 12:52