fix: subphase-aware graceful completion after mid-turn cutoffs (fixes #39)#40
Conversation
check_phase_graceful_completion now checks only subphase-specific
artifacts for phases 1a and 1b, instead of requiring the full Phase 1
artifact set (which includes sandbox-plan.md from Phase 1c).
Previously, when Phase 1b produced all its artifacts but the model cut
off mid-turn, the graceful-forgiveness path in phase_1.py would call
check_phase_graceful_completion('1b'), which required all Phase 1
artifacts including sandbox-plan.md (a 1c artifact). Since that file
didn't exist yet, forgiveness was denied, returncode stayed 2, and the
auto-repair/validation blocks never ran. The user would see 'Phase 1b
did not complete cleanly (exit code 2)' even though all artifacts were
correctly written to disk.
Now Phase 1a only checks 1a artifacts, Phase 1b only checks 1b
artifacts, and Phase 1c/bare '1' keep the existing monolith logic.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (6)
✅ Files skipped from review due to trivial changes (2)
📝 WalkthroughWalkthroughAdds subphase-specific graceful-completion checks for Phase 1 (1a, 1b, 1c) that evaluate artifact freshness per-subphase, moves Phase 1b required-note list usage into the completion module, updates two forgiveness text messages, and adds tests for subphase checks, mid-turn forgiveness, and CLI artifact checks using isolated tmp_path directories. ChangesPhase 1 Subphase Graceful Completion
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Coverage Report
Generated by pytest-cov on |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tools/phases/completion.py (1)
125-135: ⚡ Quick winExtract Phase 1a artifact names to a module-level constant.
The Phase 1a artifact list
("target-profile.md", "build-model.md", "codeql-plan.yml")appears twice: once in the Phase 1a check (line 127) and again in_SKIP_1B(line 132). This duplication means both locations must be updated if the Phase 1a artifact set changes.♻️ Proposed refactor to eliminate duplication
_PHASE1_REQUIRED_ARTIFACT_NAMES = [ "target-profile.md", "attack-surface.md", "build-model.md", "execution-model.md", "trust-boundaries.md", "data-flow.md", "validation-model.md", "interesting-files.md", "file-risk-index.yml", "security-assumptions.md", "threat-model.md", "sandbox-plan.md", ] +_PHASE_1A_ARTIFACT_NAMES = ("target-profile.md", "build-model.md", "codeql-plan.yml") + def _phase1_required_artifacts() -> list[Path]: return [NOTES_ROOT / name for name in _PHASE1_REQUIRED_ARTIFACT_NAMES]Then update both usages:
if original_phase == "1a": notes_dir = ROOT / "itemdb" / "notes" - paths_1a = [notes_dir / n for n in ("target-profile.md", "build-model.md", "codeql-plan.yml")] + paths_1a = [notes_dir / n for n in _PHASE_1A_ARTIFACT_NAMES] return any(_path_is_fresh(p, run_start_time) for p in paths_1a) if original_phase == "1b": notes_dir = ROOT / "itemdb" / "notes" - _SKIP_1B = frozenset({"target-profile.md", "build-model.md", "codeql-plan.yml", "sandbox-plan.md"}) + _SKIP_1B = frozenset(set(_PHASE_1A_ARTIFACT_NAMES) | {"sandbox-plan.md"}) names_1b = [n for n in _PHASE1_REQUIRED_ARTIFACT_NAMES if n not in _SKIP_1B]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/phases/completion.py` around lines 125 - 135, The duplicated Phase 1a artifact names should be pulled into a module-level constant (e.g., PHASE1A_ARTIFACT_NAMES) and both usages updated to reference it: replace the inline tuple in the original_phase == "1a" branch (which builds paths_1a) with PHASE1A_ARTIFACT_NAMES, and replace the literal set in _SKIP_1B with frozenset(PHASE1A_ARTIFACT_NAMES) so _SKIP_1B and names_1b derive from the single constant; keep the rest of the logic (calls to _path_is_fresh, names_1b construction, and _PHASE1_REQUIRED_ARTIFACT_NAMES) unchanged.tests/test_phase_graceful_completion_subphases.py (1)
82-82: 💤 Low valueClarify the comment to accurately describe what artifacts are created.
The comment states "Create only Phase 1a artifacts" but the code creates both
target-profile.md(a Phase 1a artifact) andsandbox-plan.md(a Phase 1c artifact). While the test logic is correct (both are excluded from the Phase 1b check), the comment could be more precise.📝 Proposed comment clarification
- # Create only Phase 1a artifacts — which are excluded from 1b check + # Create only non-1b artifacts (1a + 1c) — which are excluded from 1b check now = time.time() (notes / "target-profile.md").write_text("", encoding="utf-8")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_phase_graceful_completion_subphases.py` at line 82, Update the misleading comment on the test that currently reads "Create only Phase 1a artifacts" to reflect that the test creates both a Phase 1a artifact and a Phase 1c artifact; mention the specific files created (`target-profile.md` for Phase 1a and `sandbox-plan.md` for Phase 1c) so it’s clear both are intentionally excluded from the Phase 1b check in this test (look for the comment near the artifact-creation code in tests/test_phase_graceful_completion_subphases.py).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/test_phase_graceful_completion_subphases.py`:
- Line 82: Update the misleading comment on the test that currently reads
"Create only Phase 1a artifacts" to reflect that the test creates both a Phase
1a artifact and a Phase 1c artifact; mention the specific files created
(`target-profile.md` for Phase 1a and `sandbox-plan.md` for Phase 1c) so it’s
clear both are intentionally excluded from the Phase 1b check in this test (look
for the comment near the artifact-creation code in
tests/test_phase_graceful_completion_subphases.py).
In `@tools/phases/completion.py`:
- Around line 125-135: The duplicated Phase 1a artifact names should be pulled
into a module-level constant (e.g., PHASE1A_ARTIFACT_NAMES) and both usages
updated to reference it: replace the inline tuple in the original_phase == "1a"
branch (which builds paths_1a) with PHASE1A_ARTIFACT_NAMES, and replace the
literal set in _SKIP_1B with frozenset(PHASE1A_ARTIFACT_NAMES) so _SKIP_1B and
names_1b derive from the single constant; keep the rest of the logic (calls to
_path_is_fresh, names_1b construction, and _PHASE1_REQUIRED_ARTIFACT_NAMES)
unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 4d096973-4a41-4c5f-bf1e-868d1c4a3b6d
📒 Files selected for processing (3)
tests/test_phase_artifacts_cli.pytests/test_phase_graceful_completion_subphases.pytools/phases/completion.py
There was a problem hiding this comment.
Pull request overview
This PR adjusts Phase 1 “graceful forgiveness” behavior so that mid-turn model cutoffs during subphases (especially 1b) can still proceed into the existing validation/auto-repair flow by checking subphase-specific artifacts rather than requiring the full Phase 1 monolith artifact set.
Changes:
- Added Phase 1a/1b subphase-specific logic to
check_phase_graceful_completion()so 1b no longer depends on Phase 1c artifacts likesandbox-plan.md. - Added a new test suite covering 1a/1b/1c graceful-completion behavior.
- Updated CLI artifact-check tests to run against an isolated filesystem by patching
ROOTtotmp_path.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/phases/completion.py | Implements subphase-aware graceful completion checks for Phase 1a/1b while keeping 1c/bare “1” on the existing monolith logic. |
| tests/test_phase_graceful_completion_subphases.py | Adds targeted tests for the new 1a/1b behavior and confirms 1c continues to use monolith requirements. |
| tests/test_phase_artifacts_cli.py | Improves test isolation by creating minimal directory structure in tmp_path and patching phases.artifact_checks.ROOT. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pruiz
left a comment
There was a problem hiding this comment.
Reviewed the PR in read-only mode. The direction looks good and I think the core 1b fix is on the right track, but I would address a few things before merging:
-
Reuse the Phase 1b artifact source of truth
completion.pycurrently derives the 1b artifact set by filtering_PHASE1_REQUIRED_ARTIFACT_NAMESand excluding 1a +sandbox-plan.md. Howeverartifact_checks.pyalready hasPHASE_1B_REQUIRED_NOTES, which looks like the canonical 1b artifact list used by the validator.I would prefer importing/reusing that list for the graceful-completion check, so the forgiveness logic and artifact validation cannot drift apart later.
-
Extract the Phase 1a artifact names into a constant
CodeRabbit's nit is valid:
("target-profile.md", "build-model.md", "codeql-plan.yml")is duplicated between the 1a check and the 1b skip list. A small module-level constant such as_PHASE_1A_ARTIFACT_NAMESwould make this safer. -
The runtime message slightly overstates the condition
The new graceful check uses
any(_path_is_fresh(...)), which I think is correct for the intended behavior: if some expected artifact was written, let validation and auto-repair run.But the message says "the required durable artifacts were already written", which sounds like all required artifacts exist. I would tweak it to something like:
but expected durable artifacts were written during the run. Treating the subphase as complete enough to run validation and auto-repair. -
Add coverage for the actual
phase_1.pycontrol flowThe new tests cover
check_phase_graceful_completion()directly, which is useful. But issue #39 is specifically about the mid-turn cutoff path entering validation/auto-repair instead of stopping or retrying.I would add at least one test around
_run_subphase()/ the surrounding control flow to prove:- mid-turn cutoff + produced artifacts => graceful forgiveness => validation/repair path is reached
- mid-turn cutoff + no produced artifacts => existing iteration retry behavior remains unchanged
Otherwise the helper is tested, but not the behavior that closes the acceptance criteria.
-
Clarify or complete the Phase 1c behavior
The PR intentionally leaves 1c on the monolithic logic, but issue #39 says Phase 1a and 1c should also benefit from the same fix.
Current 1c still requires the full Phase 1 artifact set, sandbox state, and a fresh run summary. If a cutoff happens before the summary exists, 1c may still fail forgiveness even if
sandbox-plan.mdorsandbox/CODECOME-GENERATED.mdwas produced.I would either implement a 1c-specific graceful check, or explicitly document why 1c does not need it despite the issue acceptance criteria.
-
Minor test comment cleanup
In
test_phase_1b_graceful_completion_fails_if_no_1b_artifacts_fresh, the comment says it creates only Phase 1a artifacts, but the test also createssandbox-plan.mdfrom 1c. The logic is fine; the comment should say it creates non-1b artifacts.
Overall: the PR is small and points in the right direction, but I would tighten the artifact list sources and add one control-flow test before merging.
- Extract _PHASE_1A_ARTIFACT_NAMES constant, eliminate duplication
- Move PHASE_1B_REQUIRED_NOTES from artifact_checks.py to completion.py
(single source of truth for 1b artifact list)
- Add Phase 1c-specific graceful completion check (sandbox-plan.md or
CODECOME-GENERATED.md freshness, no monolith requirements)
- Fix runtime messages in phase_1.py and harness.py to not overstate
('expected durable artifacts were written' instead of 'required')
- Fix misleading test comment in graceful completion tests
- Add 2 control-flow tests proving mid-turn cutoff branching in
_run_subphase: forgiveness triggers validation, denial triggers retry
- Add 1c graceful test (fresh sandbox-plan passes, none fails)
|
@coderabbitai All previous review comments have been addressed:
618 tests pass. |
|
618 tests passing — looks good. Approving. (=^ ・ω・ ^=)∫ ✅ Action performedComments resolved. Approval is disabled; enable |
Problem (Issue #39)
When Phase 1b produces all artifacts but the model cuts off mid-turn, the harness never runs artifact validation or auto-repair. The graceful-forgiveness mechanism in
phase_1.py:675-695checkscheck_phase_graceful_completion()— but that function required ALL Phase 1 artifacts to exist, includingsandbox-plan.md(a Phase 1c artifact). Sincesandbox-plan.mdhadn't been created yet, forgiveness was denied,returncodestayed 2, and the validation blocks were skipped.Fix
check_phase_graceful_completion()intools/phases/completion.pynow checks only subphase-specific artifacts for phases 1a and 1b:target-profile.md,build-model.md,codeql-plan.ymltarget-profile.md,build-model.md,codeql-plan.yml) and Phase 1c (sandbox-plan.md) artifactsThis means when Phase 1b produces artifacts then cuts off, the graceful forgiveness triggers,
returncodestays 0, the validation blocks run, and the auto-repair loop can fix formatting issues (e.g., H2→H1 headings in threat-model.md) by resuming the session with a targeted repair prompt.Tests
6 new tests covering:
Verification
Summary by CodeRabbit
Tests
New Features
User-facing Messaging