fix: subphase-aware graceful completion after mid-turn cutoffs (fixes #39) by pruiz · Pull Request #40 · pruiz/CodeCome

pruiz · 2026-06-04T18:40:27Z

Problem (Issue #39)

When Phase 1b produces all artifacts but the model cuts off mid-turn, the harness never runs artifact validation or auto-repair. The graceful-forgiveness mechanism in phase_1.py:675-695 checks check_phase_graceful_completion() — but that function required ALL Phase 1 artifacts to exist, including sandbox-plan.md (a Phase 1c artifact). Since sandbox-plan.md hadn't been created yet, forgiveness was denied, returncode stayed 2, and the validation blocks were skipped.

Fix

check_phase_graceful_completion() in tools/phases/completion.py now checks only subphase-specific artifacts for phases 1a and 1b:

Phase 1a: checks target-profile.md, build-model.md, codeql-plan.yml
Phase 1b: checks all 1b notes, excluding Phase 1a (target-profile.md, build-model.md, codeql-plan.yml) and Phase 1c (sandbox-plan.md) artifacts
Phase 1c and bare "1": unchanged (existing monolith logic)

This means when Phase 1b produces artifacts then cuts off, the graceful forgiveness triggers, returncode stays 0, the validation blocks run, and the auto-repair loop can fix formatting issues (e.g., H2→H1 headings in threat-model.md) by resuming the session with a targeted repair prompt.

Tests

6 new tests covering:

1a graceful completion with only 1a artifacts
1a returns False with no fresh artifacts
1b graceful completion with only 1b artifacts (no sandbox-plan.md)
1b returns False with no fresh 1b artifacts
sandbox-plan.md is excluded from 1b check
1c still requires sandbox state

Verification

615 tests pass

Summary by CodeRabbit

Tests
- Added tests covering Phase 1 subphase graceful-completion (1a/1b/1c), mid-turn forgiveness retry behavior, and updated artifact-check tests to run in isolated temp filesystems.
New Features
- Phase 1 now evaluates graceful completion per subphase with specific artifact-freshness checks.
User-facing Messaging
- Clarified messaging when a mid-run cutoff is forgiven to indicate durable artifacts were written.

check_phase_graceful_completion now checks only subphase-specific artifacts for phases 1a and 1b, instead of requiring the full Phase 1 artifact set (which includes sandbox-plan.md from Phase 1c). Previously, when Phase 1b produced all its artifacts but the model cut off mid-turn, the graceful-forgiveness path in phase_1.py would call check_phase_graceful_completion('1b'), which required all Phase 1 artifacts including sandbox-plan.md (a 1c artifact). Since that file didn't exist yet, forgiveness was denied, returncode stayed 2, and the auto-repair/validation blocks never ran. The user would see 'Phase 1b did not complete cleanly (exit code 2)' even though all artifacts were correctly written to disk. Now Phase 1a only checks 1a artifacts, Phase 1b only checks 1b artifacts, and Phase 1c/bare '1' keep the existing monolith logic.

coderabbitai · 2026-06-04T18:40:39Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b3626fc9-ccaf-43c4-86d0-b2eb7822ed9c

📥 Commits

Reviewing files that changed from the base of the PR and between 34a8b0d and 79cd183.

📒 Files selected for processing (6)

tests/test_phase_1_mid_turn_forgiveness.py
tests/test_phase_graceful_completion_subphases.py
tools/codecome/harness.py
tools/codecome/phase_1.py
tools/phases/artifact_checks.py
tools/phases/completion.py

✅ Files skipped from review due to trivial changes (2)

tools/codecome/phase_1.py
tools/codecome/harness.py

📝 Walkthrough

Walkthrough

Adds subphase-specific graceful-completion checks for Phase 1 (1a, 1b, 1c) that evaluate artifact freshness per-subphase, moves Phase 1b required-note list usage into the completion module, updates two forgiveness text messages, and adds tests for subphase checks, mid-turn forgiveness, and CLI artifact checks using isolated tmp_path directories.

Changes

Phase 1 Subphase Graceful Completion

Layer / File(s)	Summary
Subphase-specific graceful completion implementation `tools/phases/completion.py`, `tools/phases/artifact_checks.py`	Adds PHASE_1A and PHASE_1B artifact-name sets and early-return freshness checks for `original_phase == "1a"
User-facing forgiveness message tweaks `tools/codecome/harness.py`, `tools/codecome/phase_1.py`	Adjusts informational strings emitted when a mid-turn graceful forgiveness path is taken; no control flow change.
Comprehensive subphase graceful completion tests `tests/test_phase_graceful_completion_subphases.py`	New tests validate Phase 1a/1b/1c freshness behavior using temporary `itemdb/notes`, controlling mtimes and sandbox presence to exercise fresh/stale edge cases.
Mid-turn cutoff forgiveness tests `tests/test_phase_1_mid_turn_forgiveness.py`	New tests exercise `_run_subphase` mid-turn cutoff retry behavior for 1b under granted and denied forgiveness scenarios using patched internals and fake runtime config.
Existing CLI tests updated for isolated filesystem execution `tests/test_phase_artifacts_cli.py`	Refactors three artifact-check CLI tests to use `tmp_path`, create `itemdb/notes` and `runs` under it, and patch `phases.artifact_checks.ROOT` so checks run in an isolated temp checkout.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Run artifact validation + auto-repair after mid-turn model cutoffs (not only clean finishes) #39: Related to implementing mid-turn graceful forgiveness and subphase handling; this PR implements subphase-specific artifact freshness checks and mid-turn forgiveness tests that align with that issue.

Possibly related PRs

pruiz/CodeCome#34: Related — both modify Phase 1 artifact/validation handling and the source of Phase 1b required-note checks.

Poem

🐰 I hopped through files with careful paws,
Checking notes and sandbox laws—
1a, 1b, 1c in line,
Fresh artifacts mark the sign.
Tests penned tidy, messages neat, hooray for graceful cause!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: implementing subphase-aware graceful completion logic after mid-turn cutoffs, which directly addresses the root cause described in the PR objectives.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/mid-turn-graceful-forgiveness-subphases

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-04T18:42:17Z

Coverage Report

Metric	Value
Line Coverage	73.8%
Lines Covered	0 / 0

Download detailed HTML coverage reports per OS/Python from the workflow artifacts.

Generated by pytest-cov on 2026-06-04T19:32:34.338Z

coderabbitai

🧹 Nitpick comments (2)

tools/phases/completion.py (1)

125-135: ⚡ Quick win

Extract Phase 1a artifact names to a module-level constant.

The Phase 1a artifact list ("target-profile.md", "build-model.md", "codeql-plan.yml") appears twice: once in the Phase 1a check (line 127) and again in _SKIP_1B (line 132). This duplication means both locations must be updated if the Phase 1a artifact set changes.

♻️ Proposed refactor to eliminate duplication

 _PHASE1_REQUIRED_ARTIFACT_NAMES = [
     "target-profile.md",
     "attack-surface.md",
     "build-model.md",
     "execution-model.md",
     "trust-boundaries.md",
     "data-flow.md",
     "validation-model.md",
     "interesting-files.md",
     "file-risk-index.yml",
     "security-assumptions.md",
     "threat-model.md",
     "sandbox-plan.md",
 ]

+_PHASE_1A_ARTIFACT_NAMES = ("target-profile.md", "build-model.md", "codeql-plan.yml")
+

 
 def _phase1_required_artifacts() -> list[Path]:
     return [NOTES_ROOT / name for name in _PHASE1_REQUIRED_ARTIFACT_NAMES]

Then update both usages:

             if original_phase == "1a":
                 notes_dir = ROOT / "itemdb" / "notes"
-                paths_1a = [notes_dir / n for n in ("target-profile.md", "build-model.md", "codeql-plan.yml")]
+                paths_1a = [notes_dir / n for n in _PHASE_1A_ARTIFACT_NAMES]
                 return any(_path_is_fresh(p, run_start_time) for p in paths_1a)

             if original_phase == "1b":
                 notes_dir = ROOT / "itemdb" / "notes"
-                _SKIP_1B = frozenset({"target-profile.md", "build-model.md", "codeql-plan.yml", "sandbox-plan.md"})
+                _SKIP_1B = frozenset(set(_PHASE_1A_ARTIFACT_NAMES) | {"sandbox-plan.md"})
                 names_1b = [n for n in _PHASE1_REQUIRED_ARTIFACT_NAMES if n not in _SKIP_1B]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/phases/completion.py` around lines 125 - 135, The duplicated Phase 1a
artifact names should be pulled into a module-level constant (e.g.,
PHASE1A_ARTIFACT_NAMES) and both usages updated to reference it: replace the
inline tuple in the original_phase == "1a" branch (which builds paths_1a) with
PHASE1A_ARTIFACT_NAMES, and replace the literal set in _SKIP_1B with
frozenset(PHASE1A_ARTIFACT_NAMES) so _SKIP_1B and names_1b derive from the
single constant; keep the rest of the logic (calls to _path_is_fresh, names_1b
construction, and _PHASE1_REQUIRED_ARTIFACT_NAMES) unchanged.

tests/test_phase_graceful_completion_subphases.py (1)

82-82: 💤 Low value

Clarify the comment to accurately describe what artifacts are created.

The comment states "Create only Phase 1a artifacts" but the code creates both target-profile.md (a Phase 1a artifact) and sandbox-plan.md (a Phase 1c artifact). While the test logic is correct (both are excluded from the Phase 1b check), the comment could be more precise.
📝 Proposed comment clarification
-    # Create only Phase 1a artifacts — which are excluded from 1b check
+    # Create only non-1b artifacts (1a + 1c) — which are excluded from 1b check
     now = time.time()
     (notes / "target-profile.md").write_text("", encoding="utf-8")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_phase_graceful_completion_subphases.py` at line 82, Update the
misleading comment on the test that currently reads "Create only Phase 1a
artifacts" to reflect that the test creates both a Phase 1a artifact and a Phase
1c artifact; mention the specific files created (`target-profile.md` for Phase
1a and `sandbox-plan.md` for Phase 1c) so it’s clear both are intentionally
excluded from the Phase 1b check in this test (look for the comment near the
artifact-creation code in tests/test_phase_graceful_completion_subphases.py).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_phase_graceful_completion_subphases.py`:
- Line 82: Update the misleading comment on the test that currently reads
"Create only Phase 1a artifacts" to reflect that the test creates both a Phase
1a artifact and a Phase 1c artifact; mention the specific files created
(`target-profile.md` for Phase 1a and `sandbox-plan.md` for Phase 1c) so it’s
clear both are intentionally excluded from the Phase 1b check in this test (look
for the comment near the artifact-creation code in
tests/test_phase_graceful_completion_subphases.py).

In `@tools/phases/completion.py`:
- Around line 125-135: The duplicated Phase 1a artifact names should be pulled
into a module-level constant (e.g., PHASE1A_ARTIFACT_NAMES) and both usages
updated to reference it: replace the inline tuple in the original_phase == "1a"
branch (which builds paths_1a) with PHASE1A_ARTIFACT_NAMES, and replace the
literal set in _SKIP_1B with frozenset(PHASE1A_ARTIFACT_NAMES) so _SKIP_1B and
names_1b derive from the single constant; keep the rest of the logic (calls to
_path_is_fresh, names_1b construction, and _PHASE1_REQUIRED_ARTIFACT_NAMES)
unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4d096973-4a41-4c5f-bf1e-868d1c4a3b6d

📥 Commits

Reviewing files that changed from the base of the PR and between 10a9b54 and 34a8b0d.

📒 Files selected for processing (3)

tests/test_phase_artifacts_cli.py
tests/test_phase_graceful_completion_subphases.py
tools/phases/completion.py

Copilot

Pull request overview

This PR adjusts Phase 1 “graceful forgiveness” behavior so that mid-turn model cutoffs during subphases (especially 1b) can still proceed into the existing validation/auto-repair flow by checking subphase-specific artifacts rather than requiring the full Phase 1 monolith artifact set.

Changes:

Added Phase 1a/1b subphase-specific logic to check_phase_graceful_completion() so 1b no longer depends on Phase 1c artifacts like sandbox-plan.md.
Added a new test suite covering 1a/1b/1c graceful-completion behavior.
Updated CLI artifact-check tests to run against an isolated filesystem by patching ROOT to tmp_path.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
tools/phases/completion.py	Implements subphase-aware graceful completion checks for Phase 1a/1b while keeping 1c/bare “1” on the existing monolith logic.
tests/test_phase_graceful_completion_subphases.py	Adds targeted tests for the new 1a/1b behavior and confirms 1c continues to use monolith requirements.
tests/test_phase_artifacts_cli.py	Improves test isolation by creating minimal directory structure in `tmp_path` and patching `phases.artifact_checks.ROOT`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pruiz

Reviewed the PR in read-only mode. The direction looks good and I think the core 1b fix is on the right track, but I would address a few things before merging:

Reuse the Phase 1b artifact source of truth

completion.py currently derives the 1b artifact set by filtering _PHASE1_REQUIRED_ARTIFACT_NAMES and excluding 1a + sandbox-plan.md. However artifact_checks.py already has PHASE_1B_REQUIRED_NOTES, which looks like the canonical 1b artifact list used by the validator.

I would prefer importing/reusing that list for the graceful-completion check, so the forgiveness logic and artifact validation cannot drift apart later.
Extract the Phase 1a artifact names into a constant

CodeRabbit's nit is valid: ("target-profile.md", "build-model.md", "codeql-plan.yml") is duplicated between the 1a check and the 1b skip list. A small module-level constant such as _PHASE_1A_ARTIFACT_NAMES would make this safer.
The runtime message slightly overstates the condition

The new graceful check uses any(_path_is_fresh(...)), which I think is correct for the intended behavior: if some expected artifact was written, let validation and auto-repair run.

But the message says "the required durable artifacts were already written", which sounds like all required artifacts exist. I would tweak it to something like:
```
but expected durable artifacts were written during the run. Treating the subphase as complete enough to run validation and auto-repair.
```
Add coverage for the actual phase_1.py control flow

The new tests cover check_phase_graceful_completion() directly, which is useful. But issue #39 is specifically about the mid-turn cutoff path entering validation/auto-repair instead of stopping or retrying.

I would add at least one test around _run_subphase() / the surrounding control flow to prove:
- mid-turn cutoff + produced artifacts => graceful forgiveness => validation/repair path is reached
- mid-turn cutoff + no produced artifacts => existing iteration retry behavior remains unchanged
Otherwise the helper is tested, but not the behavior that closes the acceptance criteria.
Clarify or complete the Phase 1c behavior

The PR intentionally leaves 1c on the monolithic logic, but issue #39 says Phase 1a and 1c should also benefit from the same fix.

Current 1c still requires the full Phase 1 artifact set, sandbox state, and a fresh run summary. If a cutoff happens before the summary exists, 1c may still fail forgiveness even if sandbox-plan.md or sandbox/CODECOME-GENERATED.md was produced.

I would either implement a 1c-specific graceful check, or explicitly document why 1c does not need it despite the issue acceptance criteria.
Minor test comment cleanup

In test_phase_1b_graceful_completion_fails_if_no_1b_artifacts_fresh, the comment says it creates only Phase 1a artifacts, but the test also creates sandbox-plan.md from 1c. The logic is fine; the comment should say it creates non-1b artifacts.

Overall: the PR is small and points in the right direction, but I would tighten the artifact list sources and add one control-flow test before merging.

- Extract _PHASE_1A_ARTIFACT_NAMES constant, eliminate duplication - Move PHASE_1B_REQUIRED_NOTES from artifact_checks.py to completion.py (single source of truth for 1b artifact list) - Add Phase 1c-specific graceful completion check (sandbox-plan.md or CODECOME-GENERATED.md freshness, no monolith requirements) - Fix runtime messages in phase_1.py and harness.py to not overstate ('expected durable artifacts were written' instead of 'required') - Fix misleading test comment in graceful completion tests - Add 2 control-flow tests proving mid-turn cutoff branching in _run_subphase: forgiveness triggers validation, denial triggers retry - Add 1c graceful test (fresh sandbox-plan passes, none fails)

pruiz · 2026-06-04T19:36:26Z

@coderabbitai All previous review comments have been addressed:

Duplicated Phase 1a artifact names → Extracted _PHASE_1A_ARTIFACT_NAMES constant, no more duplication.
Test comment accuracy → Fixed to "non-1b artifacts (1a + 1c) — excluded from 1b check".
Phase 1b artifact source of truth → Moved PHASE_1B_REQUIRED_NOTES to completion.py, artifact_checks.py imports it.
Runtime message overstatement → Changed to "expected durable artifacts were written during the run. Treating the subphase as complete enough to run validation and auto-repair."
Control-flow tests → Added tests/test_phase_1_mid_turn_forgiveness.py proving mid-turn branching.
Phase 1c graceful check → Added subphase-specific branch checking sandbox-plan.md or CODECOME-GENERATED.md freshness.

618 tests pass.

coderabbitai · 2026-06-04T19:36:59Z

@pruiz Thanks for the thorough follow-through — every item from the review is cleanly addressed:

✅ _PHASE_1A_ARTIFACT_NAMES constant extracted, eliminating duplication.
✅ Test comment corrected to "non-1b artifacts (1a + 1c) — excluded from 1b check".
✅ PHASE_1B_REQUIRED_NOTES centralized in completion.py; artifact_checks.py now imports it.
✅ Runtime message softened to "expected durable artifacts were written during the run. Treating the subphase as complete enough to run validation and auto-repair."
✅ Control-flow tests in test_phase_1_mid_turn_forgiveness.py cover both the forgiveness → validation path and the denial → non-zero path.
✅ Phase 1c graceful check branch added, accepting fresh sandbox-plan.md or CODECOME-GENERATED.md.

618 tests passing — looks good. Approving.

(=^ ･ω･ ^=)∫

✅ Action performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

pruiz requested review from Copilot June 4, 2026 18:57

Copilot started reviewing on behalf of pruiz June 4, 2026 18:57 View session

Copilot AI reviewed Jun 4, 2026

View reviewed changes

Comment thread tools/phases/completion.py

Comment thread tools/phases/completion.py

pruiz commented Jun 4, 2026

View reviewed changes

pruiz merged commit da1eb4a into master Jun 4, 2026
7 checks passed

This was referenced Jun 5, 2026

feat: RenderOutput semantic console abstraction #44

Merged

WIP: Harness resilience — relaxed phase-2 gate, enriched resume diagnostics, timestamped summaries #46

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: subphase-aware graceful completion after mid-turn cutoffs (fixes #39)#40

fix: subphase-aware graceful completion after mid-turn cutoffs (fixes #39)#40
pruiz merged 2 commits into
masterfrom
fix/mid-turn-graceful-forgiveness-subphases

pruiz commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Poem

Uh oh!

github-actions Bot commented Jun 4, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

pruiz left a comment

Uh oh!

pruiz commented Jun 4, 2026

Uh oh!

coderabbitai Bot commented Jun 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pruiz commented Jun 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem (Issue #39)

Fix

Tests

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Poem

Uh oh!

github-actions Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Coverage Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

pruiz left a comment

Choose a reason for hiding this comment

Uh oh!

pruiz commented Jun 4, 2026

Uh oh!

coderabbitai Bot commented Jun 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pruiz commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading

github-actions Bot commented Jun 4, 2026 •

edited

Loading