Skip to content

fix: subphase-aware graceful completion after mid-turn cutoffs (fixes #39)#40

Merged
pruiz merged 2 commits into
masterfrom
fix/mid-turn-graceful-forgiveness-subphases
Jun 4, 2026
Merged

fix: subphase-aware graceful completion after mid-turn cutoffs (fixes #39)#40
pruiz merged 2 commits into
masterfrom
fix/mid-turn-graceful-forgiveness-subphases

Conversation

@pruiz
Copy link
Copy Markdown
Owner

@pruiz pruiz commented Jun 4, 2026

Problem (Issue #39)

When Phase 1b produces all artifacts but the model cuts off mid-turn, the harness never runs artifact validation or auto-repair. The graceful-forgiveness mechanism in phase_1.py:675-695 checks check_phase_graceful_completion() — but that function required ALL Phase 1 artifacts to exist, including sandbox-plan.md (a Phase 1c artifact). Since sandbox-plan.md hadn't been created yet, forgiveness was denied, returncode stayed 2, and the validation blocks were skipped.

Fix

check_phase_graceful_completion() in tools/phases/completion.py now checks only subphase-specific artifacts for phases 1a and 1b:

  • Phase 1a: checks target-profile.md, build-model.md, codeql-plan.yml
  • Phase 1b: checks all 1b notes, excluding Phase 1a (target-profile.md, build-model.md, codeql-plan.yml) and Phase 1c (sandbox-plan.md) artifacts
  • Phase 1c and bare "1": unchanged (existing monolith logic)

This means when Phase 1b produces artifacts then cuts off, the graceful forgiveness triggers, returncode stays 0, the validation blocks run, and the auto-repair loop can fix formatting issues (e.g., H2→H1 headings in threat-model.md) by resuming the session with a targeted repair prompt.

Tests

6 new tests covering:

  • 1a graceful completion with only 1a artifacts
  • 1a returns False with no fresh artifacts
  • 1b graceful completion with only 1b artifacts (no sandbox-plan.md)
  • 1b returns False with no fresh 1b artifacts
  • sandbox-plan.md is excluded from 1b check
  • 1c still requires sandbox state

Verification

  • 615 tests pass

Summary by CodeRabbit

  • Tests

    • Added tests covering Phase 1 subphase graceful-completion (1a/1b/1c), mid-turn forgiveness retry behavior, and updated artifact-check tests to run in isolated temp filesystems.
  • New Features

    • Phase 1 now evaluates graceful completion per subphase with specific artifact-freshness checks.
  • User-facing Messaging

    • Clarified messaging when a mid-run cutoff is forgiven to indicate durable artifacts were written.

check_phase_graceful_completion now checks only subphase-specific
artifacts for phases 1a and 1b, instead of requiring the full Phase 1
artifact set (which includes sandbox-plan.md from Phase 1c).

Previously, when Phase 1b produced all its artifacts but the model cut
off mid-turn, the graceful-forgiveness path in phase_1.py would call
check_phase_graceful_completion('1b'), which required all Phase 1
artifacts including sandbox-plan.md (a 1c artifact). Since that file
didn't exist yet, forgiveness was denied, returncode stayed 2, and the
auto-repair/validation blocks never ran. The user would see 'Phase 1b
did not complete cleanly (exit code 2)' even though all artifacts were
correctly written to disk.

Now Phase 1a only checks 1a artifacts, Phase 1b only checks 1b
artifacts, and Phase 1c/bare '1' keep the existing monolith logic.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b3626fc9-ccaf-43c4-86d0-b2eb7822ed9c

📥 Commits

Reviewing files that changed from the base of the PR and between 34a8b0d and 79cd183.

📒 Files selected for processing (6)
  • tests/test_phase_1_mid_turn_forgiveness.py
  • tests/test_phase_graceful_completion_subphases.py
  • tools/codecome/harness.py
  • tools/codecome/phase_1.py
  • tools/phases/artifact_checks.py
  • tools/phases/completion.py
✅ Files skipped from review due to trivial changes (2)
  • tools/codecome/phase_1.py
  • tools/codecome/harness.py

📝 Walkthrough

Walkthrough

Adds subphase-specific graceful-completion checks for Phase 1 (1a, 1b, 1c) that evaluate artifact freshness per-subphase, moves Phase 1b required-note list usage into the completion module, updates two forgiveness text messages, and adds tests for subphase checks, mid-turn forgiveness, and CLI artifact checks using isolated tmp_path directories.

Changes

Phase 1 Subphase Graceful Completion

Layer / File(s) Summary
Subphase-specific graceful completion implementation
tools/phases/completion.py, tools/phases/artifact_checks.py
Adds PHASE_1A and PHASE_1B artifact-name sets and early-return freshness checks for `original_phase == "1a"
User-facing forgiveness message tweaks
tools/codecome/harness.py, tools/codecome/phase_1.py
Adjusts informational strings emitted when a mid-turn graceful forgiveness path is taken; no control flow change.
Comprehensive subphase graceful completion tests
tests/test_phase_graceful_completion_subphases.py
New tests validate Phase 1a/1b/1c freshness behavior using temporary itemdb/notes, controlling mtimes and sandbox presence to exercise fresh/stale edge cases.
Mid-turn cutoff forgiveness tests
tests/test_phase_1_mid_turn_forgiveness.py
New tests exercise _run_subphase mid-turn cutoff retry behavior for 1b under granted and denied forgiveness scenarios using patched internals and fake runtime config.
Existing CLI tests updated for isolated filesystem execution
tests/test_phase_artifacts_cli.py
Refactors three artifact-check CLI tests to use tmp_path, create itemdb/notes and runs under it, and patch phases.artifact_checks.ROOT so checks run in an isolated temp checkout.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

  • pruiz/CodeCome#34: Related — both modify Phase 1 artifact/validation handling and the source of Phase 1b required-note checks.

Poem

🐰 I hopped through files with careful paws,
Checking notes and sandbox laws—
1a, 1b, 1c in line,
Fresh artifacts mark the sign.
Tests penned tidy, messages neat, hooray for graceful cause!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: implementing subphase-aware graceful completion logic after mid-turn cutoffs, which directly addresses the root cause described in the PR objectives.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/mid-turn-graceful-forgiveness-subphases

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 4, 2026

Coverage Report

Metric Value
Line Coverage 73.8%
Lines Covered 0 / 0

Download detailed HTML coverage reports per OS/Python from the workflow artifacts.

Generated by pytest-cov on 2026-06-04T19:32:34.338Z

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tools/phases/completion.py (1)

125-135: ⚡ Quick win

Extract Phase 1a artifact names to a module-level constant.

The Phase 1a artifact list ("target-profile.md", "build-model.md", "codeql-plan.yml") appears twice: once in the Phase 1a check (line 127) and again in _SKIP_1B (line 132). This duplication means both locations must be updated if the Phase 1a artifact set changes.

♻️ Proposed refactor to eliminate duplication
 _PHASE1_REQUIRED_ARTIFACT_NAMES = [
     "target-profile.md",
     "attack-surface.md",
     "build-model.md",
     "execution-model.md",
     "trust-boundaries.md",
     "data-flow.md",
     "validation-model.md",
     "interesting-files.md",
     "file-risk-index.yml",
     "security-assumptions.md",
     "threat-model.md",
     "sandbox-plan.md",
 ]

+_PHASE_1A_ARTIFACT_NAMES = ("target-profile.md", "build-model.md", "codeql-plan.yml")
+

 
 def _phase1_required_artifacts() -> list[Path]:
     return [NOTES_ROOT / name for name in _PHASE1_REQUIRED_ARTIFACT_NAMES]

Then update both usages:

             if original_phase == "1a":
                 notes_dir = ROOT / "itemdb" / "notes"
-                paths_1a = [notes_dir / n for n in ("target-profile.md", "build-model.md", "codeql-plan.yml")]
+                paths_1a = [notes_dir / n for n in _PHASE_1A_ARTIFACT_NAMES]
                 return any(_path_is_fresh(p, run_start_time) for p in paths_1a)

             if original_phase == "1b":
                 notes_dir = ROOT / "itemdb" / "notes"
-                _SKIP_1B = frozenset({"target-profile.md", "build-model.md", "codeql-plan.yml", "sandbox-plan.md"})
+                _SKIP_1B = frozenset(set(_PHASE_1A_ARTIFACT_NAMES) | {"sandbox-plan.md"})
                 names_1b = [n for n in _PHASE1_REQUIRED_ARTIFACT_NAMES if n not in _SKIP_1B]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/phases/completion.py` around lines 125 - 135, The duplicated Phase 1a
artifact names should be pulled into a module-level constant (e.g.,
PHASE1A_ARTIFACT_NAMES) and both usages updated to reference it: replace the
inline tuple in the original_phase == "1a" branch (which builds paths_1a) with
PHASE1A_ARTIFACT_NAMES, and replace the literal set in _SKIP_1B with
frozenset(PHASE1A_ARTIFACT_NAMES) so _SKIP_1B and names_1b derive from the
single constant; keep the rest of the logic (calls to _path_is_fresh, names_1b
construction, and _PHASE1_REQUIRED_ARTIFACT_NAMES) unchanged.
tests/test_phase_graceful_completion_subphases.py (1)

82-82: 💤 Low value

Clarify the comment to accurately describe what artifacts are created.

The comment states "Create only Phase 1a artifacts" but the code creates both target-profile.md (a Phase 1a artifact) and sandbox-plan.md (a Phase 1c artifact). While the test logic is correct (both are excluded from the Phase 1b check), the comment could be more precise.

📝 Proposed comment clarification
-    # Create only Phase 1a artifacts — which are excluded from 1b check
+    # Create only non-1b artifacts (1a + 1c) — which are excluded from 1b check
     now = time.time()
     (notes / "target-profile.md").write_text("", encoding="utf-8")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_phase_graceful_completion_subphases.py` at line 82, Update the
misleading comment on the test that currently reads "Create only Phase 1a
artifacts" to reflect that the test creates both a Phase 1a artifact and a Phase
1c artifact; mention the specific files created (`target-profile.md` for Phase
1a and `sandbox-plan.md` for Phase 1c) so it’s clear both are intentionally
excluded from the Phase 1b check in this test (look for the comment near the
artifact-creation code in tests/test_phase_graceful_completion_subphases.py).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_phase_graceful_completion_subphases.py`:
- Line 82: Update the misleading comment on the test that currently reads
"Create only Phase 1a artifacts" to reflect that the test creates both a Phase
1a artifact and a Phase 1c artifact; mention the specific files created
(`target-profile.md` for Phase 1a and `sandbox-plan.md` for Phase 1c) so it’s
clear both are intentionally excluded from the Phase 1b check in this test (look
for the comment near the artifact-creation code in
tests/test_phase_graceful_completion_subphases.py).

In `@tools/phases/completion.py`:
- Around line 125-135: The duplicated Phase 1a artifact names should be pulled
into a module-level constant (e.g., PHASE1A_ARTIFACT_NAMES) and both usages
updated to reference it: replace the inline tuple in the original_phase == "1a"
branch (which builds paths_1a) with PHASE1A_ARTIFACT_NAMES, and replace the
literal set in _SKIP_1B with frozenset(PHASE1A_ARTIFACT_NAMES) so _SKIP_1B and
names_1b derive from the single constant; keep the rest of the logic (calls to
_path_is_fresh, names_1b construction, and _PHASE1_REQUIRED_ARTIFACT_NAMES)
unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4d096973-4a41-4c5f-bf1e-868d1c4a3b6d

📥 Commits

Reviewing files that changed from the base of the PR and between 10a9b54 and 34a8b0d.

📒 Files selected for processing (3)
  • tests/test_phase_artifacts_cli.py
  • tests/test_phase_graceful_completion_subphases.py
  • tools/phases/completion.py

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts Phase 1 “graceful forgiveness” behavior so that mid-turn model cutoffs during subphases (especially 1b) can still proceed into the existing validation/auto-repair flow by checking subphase-specific artifacts rather than requiring the full Phase 1 monolith artifact set.

Changes:

  • Added Phase 1a/1b subphase-specific logic to check_phase_graceful_completion() so 1b no longer depends on Phase 1c artifacts like sandbox-plan.md.
  • Added a new test suite covering 1a/1b/1c graceful-completion behavior.
  • Updated CLI artifact-check tests to run against an isolated filesystem by patching ROOT to tmp_path.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
tools/phases/completion.py Implements subphase-aware graceful completion checks for Phase 1a/1b while keeping 1c/bare “1” on the existing monolith logic.
tests/test_phase_graceful_completion_subphases.py Adds targeted tests for the new 1a/1b behavior and confirms 1c continues to use monolith requirements.
tests/test_phase_artifacts_cli.py Improves test isolation by creating minimal directory structure in tmp_path and patching phases.artifact_checks.ROOT.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tools/phases/completion.py
Comment thread tools/phases/completion.py
Copy link
Copy Markdown
Owner Author

@pruiz pruiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the PR in read-only mode. The direction looks good and I think the core 1b fix is on the right track, but I would address a few things before merging:

  1. Reuse the Phase 1b artifact source of truth

    completion.py currently derives the 1b artifact set by filtering _PHASE1_REQUIRED_ARTIFACT_NAMES and excluding 1a + sandbox-plan.md. However artifact_checks.py already has PHASE_1B_REQUIRED_NOTES, which looks like the canonical 1b artifact list used by the validator.

    I would prefer importing/reusing that list for the graceful-completion check, so the forgiveness logic and artifact validation cannot drift apart later.

  2. Extract the Phase 1a artifact names into a constant

    CodeRabbit's nit is valid: ("target-profile.md", "build-model.md", "codeql-plan.yml") is duplicated between the 1a check and the 1b skip list. A small module-level constant such as _PHASE_1A_ARTIFACT_NAMES would make this safer.

  3. The runtime message slightly overstates the condition

    The new graceful check uses any(_path_is_fresh(...)), which I think is correct for the intended behavior: if some expected artifact was written, let validation and auto-repair run.

    But the message says "the required durable artifacts were already written", which sounds like all required artifacts exist. I would tweak it to something like:

    but expected durable artifacts were written during the run. Treating the subphase as complete enough to run validation and auto-repair.
    
  4. Add coverage for the actual phase_1.py control flow

    The new tests cover check_phase_graceful_completion() directly, which is useful. But issue #39 is specifically about the mid-turn cutoff path entering validation/auto-repair instead of stopping or retrying.

    I would add at least one test around _run_subphase() / the surrounding control flow to prove:

    • mid-turn cutoff + produced artifacts => graceful forgiveness => validation/repair path is reached
    • mid-turn cutoff + no produced artifacts => existing iteration retry behavior remains unchanged

    Otherwise the helper is tested, but not the behavior that closes the acceptance criteria.

  5. Clarify or complete the Phase 1c behavior

    The PR intentionally leaves 1c on the monolithic logic, but issue #39 says Phase 1a and 1c should also benefit from the same fix.

    Current 1c still requires the full Phase 1 artifact set, sandbox state, and a fresh run summary. If a cutoff happens before the summary exists, 1c may still fail forgiveness even if sandbox-plan.md or sandbox/CODECOME-GENERATED.md was produced.

    I would either implement a 1c-specific graceful check, or explicitly document why 1c does not need it despite the issue acceptance criteria.

  6. Minor test comment cleanup

    In test_phase_1b_graceful_completion_fails_if_no_1b_artifacts_fresh, the comment says it creates only Phase 1a artifacts, but the test also creates sandbox-plan.md from 1c. The logic is fine; the comment should say it creates non-1b artifacts.

Overall: the PR is small and points in the right direction, but I would tighten the artifact list sources and add one control-flow test before merging.

- Extract _PHASE_1A_ARTIFACT_NAMES constant, eliminate duplication
- Move PHASE_1B_REQUIRED_NOTES from artifact_checks.py to completion.py
  (single source of truth for 1b artifact list)
- Add Phase 1c-specific graceful completion check (sandbox-plan.md or
  CODECOME-GENERATED.md freshness, no monolith requirements)
- Fix runtime messages in phase_1.py and harness.py to not overstate
  ('expected durable artifacts were written' instead of 'required')
- Fix misleading test comment in graceful completion tests
- Add 2 control-flow tests proving mid-turn cutoff branching in
  _run_subphase: forgiveness triggers validation, denial triggers retry
- Add 1c graceful test (fresh sandbox-plan passes, none fails)
@pruiz
Copy link
Copy Markdown
Owner Author

pruiz commented Jun 4, 2026

@coderabbitai All previous review comments have been addressed:

  1. Duplicated Phase 1a artifact names → Extracted _PHASE_1A_ARTIFACT_NAMES constant, no more duplication.
  2. Test comment accuracy → Fixed to "non-1b artifacts (1a + 1c) — excluded from 1b check".
  3. Phase 1b artifact source of truth → Moved PHASE_1B_REQUIRED_NOTES to completion.py, artifact_checks.py imports it.
  4. Runtime message overstatement → Changed to "expected durable artifacts were written during the run. Treating the subphase as complete enough to run validation and auto-repair."
  5. Control-flow tests → Added tests/test_phase_1_mid_turn_forgiveness.py proving mid-turn branching.
  6. Phase 1c graceful check → Added subphase-specific branch checking sandbox-plan.md or CODECOME-GENERATED.md freshness.

618 tests pass.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

@pruiz Thanks for the thorough follow-through — every item from the review is cleanly addressed:

  1. _PHASE_1A_ARTIFACT_NAMES constant extracted, eliminating duplication.
  2. ✅ Test comment corrected to "non-1b artifacts (1a + 1c) — excluded from 1b check".
  3. PHASE_1B_REQUIRED_NOTES centralized in completion.py; artifact_checks.py now imports it.
  4. ✅ Runtime message softened to "expected durable artifacts were written during the run. Treating the subphase as complete enough to run validation and auto-repair."
  5. ✅ Control-flow tests in test_phase_1_mid_turn_forgiveness.py cover both the forgiveness → validation path and the denial → non-zero path.
  6. ✅ Phase 1c graceful check branch added, accepting fresh sandbox-plan.md or CODECOME-GENERATED.md.

618 tests passing — looks good. Approving.

(=^ ・ω・ ^=)∫

✅ Action performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.

@pruiz pruiz merged commit da1eb4a into master Jun 4, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants