Skip to content

fix(auto): clear stale stop reason codes#1194

Merged
shaun0927 merged 1 commit into
mainfrom
fix/auto-clear-stale-stop-reason-code
May 25, 2026
Merged

fix(auto): clear stale stop reason codes#1194
shaun0927 merged 1 commit into
mainfrom
fix/auto-clear-stale-stop-reason-code

Conversation

@Q00
Copy link
Copy Markdown
Owner

@Q00 Q00 commented May 23, 2026

Summary

  • Clear AutoPipelineState.last_error_code whenever the session transitions out of BLOCKED.
  • Add regression coverage for recovery and later hard-failure transitions so AutoPipelineResult.stop_reason_code cannot report a stale interview blocker.

Review context

This follows up the post-v0.39.1 stop_reason_code surface from PR #1151. The public result envelope copies state.last_error_code; without clearing it on non-blocked transitions, a resumed session could recover or fail for a different reason while still exposing an old canonical interview blocker code.

Validation

  • SETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run pytest tests/unit/auto/test_stop_reason_code.py tests/unit/auto/test_pipeline_watchdog_integration.py tests/unit/auto/test_surface.py -q -> 96 passed
  • SETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run pytest tests/unit/auto tests/unit/mcp/tools/test_generate_seed_force_flag.py tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_runtime_evidence.py tests/unit/plugin tests/unit/runtime tests/canonical -q -> 1677 passed, 2 skipped
  • SETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run ruff check src/ouroboros/auto/state.py tests/unit/auto/test_stop_reason_code.py -> passed
  • SETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run ruff format --check src/ouroboros/auto/state.py tests/unit/auto/test_stop_reason_code.py -> passed
  • git diff --check -> passed

Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: APPROVE

Reviewing commit 6692560 for PR #1194

Review record: 98227064-ba83-4b50-b381-ad5e22234d2e

Blocking Findings

No in-scope blocking findings remained after policy filtering.

Non-blocking Suggestions

None.

Design Notes

Unable to complete the code review: every attempt to read /tmp/pr_diff_1194.patch, changed-file lists, or comments failed before execution because the shell sandbox cannot create its namespace in this environment.

Recovery Notes

First recoverable review artifact generated from codex analysis log.


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

PR #1194
Branch: fix/auto-clear-stale-stop-reason-code | 2 files, +32/-0 | CI: Bridge TypeScript pass 15s https://github.com/Q00/ouroboros/actions/runs/26325011929/job/77501002797
Scope: architecture-level
HEAD checked: 6692560484d70139fb7ab877ad29214088aa500a

What Improved

  • AutoPipelineState.transition() now clears last_error_code whenever the session leaves BLOCKED, preventing stale canonical stop reasons from leaking onto recovered, failed, or completed results.
  • Added direct regression coverage for recovery and hard-failure transitions in tests/unit/auto/test_stop_reason_code.py.
  • Added broader runtime/CLI/plugin documentation and test surface around new backend and monitoring paths.

Issue #N/A Requirements

Requirement Status
Clear AutoPipelineState.last_error_code when the session transitions out of BLOCKED Satisfied
Add regression coverage for stale stop_reason_code after recovery/failure Satisfied
Preserve affected consumer contract surfaces while broad runtime/status changes are present Not satisfied: ouroboros status health diagnostics are truncated and current unit tests fail
Full current-HEAD unit suite passes Not satisfied

Prior Findings Status

Prior Finding Status
Prior review context MODIFIED — No prior review concerns were provided in this review context; none maintained, modified, or withdrawn.

Blockers

# File:Line Severity Confidence Finding
1 src/ouroboros/cli/commands/status.py:303 High 95% health now embeds all diagnostic detail into the name field, but that value is rendered through create_status_table(), whose Name column is no_wrap=True and gets ellipsized by Rich. As a result, the command hides the exact missing CLI path, credentials.yaml path, warning text, and effective runtime path; the current HEAD unit suite fails five health-contract tests (tests/unit/cli/test_status.py::{test_health_exits_nonzero_when_runtime_cli_missing,test_health_exits_nonzero_when_credentials_file_missing,test_health_warns_but_exits_zero_for_missing_database_and_empty_key,test_health_honors_agent_runtime_and_cli_path_environment_overrides,test_health_honors_effective_backend_configured_cli_when_runtime_env_overrides}). This is a user-facing diagnostics regression and a merge-blocking test failure; keep detail in a non-truncating column or render health rows as plain/key-value output.

Follow-ups

# File:Line Priority Confidence Suggestion
None.

Test Coverage

  • Passed: SETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run pytest tests/unit/auto/test_stop_reason_code.py tests/unit/mcp/tools/test_ac_tree_hud_handler.py tests/unit/providers/test_copilot_cli_adapter.py -q -> 102 passed.
  • Passed: targeted ruff check over the reviewed auto/Copilot/AC HUD files.
  • Failed: SETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run pytest tests/unit -q -> 5 failed, 9519 passed, 7 skipped. All failures are in tests/unit/cli/test_status.py and match the health output truncation regression above.

Design / Roadmap Gate

Affected-boundary review covered the advertised auto state-machine/persistence/result boundary and the wider current-HEAD diff surfaces: CLI health diagnostics, runtime backend selection, Copilot adapter/runtime, OpenCode bridge, AC HUD MCP rendering, packaging, and CI. The advertised stop-reason state transition is sound, but the PR cannot pass the design gate because the CLI health boundary is a consumer contract: users rely on exact paths and warning details to repair installation/auth/runtime problems, and current HEAD renders those details through a truncating table column.

Merge Recommendation

Do not merge until the health output contract is fixed and tests/unit/cli/test_status.py plus the full unit suite pass on current HEAD.

Review-Metadata:
verdict: REQUEST_CHANGES
github_event: COMMENT
review_kind: full
merge_eligible: false
head_sha: 6692560
source_read_ok: true
diff_read_ok: true
blocking_count: 0

Copy link
Copy Markdown
Collaborator

@shaun0927 shaun0927 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cross-checked against #1151 (the merged stop_reason_code envelope) and #1157 (the ooo auto SSOT, which names this PR as the remaining L2 follow-up for stop-reason-code clearing). The two-line guard in state.transition() is the right scope: it clears last_error_code exactly when the session leaves BLOCKED, leaves the BLOCKED→BLOCKED self-edge untouched, and the two added tests pin both the recover() and mark_failed() paths. No regression risk vs. the documented last_error_code semantics.

FYI #1205 introduces a competing hunk at the same location with condition if error is None, which diverges on transition(BLOCKED, msg, error=None) and on non-BLOCKED→non-BLOCKED with an explicit error string — I left a change-request on #1205 to drop its hunk so this PR can own the contract cleanly. Land this before #1205 rebases.

shaun0927 added a commit that referenced this pull request May 25, 2026
Drops the five hunks called out in PR review against #1157 SSOT / merged
design decisions, while keeping every legitimate fix in this PR:

- Restore DeferredProbe.passed=True (#1181 contract: probe-PASS placeholder
  so the L3 verifier flags the gap without failing the grade).
- Restore `_inference.single` consumer call so unmatched ledgers still
  apply the safe LIBRARY fallback (#1177 + #1188 decision).
- Drop the `if error is None` last_error_code clearing — sibling PR #1194
  owns this hunk with the stricter `next_phase is not BLOCKED` condition.
- Restore `env.setdefault(...)` for the three plugin runtime env vars
  introduced by #1193 so downstream entrypoints can still pre-seed.
- Drop the conditional `active_task_class` MCP meta block — sibling PR
  #1196 owns it with unconditional null surfacing (clients need to tell
  ambiguous-inference apart from a missing protocol field).

Kept fixes: `_matches_web_service` independence, `defaulted_sections` CLI
rendering, `status --limit > 0` validation, mechanical-eval evidence
linkage, TimeoutExpired stdout/stderr bytes-safe decode, workflow
lifecycle same-timestamp restart ordering, MCP meta surfacing for
`interview_closure_mode` + `runtime_probe_evidence`, plugin-mode Ralph
`product_status=not_verified_complete` downgrade.

Verification:
- uv run pytest tests/unit/auto/test_pipeline_task_class_envelope.py
  tests/unit/auto/test_domain_inference.py tests/unit/auto/test_surface.py
  tests/unit/orchestrator/test_runtime_evidence.py
  tests/unit/cli/test_status_run.py tests/unit/cli/test_auto_command.py
  tests/unit/plugin/test_firewall.py
  tests/unit/orchestrator/test_workflow_lifecycle_events.py
  tests/integration/test_mechanical_eval_projection.py
  tests/unit/auto/test_stop_reason_code.py -q  →  243 passed
- uv run ruff check src tests  →  passed
- uv run ruff format --check src tests  →  passed
@shaun0927 shaun0927 merged commit f467311 into main May 25, 2026
8 checks passed
@shaun0927 shaun0927 deleted the fix/auto-clear-stale-stop-reason-code branch May 25, 2026 07:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants