fix(auto): clear stale stop reason codes#1194
Conversation
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Reviewing commit
6692560for PR #1194
Review record:
98227064-ba83-4b50-b381-ad5e22234d2e
Blocking Findings
No in-scope blocking findings remained after policy filtering.
Non-blocking Suggestions
None.
Design Notes
Unable to complete the code review: every attempt to read /tmp/pr_diff_1194.patch, changed-file lists, or comments failed before execution because the shell sandbox cannot create its namespace in this environment.
Recovery Notes
First recoverable review artifact generated from codex analysis log.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
PR #1194
Branch: fix/auto-clear-stale-stop-reason-code | 2 files, +32/-0 | CI: Bridge TypeScript pass 15s https://github.com/Q00/ouroboros/actions/runs/26325011929/job/77501002797
Scope: architecture-level
HEAD checked: 6692560484d70139fb7ab877ad29214088aa500a
What Improved
AutoPipelineState.transition()now clearslast_error_codewhenever the session leavesBLOCKED, preventing stale canonical stop reasons from leaking onto recovered, failed, or completed results.- Added direct regression coverage for recovery and hard-failure transitions in
tests/unit/auto/test_stop_reason_code.py. - Added broader runtime/CLI/plugin documentation and test surface around new backend and monitoring paths.
Issue #N/A Requirements
| Requirement | Status |
|---|---|
Clear AutoPipelineState.last_error_code when the session transitions out of BLOCKED |
Satisfied |
Add regression coverage for stale stop_reason_code after recovery/failure |
Satisfied |
| Preserve affected consumer contract surfaces while broad runtime/status changes are present | Not satisfied: ouroboros status health diagnostics are truncated and current unit tests fail |
| Full current-HEAD unit suite passes | Not satisfied |
Prior Findings Status
| Prior Finding | Status |
|---|---|
| Prior review context | MODIFIED — No prior review concerns were provided in this review context; none maintained, modified, or withdrawn. |
Blockers
| # | File:Line | Severity | Confidence | Finding |
|---|---|---|---|---|
| 1 | src/ouroboros/cli/commands/status.py:303 | High | 95% | health now embeds all diagnostic detail into the name field, but that value is rendered through create_status_table(), whose Name column is no_wrap=True and gets ellipsized by Rich. As a result, the command hides the exact missing CLI path, credentials.yaml path, warning text, and effective runtime path; the current HEAD unit suite fails five health-contract tests (tests/unit/cli/test_status.py::{test_health_exits_nonzero_when_runtime_cli_missing,test_health_exits_nonzero_when_credentials_file_missing,test_health_warns_but_exits_zero_for_missing_database_and_empty_key,test_health_honors_agent_runtime_and_cli_path_environment_overrides,test_health_honors_effective_backend_configured_cli_when_runtime_env_overrides}). This is a user-facing diagnostics regression and a merge-blocking test failure; keep detail in a non-truncating column or render health rows as plain/key-value output. |
Follow-ups
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| — | — | — | — | None. |
Test Coverage
- Passed:
SETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run pytest tests/unit/auto/test_stop_reason_code.py tests/unit/mcp/tools/test_ac_tree_hud_handler.py tests/unit/providers/test_copilot_cli_adapter.py -q-> 102 passed. - Passed: targeted
ruff checkover the reviewed auto/Copilot/AC HUD files. - Failed:
SETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run pytest tests/unit -q-> 5 failed, 9519 passed, 7 skipped. All failures are intests/unit/cli/test_status.pyand match the health output truncation regression above.
Design / Roadmap Gate
Affected-boundary review covered the advertised auto state-machine/persistence/result boundary and the wider current-HEAD diff surfaces: CLI health diagnostics, runtime backend selection, Copilot adapter/runtime, OpenCode bridge, AC HUD MCP rendering, packaging, and CI. The advertised stop-reason state transition is sound, but the PR cannot pass the design gate because the CLI health boundary is a consumer contract: users rely on exact paths and warning details to repair installation/auth/runtime problems, and current HEAD renders those details through a truncating table column.
Merge Recommendation
Do not merge until the health output contract is fixed and tests/unit/cli/test_status.py plus the full unit suite pass on current HEAD.
Review-Metadata:
verdict: REQUEST_CHANGES
github_event: COMMENT
review_kind: full
merge_eligible: false
head_sha: 6692560
source_read_ok: true
diff_read_ok: true
blocking_count: 0
shaun0927
left a comment
There was a problem hiding this comment.
Cross-checked against #1151 (the merged stop_reason_code envelope) and #1157 (the ooo auto SSOT, which names this PR as the remaining L2 follow-up for stop-reason-code clearing). The two-line guard in state.transition() is the right scope: it clears last_error_code exactly when the session leaves BLOCKED, leaves the BLOCKED→BLOCKED self-edge untouched, and the two added tests pin both the recover() and mark_failed() paths. No regression risk vs. the documented last_error_code semantics.
FYI #1205 introduces a competing hunk at the same location with condition if error is None, which diverges on transition(BLOCKED, msg, error=None) and on non-BLOCKED→non-BLOCKED with an explicit error string — I left a change-request on #1205 to drop its hunk so this PR can own the contract cleanly. Land this before #1205 rebases.
Drops the five hunks called out in PR review against #1157 SSOT / merged design decisions, while keeping every legitimate fix in this PR: - Restore DeferredProbe.passed=True (#1181 contract: probe-PASS placeholder so the L3 verifier flags the gap without failing the grade). - Restore `_inference.single` consumer call so unmatched ledgers still apply the safe LIBRARY fallback (#1177 + #1188 decision). - Drop the `if error is None` last_error_code clearing — sibling PR #1194 owns this hunk with the stricter `next_phase is not BLOCKED` condition. - Restore `env.setdefault(...)` for the three plugin runtime env vars introduced by #1193 so downstream entrypoints can still pre-seed. - Drop the conditional `active_task_class` MCP meta block — sibling PR #1196 owns it with unconditional null surfacing (clients need to tell ambiguous-inference apart from a missing protocol field). Kept fixes: `_matches_web_service` independence, `defaulted_sections` CLI rendering, `status --limit > 0` validation, mechanical-eval evidence linkage, TimeoutExpired stdout/stderr bytes-safe decode, workflow lifecycle same-timestamp restart ordering, MCP meta surfacing for `interview_closure_mode` + `runtime_probe_evidence`, plugin-mode Ralph `product_status=not_verified_complete` downgrade. Verification: - uv run pytest tests/unit/auto/test_pipeline_task_class_envelope.py tests/unit/auto/test_domain_inference.py tests/unit/auto/test_surface.py tests/unit/orchestrator/test_runtime_evidence.py tests/unit/cli/test_status_run.py tests/unit/cli/test_auto_command.py tests/unit/plugin/test_firewall.py tests/unit/orchestrator/test_workflow_lifecycle_events.py tests/integration/test_mechanical_eval_projection.py tests/unit/auto/test_stop_reason_code.py -q → 243 passed - uv run ruff check src tests → passed - uv run ruff format --check src tests → passed
Summary
AutoPipelineState.last_error_codewhenever the session transitions out ofBLOCKED.AutoPipelineResult.stop_reason_codecannot report a stale interview blocker.Review context
This follows up the post-v0.39.1
stop_reason_codesurface from PR #1151. The public result envelope copiesstate.last_error_code; without clearing it on non-blocked transitions, a resumed session could recover or fail for a different reason while still exposing an old canonical interview blocker code.Validation
SETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run pytest tests/unit/auto/test_stop_reason_code.py tests/unit/auto/test_pipeline_watchdog_integration.py tests/unit/auto/test_surface.py -q-> 96 passedSETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run pytest tests/unit/auto tests/unit/mcp/tools/test_generate_seed_force_flag.py tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_runtime_evidence.py tests/unit/plugin tests/unit/runtime tests/canonical -q-> 1677 passed, 2 skippedSETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run ruff check src/ouroboros/auto/state.py tests/unit/auto/test_stop_reason_code.py-> passedSETUPTOOLS_SCM_PRETEND_VERSION=0.39.2 uv run ruff format --check src/ouroboros/auto/state.py tests/unit/auto/test_stop_reason_code.py-> passedgit diff --check-> passed