fix(agent-harness): give raw security-policy blocks a workaround + relay directive (#4094)#4478
Conversation
…lay directive (tinyhumansai#4094) The pluggable-ToolPolicy and channel-permission denials were given a structured "Blocked / Reason / Workaround / relay" message by tinyhumansai#4443, but the ~20 `[policy-blocked]` denials emitted deep in `SecurityPolicy` and the tools themselves (read-only autonomy, command classification, path checks, per-tool guards) still returned a bare marker line with no workaround and no relay instruction — so the agent dead-ends on them with a diagnosis and no path forward. Enrich those raw denials at a single seam: - Add `PolicyDenial::SecurityPolicyBlocked { tool, raw_reason }` and a `maybe_enrich_policy_block` helper that wraps a marker-bearing result into the same structured shape (keeps the reason, appends a concrete workaround — raise the agent-access tier / use a permitted alternative — and the relay-to-the-user directive). The `[policy-blocked]` marker is preserved so failure classification and the repeated-failure breaker still match. - Call it at the top of `ToolOutcomeCaptureMiddleware::after_tool` (the last after_tool hook, so the enriched text is what the transcript keeps). Already-structured ToolPolicyMiddleware denials (which carry a `Workaround:` suffix) are left untouched — no double-wrapping. Tests: the new variant keeps the marker + adds workaround/relay; the helper enriches raw blocks, skips already-structured denials and plain errors; a middleware test drives a raw read-only-mode block through `after_tool` and asserts the enrichment. Claude-Session: https://claude.ai/code/session_01KcmdqJVpjmnH31HqTHRLwG
Unrelated CI unblock: tinyhumansai#4446 (Privacy Mode) registered `config_get_privacy_mode` / `config_set_privacy_mode` unconditionally but did not update the `worker_a_controller_schemas_are_fully_exposed` golden, so that test fails on main for every PR without the fix. Add the two methods at their sorted positions (same fix as tinyhumansai#4475) so this PR's Rust E2E / coverage jobs go green. Claude-Session: https://claude.ai/code/session_01KcmdqJVpjmnH31HqTHRLwG
|
Warning Review limit reachedYou’ve reached a temporary PR review limit under our Fair Usage Limits Policy. Next review available in: 40 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6574b034be
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if let Some(enriched) = | ||
| super::policy_denial::maybe_enrich_policy_block(&result.name, &result.content) | ||
| { |
There was a problem hiding this comment.
Gate policy-block enrichment to failed tool results
When a successful tool returns content containing the literal [policy-blocked] marker, this unconditional enrichment rewrites the success payload into a fake security-policy denial before success is computed. This is reachable with file_read, which returns file contents verbatim on success, so reading source/docs that mention the marker corrupts the transcript while the call is still recorded as successful; only actual failed, marker-prefixed policy errors should be enriched.
Useful? React with 👍 / 👎.
…rge) Two tests came in red from upstream/main via the merge (upstream main is itself red on these — its own PR CI fails on Rust Core Coverage); our changes don't touch the code they exercise: - tinyagents adapter_inventory_registers_model_tools_and_middleware: recent upstream middleware-stack churn produces 13 lifecycle middleware, but the literal still said 14 (its own enumeration lists 13). Match the code. - monitor_agent_e2e orchestrator_gets_denial_when_monitor_command_violates_policy: tinyhumansai#4478 reworded the security-policy denial to "[policy-blocked] Tool '<tool>' was blocked by the security policy"; the test still asserted the old contiguous "[policy-blocked] Security policy". Update the golden substring to the new wording. (ops_install::non_2xx_install_fetch_returns_err_for_4xx_and_5xx also flaked here — a pre-existing env-var-race under llvm-cov, untouched by this branch.) Claude-Session: https://claude.ai/code/session_014RLnG2QbdL3n9TLtfomdhB
Summary
[policy-blocked]denials emitted deep inSecurityPolicy/ the tools a structured Blocked / Reason / Workaround / relay message, so the agent gets a concrete way forward instead of dead-ending.PolicyDenial::SecurityPolicyBlocked { tool, raw_reason }+ amaybe_enrich_policy_blockhelper, wired intoToolOutcomeCaptureMiddleware::after_tool(the last after_tool hook). The[policy-blocked]marker is preserved; already-structuredToolPolicyMiddlewaredenials are not double-wrapped.config_*_privacy_modemethods to a stale schema-catalog golden — see## Related.Problem
Issue #4094: when a tool policy / permission boundary blocks an action, the agent often dead-ends — no clear message, no workaround — leaving the user stuck.
PR #4443 fixed this for the pluggable-
ToolPolicyand channel-permission denials (rendered viaPolicyDenialinToolPolicyMiddleware). But the security-policy / autonomy layer was untouched: every[policy-blocked]denial produced insideSecurityPolicyand the tools themselves — read-only autonomy (enforcement.rs), command classification (command_checks.rs), path checks (path_checks.rs), and ~14 per-tool guards (filesystem write/edit/apply_patch, git, schedule, install_tool, browser, screenshot, mouse, …) — still returns a bare marker line like"[policy-blocked] Security policy: read-only mode — only read commands are allowed". The model gets no reason expansion, no workaround, and no instruction to relay it to the user, so it halts.Solution
Enrich those raw denials at a single seam rather than the ~20 emission sites:
PolicyDenial::SecurityPolicyBlocked { tool, raw_reason }renders the standardBlocked: … Reason: … Workaround: … <relay>shape. It strips the marker from the reason and re-adds it on theBlocked:line, so the[policy-blocked]marker survives for downstreamcontains(POLICY_BLOCKED_MARKER)checks (failure classification, the repeated-failure loop-breaker). The workaround points at the levers that actually unblock the family — raise the agent-access tier / autonomy (Settings → Agent access/config.update_autonomy_settings/[autonomy]), or use a permitted alternative.maybe_enrich_policy_block(tool, content)returns the enriched string only when the content carries the marker and lacks aWorkaround:suffix — so already-structuredToolPolicyMiddlewaredenials (from fix(agent-harness): structured policy-block messages — rework of #4390 onto post-#4399 #4443) and plain errors are left untouched (no double-wrapping).ToolOutcomeCaptureMiddleware::after_tool, which already inspects the marker for classification and runs last in the after_tool chain, so the enriched text is what the transcript keeps.Submission Checklist
security_policy_block_keeps_marker_and_adds_workaround_and_relay,maybe_enrich_only_touches_raw_marker_results(raw → enriched, structured → skipped, plain error → skipped), and a middleware testraw_security_policy_block_is_enriched_with_workaround_and_relay+already_structured_denial_is_not_double_wrapped.N/A: behaviour-only change(extends existing policy-denial messaging).## Related— N/A (no new feature row).N/A: internal harness messaging.Closes #NNN.Impact
Related
test(ci):commit here addsconfig_get_privacy_mode/config_set_privacy_modeto theworker_a_controller_schemas_are_fully_exposedgolden — an unrelated pre-existing break (feat(privacy): Privacy Mode + local-only inference enforcement (#4435) #4446 registered those RPC methods without updating the golden; also fixed in feat(orchestration): nested contacts → sessions with create + session send #4475). Drop that commit once main carries the fix.AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
fix/GH-4094-security-policy-block-messagesValidation Run
pnpm --filter openhuman-app format:check— N/A (no frontend changes)pnpm typecheck— N/A (no TS changes)openhuman::tinyagents::policy_denial::tests+ the twoToolOutcomeCaptureMiddlewareenrichment tests — execution deferred to CI (shared local build host saturated)rustfmt --edition 2021 --checkclean;cargo check --bin openhuman-corepassed (Finished dev)Validation Blocked
command:cargo test --lib -- openhuman::tinyagents::policy_denial openhuman::tinyagents::middleware::testserror:none — shared build host saturated by a concurrent worktree's build; test-binary compile did not finish in-windowimpact:cargo checkcompiled clean; the tests are deterministic assertions over the added code and are validated by CIBehavior Changes
[policy-blocked]tool results now carry a workaround + a relay-to-the-user directive.Parity Contract
[policy-blocked]marker is retained, soToolFailureClass::BlockedByPolicyclassification and theRepeatedToolFailureMiddlewarehard-reject path match exactly as before.ToolPolicyMiddleware(fix(agent-harness): structured policy-block messages — rework of #4390 onto post-#4399 #4443) output is detected via itsWorkaround:suffix and left untouched.Workaround:suffix exists; success results and plain errors are never rewritten.Duplicate / Superseded PR Handling