Skip to content

fix(agent-harness): give raw security-policy blocks a workaround + relay directive (#4094)#4478

Merged
senamakel merged 2 commits into
tinyhumansai:mainfrom
senamakel:fix/GH-4094-security-policy-block-messages
Jul 4, 2026
Merged

fix(agent-harness): give raw security-policy blocks a workaround + relay directive (#4094)#4478
senamakel merged 2 commits into
tinyhumansai:mainfrom
senamakel:fix/GH-4094-security-policy-block-messages

Conversation

@senamakel

Copy link
Copy Markdown
Member

Summary

  • Give the ~20 raw [policy-blocked] denials emitted deep in SecurityPolicy / the tools a structured Blocked / Reason / Workaround / relay message, so the agent gets a concrete way forward instead of dead-ending.
  • Add PolicyDenial::SecurityPolicyBlocked { tool, raw_reason } + a maybe_enrich_policy_block helper, wired into ToolOutcomeCaptureMiddleware::after_tool (the last after_tool hook). The [policy-blocked] marker is preserved; already-structured ToolPolicyMiddleware denials are not double-wrapped.
  • (CI) One unrelated commit adds the two config_*_privacy_mode methods to a stale schema-catalog golden — see ## Related.

Problem

Issue #4094: when a tool policy / permission boundary blocks an action, the agent often dead-ends — no clear message, no workaround — leaving the user stuck.

PR #4443 fixed this for the pluggable-ToolPolicy and channel-permission denials (rendered via PolicyDenial in ToolPolicyMiddleware). But the security-policy / autonomy layer was untouched: every [policy-blocked] denial produced inside SecurityPolicy and the tools themselves — read-only autonomy (enforcement.rs), command classification (command_checks.rs), path checks (path_checks.rs), and ~14 per-tool guards (filesystem write/edit/apply_patch, git, schedule, install_tool, browser, screenshot, mouse, …) — still returns a bare marker line like "[policy-blocked] Security policy: read-only mode — only read commands are allowed". The model gets no reason expansion, no workaround, and no instruction to relay it to the user, so it halts.

Solution

Enrich those raw denials at a single seam rather than the ~20 emission sites:

  • PolicyDenial::SecurityPolicyBlocked { tool, raw_reason } renders the standard Blocked: … Reason: … Workaround: … <relay> shape. It strips the marker from the reason and re-adds it on the Blocked: line, so the [policy-blocked] marker survives for downstream contains(POLICY_BLOCKED_MARKER) checks (failure classification, the repeated-failure loop-breaker). The workaround points at the levers that actually unblock the family — raise the agent-access tier / autonomy (Settings → Agent access / config.update_autonomy_settings / [autonomy]), or use a permitted alternative.
  • maybe_enrich_policy_block(tool, content) returns the enriched string only when the content carries the marker and lacks a Workaround: suffix — so already-structured ToolPolicyMiddleware denials (from fix(agent-harness): structured policy-block messages — rework of #4390 onto post-#4399 #4443) and plain errors are left untouched (no double-wrapping).
  • Call it at the top of ToolOutcomeCaptureMiddleware::after_tool, which already inspects the marker for classification and runs last in the after_tool chain, so the enriched text is what the transcript keeps.

Submission Checklist

  • Tests added or updated (happy path + failure/edge) — security_policy_block_keeps_marker_and_adds_workaround_and_relay, maybe_enrich_only_touches_raw_marker_results (raw → enriched, structured → skipped, plain error → skipped), and a middleware test raw_security_policy_block_is_enriched_with_workaround_and_relay + already_structured_denial_is_not_double_wrapped.
  • Diff coverage ≥ 80% — the new render arm, helper, and middleware branch are all exercised by the added tests.
  • Coverage matrix updated — N/A: behaviour-only change (extends existing policy-denial messaging).
  • All affected feature IDs listed under ## Related — N/A (no new feature row).
  • No new external network dependencies introduced.
  • Manual smoke checklist updated if this touches release-cut surfaces — N/A: internal harness messaging.
  • Linked issue closed via Closes #NNN.

Impact

  • Desktop/CLI agent harness. No schema/migration/network change. Purely enriches the text of already-failing policy-blocked tool results; the marker and failure classification are preserved, so the loop-breaker and timeline behave identically.

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: fix/GH-4094-security-policy-block-messages
  • Commit SHA: see head

Validation Run

  • pnpm --filter openhuman-app format:check — N/A (no frontend changes)
  • pnpm typecheck — N/A (no TS changes)
  • Focused tests: openhuman::tinyagents::policy_denial::tests + the two ToolOutcomeCaptureMiddleware enrichment tests — execution deferred to CI (shared local build host saturated)
  • Rust fmt/check (if changed): rustfmt --edition 2021 --check clean; cargo check --bin openhuman-core passed (Finished dev)
  • Tauri fmt/check (if changed): N/A

Validation Blocked

  • command: cargo test --lib -- openhuman::tinyagents::policy_denial openhuman::tinyagents::middleware::tests
  • error: none — shared build host saturated by a concurrent worktree's build; test-binary compile did not finish in-window
  • impact: cargo check compiled clean; the tests are deterministic assertions over the added code and are validated by CI

Behavior Changes

  • Intended behavior change: raw security-policy/autonomy [policy-blocked] tool results now carry a workaround + a relay-to-the-user directive.
  • User-visible effect: on a policy block, the agent explains what was blocked and why and offers a concrete next step, instead of stopping silently.

Parity Contract

  • Legacy behavior preserved: the [policy-blocked] marker is retained, so ToolFailureClass::BlockedByPolicy classification and the RepeatedToolFailureMiddleware hard-reject path match exactly as before. ToolPolicyMiddleware (fix(agent-harness): structured policy-block messages — rework of #4390 onto post-#4399 #4443) output is detected via its Workaround: suffix and left untouched.
  • Guard/fallback/dispatch parity checks: enrichment only fires when the marker is present and no Workaround: suffix exists; success results and plain errors are never rewritten.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): none
  • Canonical PR: this one

senamakel added 2 commits July 3, 2026 22:55
…lay directive (tinyhumansai#4094)

The pluggable-ToolPolicy and channel-permission denials were given a
structured "Blocked / Reason / Workaround / relay" message by tinyhumansai#4443, but
the ~20 `[policy-blocked]` denials emitted deep in `SecurityPolicy` and
the tools themselves (read-only autonomy, command classification, path
checks, per-tool guards) still returned a bare marker line with no
workaround and no relay instruction — so the agent dead-ends on them with
a diagnosis and no path forward.

Enrich those raw denials at a single seam:
- Add `PolicyDenial::SecurityPolicyBlocked { tool, raw_reason }` and a
  `maybe_enrich_policy_block` helper that wraps a marker-bearing result
  into the same structured shape (keeps the reason, appends a concrete
  workaround — raise the agent-access tier / use a permitted alternative —
  and the relay-to-the-user directive). The `[policy-blocked]` marker is
  preserved so failure classification and the repeated-failure breaker
  still match.
- Call it at the top of `ToolOutcomeCaptureMiddleware::after_tool` (the
  last after_tool hook, so the enriched text is what the transcript keeps).
  Already-structured ToolPolicyMiddleware denials (which carry a
  `Workaround:` suffix) are left untouched — no double-wrapping.

Tests: the new variant keeps the marker + adds workaround/relay; the
helper enriches raw blocks, skips already-structured denials and plain
errors; a middleware test drives a raw read-only-mode block through
`after_tool` and asserts the enrichment.

Claude-Session: https://claude.ai/code/session_01KcmdqJVpjmnH31HqTHRLwG
Unrelated CI unblock: tinyhumansai#4446 (Privacy Mode) registered
`config_get_privacy_mode` / `config_set_privacy_mode` unconditionally but
did not update the `worker_a_controller_schemas_are_fully_exposed` golden,
so that test fails on main for every PR without the fix. Add the two
methods at their sorted positions (same fix as tinyhumansai#4475) so this PR's Rust
E2E / coverage jobs go green.

Claude-Session: https://claude.ai/code/session_01KcmdqJVpjmnH31HqTHRLwG
@senamakel senamakel requested a review from a team July 3, 2026 22:57
@coderabbitai

coderabbitai Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Warning

Review limit reached

You’ve reached a temporary PR review limit under our Fair Usage Limits Policy.

Your recent review volume is higher than typical usage, so adaptive limits are currently applied.

Next review available in: 40 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cdc7b49c-593a-4fad-9e39-64d7f5c47c9a

📥 Commits

Reviewing files that changed from the base of the PR and between 4d71702 and 6574b03.

📒 Files selected for processing (3)
  • src/openhuman/tinyagents/middleware.rs
  • src/openhuman/tinyagents/policy_denial.rs
  • tests/config_auth_app_state_connectivity_e2e.rs

Comment @coderabbitai help to get the list of available commands.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6574b034be

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1415 to +1417
if let Some(enriched) =
super::policy_denial::maybe_enrich_policy_block(&result.name, &result.content)
{

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Gate policy-block enrichment to failed tool results

When a successful tool returns content containing the literal [policy-blocked] marker, this unconditional enrichment rewrites the success payload into a fake security-policy denial before success is computed. This is reachable with file_read, which returns file contents verbatim on success, so reading source/docs that mention the marker corrupts the transcript while the call is still recorded as successful; only actual failed, marker-prefixed policy errors should be enriched.

Useful? React with 👍 / 👎.

@senamakel senamakel merged commit 83c6e01 into tinyhumansai:main Jul 4, 2026
13 of 15 checks passed
senamakel added a commit to senamakel/openhuman that referenced this pull request Jul 4, 2026
…rge)

Two tests came in red from upstream/main via the merge (upstream main is itself
red on these — its own PR CI fails on Rust Core Coverage); our changes don't
touch the code they exercise:

- tinyagents adapter_inventory_registers_model_tools_and_middleware: recent
  upstream middleware-stack churn produces 13 lifecycle middleware, but the
  literal still said 14 (its own enumeration lists 13). Match the code.
- monitor_agent_e2e orchestrator_gets_denial_when_monitor_command_violates_policy:
  tinyhumansai#4478 reworded the security-policy denial to
  "[policy-blocked] Tool '<tool>' was blocked by the security policy"; the test
  still asserted the old contiguous "[policy-blocked] Security policy". Update
  the golden substring to the new wording.

(ops_install::non_2xx_install_fetch_returns_err_for_4xx_and_5xx also flaked here
— a pre-existing env-var-race under llvm-cov, untouched by this branch.)

Claude-Session: https://claude.ai/code/session_014RLnG2QbdL3n9TLtfomdhB
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Harness] Policy-blocked actions should return a clear message + workaround

1 participant