
fix(subconscious): halt + demote on permanent provider rate-cap 413 (#4404) #4410

Open: oxoxDev wants to merge 3 commits into tinyhumansai:main from oxoxDev:fix/4404-subconscious-rate-cap-halt

@oxoxDev (Collaborator) commented Jul 2, 2026

Summary

  • Stop the subconscious background agent from re-firing — and re-reporting — a permanently-doomed provider request every tick when the configured model rejects it with a per-minute token cap (413/TPM). Sentry TAURI-RUST-HXF: 2232 events from one user.
  • Add a circuit breaker that halts subconscious ticks while the offending Subconscious provider config is still set, and auto-resumes the moment the user switches model/provider/tier.
  • Demote a direct BYO-provider 413/TPM rejection from an unexpected Sentry crash to expected user-config state (the account's rate tier is not a lever OpenHuman controls). Managed-backend PAYLOAD_TOO_LARGE guard-leaks still page, unchanged.

Problem

A user pointed the Subconscious agent at a groq on_demand free-tier model (openai/gpt-oss-120b) whose cap is 8000 tokens/minute. A subconscious turn builds ~42k tokens of context, roughly 5× the per-minute rate cap. The limit being hit is the rate cap, not the context window, so trimming can't help, and groq rejects every call:

groq API error (413 Payload Too Large): Request too large for model `openai/gpt-oss-120b` …
service tier `on_demand` on tokens per minute (TPM): Limit 8000, Requested 42084

Two defects follow:

  1. Per-tick re-report flood: the tick loop re-fires the identical, permanently-doomed request every 5–30 min and the provider_chat boundary re-reports it each time, while also burning the user's provider quota. This is the same cron-billing-flood family as #3913 (402 credits + 400 budget, TAURI-RUST-514 / -BMW).
  2. Mis-classified as a crash: a raw direct-provider 413/TPM matched no user-state/transient classifier, so it paged as an unexpected error.

Solution

  • New shared matcher is_provider_rate_cap_exceeded_message (inference/provider/ops/http_error.rs): recognizes a permanent per-request rate-cap 413, anchored on both "request too large" (single-request permanence) and a tokens-per-minute marker — so a transient 429 burst and context-window overflow stay in their own buckets. Single source of truth for the two consumers below (no wording drift).
  • Sentry demotion (core::observability::is_provider_user_state_message): the direct-provider TPM rejection demotes to ProviderUserState. Ordered after the managed-backend guard-leak arm, so managed PAYLOAD_TOO_LARGE still force-captures.
  • Circuit breaker (subconscious::engine): on a permanent rate-cap agent error, arm a halt keyed on the Subconscious provider signature; subsequent ticks skip the agent run entirely until the signature changes (user picks a new model/tier), then auto-resume. In-memory only — a restart re-probes once, then re-halts (one event/launch, not a flood). Mirrors the existing tool-capability (TAURI-RUST-ADC) permanent-failure arm.

State transitions and the matcher are extracted into pure/unit-tested helpers; only trivial glue remains in the async tick path.
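The breaker's state transitions can be sketched as a small pure helper. This is a minimal illustration, not the code in subconscious::engine: the type `RateCapHalt` and its methods `arm` and `should_skip` are hypothetical names, and the signature is assumed to be a plain string derived from the Subconscious provider config.

```rust
/// Hypothetical in-memory breaker state (names are illustrative, not from the PR).
/// `None` means ticks run normally; `Some(sig)` means ticks are halted
/// for as long as the current provider signature equals `sig`.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct RateCapHalt {
    halted_signature: Option<String>,
}

impl RateCapHalt {
    /// Arm the breaker after a permanent rate-cap rejection for `signature`.
    pub fn arm(&mut self, signature: &str) {
        self.halted_signature = Some(signature.to_string());
    }

    /// Decide whether this tick should skip the agent run.
    /// A signature that differs from the armed one clears the halt,
    /// so the tick runs and the new config is probed.
    pub fn should_skip(&mut self, current_signature: &str) -> bool {
        if self.halted_signature.as_deref() == Some(current_signature) {
            return true;
        }
        self.halted_signature = None;
        false
    }
}
```

Because the state lives only in memory, a freshly constructed `RateCapHalt` after a restart lets one tick through before the breaker can arm again, which matches the "one event/launch" behaviour described above.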

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — verified locally via cargo llvm-cov + diff-cover --compare-branch=upstream/main: observability 100%, http_error 100%, engine 66.7% (live-agent tick glue only), total 82%.
  • N/A: behaviour-only change — no feature rows added/removed/renamed in docs/TEST-COVERAGE-MATRIX.md
  • N/A: behaviour-only change — no matrix feature IDs apply
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • N/A: no release-cut surface touched (background subconscious loop + Sentry classifier only)
  • Linked issue closed via Closes #NNN in the ## Related section

Impact

  • Platform: desktop (all OSes) — subconscious runs in the in-process core.
  • Reliability: eliminates a per-tick Sentry flood and stops burning a user's provider quota on a request that can never succeed on their tier.
  • Observability: the underlying condition is now expected user-config state, not a page; the user sees an actionable "pick a higher-tier model" reason in Subconscious status. No masking of real defects — the managed-backend guard-leak still pages, and a transient 429 stays retryable + Sentry-visible.
  • Security/migration: none. In-memory breaker state only; no schema/config change.

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Commit & Branch

  • Branch: fix/4404-subconscious-rate-cap-halt
  • Commit SHA: a35309d

Validation Run

  • N/A: no frontend changes (pnpm --filter openhuman-app format:check)
  • N/A: no frontend/TypeScript changes (pnpm typecheck)
  • Focused tests: cargo test --lib for inference::provider::ops::http_error, core::observability::tests, subconscious::engine — all green (new: rate-cap matcher, demote + managed-still-pages regression, breaker state transitions)
  • Rust fmt/check: cargo fmt --check clean; cargo clippy --lib no new warnings on touched files
  • N/A: no app/src-tauri changes (Tauri fmt/check)

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: a permanent per-minute token-cap (413/TPM) rejection from a direct BYO Subconscious provider no longer pages Sentry and no longer re-fires every tick; ticks halt until the provider config changes.
  • User-visible effect: Subconscious pauses with an actionable status message ("pick a higher-tier model or provider") instead of silently failing every few minutes; no functional change for correctly-provisioned providers.

oxoxDev added 3 commits July 2, 2026 16:16
…humansai#4404)

Recognize a direct BYO-provider 413 whose single-request token count exceeds
the account's tokens-per-minute cap (groq on_demand free tier). Anchored on
both "request too large" (single-request permanence) and a tokens-per-minute
marker, so a transient 429 burst and context-window overflow stay in their own
buckets. Single source of truth for the Sentry classifier and the subconscious
circuit breaker. Verbatim-body test guards against wording drift.
…nsai#4404)

TAURI-RUST-HXF: a direct BYO provider (groq on_demand free tier) rejecting a
single request that exceeds the account per-minute token cap is user-config
state OpenHuman cannot lift, not a product bug. Add it to is_provider_user_state_message
so the domain=agent re-report demotes instead of paging. The managed-backend
PAYLOAD_TOO_LARGE guard-leak still force-captures earlier, so this arm only
sees direct-provider TPM rejections. Regression test pins the managed path still
pages and a transient/bare 413 is not demoted.
…yhumansai#4404)

TAURI-RUST-HXF: when a tick's provider config keeps rejecting with a permanent
per-minute token cap (413/TPM), the loop re-fired the doomed request every
5-30 min and re-reported it — 2232 events from one user, the cron-billing-flood
family (tinyhumansai#3913). Add a circuit breaker keyed on the Subconscious provider
signature: on a permanent rate-cap agent error, halt the agent run; skip
subsequent ticks while the same config is set; auto-clear the moment the user
switches model/provider/tier. Mirrors the existing tool-capability
(TAURI-RUST-ADC) permanent-failure arm. In-memory only — a restart re-probes
once, then re-halts. Pure helpers unit-tested.
@oxoxDev oxoxDev requested a review from a team July 2, 2026 11:13
coderabbitai Bot (Contributor) commented Jul 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cea2df89-362c-4222-8e1d-179375bdec49

📥 Commits

Reviewing files that changed from the base of the PR and between f979bfa and a35309d.

📒 Files selected for processing (5)
  • src/core/observability.rs
  • src/openhuman/inference/provider/ops/http_error.rs
  • src/openhuman/inference/provider/ops/mod.rs
  • src/openhuman/subconscious/engine.rs
  • src/openhuman/subconscious/engine_tests.rs
👮 Files not reviewed due to content moderation or server errors (5)
  • src/core/observability.rs
  • src/openhuman/inference/provider/ops/mod.rs
  • src/openhuman/subconscious/engine_tests.rs
  • src/openhuman/inference/provider/ops/http_error.rs
  • src/openhuman/subconscious/engine.rs

📝 Walkthrough

[!WARNING]

Walkthrough skipped

File diffs could not be summarized.



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a35309d2af


Comment on lines +700 to +704
pub fn is_provider_rate_cap_exceeded_message(body: &str) -> bool {
    let lower = body.to_ascii_lowercase();
    lower.contains("request too large")
        && (lower.contains("tokens per minute") || lower.contains("(tpm)"))
}
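To see how the two anchors keep the buckets apart, the matcher can be exercised against a few sample bodies. The function is copied from the snippet above. The groq body is assembled from the error quoted in the PR description with its `…` elision left in place. The 429 and context-window bodies are hypothetical wordings invented for this sketch, not real provider responses.

```rust
/// Matcher as quoted in the review comment above.
pub fn is_provider_rate_cap_exceeded_message(body: &str) -> bool {
    let lower = body.to_ascii_lowercase();
    lower.contains("request too large")
        && (lower.contains("tokens per minute") || lower.contains("(tpm)"))
}

/// Assembled from the groq error in the PR description; the elision is kept as-is.
pub const GROQ_TPM_413: &str = "groq API error (413 Payload Too Large): Request too large for model `openai/gpt-oss-120b` … service tier `on_demand` on tokens per minute (TPM): Limit 8000, Requested 42084";

/// Hypothetical transient 429 body: mentions TPM but not "request too large".
pub const HYPOTHETICAL_429: &str =
    "429 Too Many Requests: rate limit reached on tokens per minute (TPM), retry shortly";

/// Hypothetical context-window overflow body: neither anchor is present.
pub const HYPOTHETICAL_CONTEXT_OVERFLOW: &str =
    "400 Bad Request: maximum context length is 8192 tokens";

/// A bare 413 status line with no TPM marker.
pub const BARE_413: &str = "413 Payload Too Large";
```

Only `GROQ_TPM_413` satisfies both conditions. The 429 sample fails the "request too large" anchor, and the other two fail both, so they stay with their existing classifiers.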


P2: Wire the rate-cap matcher into api_error

For the affected direct-compatible provider paths I checked (compatible_provider_impl.rs calls api_error on non-2xx responses), a 413/TPM body reaches api_error; this new matcher is only used by expected_error_kind and the subconscious breaker, so api_error still falls through to should_report_provider_http_failure(status) and emits a domain=llm_provider Sentry event for status 413. That leaves the first provider-origin event/page in place for exactly the Groq scenario this PR is trying to demote; add an is_provider_rate_cap_exceeded_message(&body) branch (and/or a before-send net) before the status gate.
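One way to read this suggestion is as an ordering of checks, with the body matcher running before the status gate. The sketch below is an assumption-laden illustration: `ReportDecision` and `classify_provider_failure` are hypothetical names, and the body of `should_report_provider_http_failure` is a stand-in, since the real gate's rules are not shown in this PR.

```rust
/// Matcher as quoted earlier in this review.
pub fn is_provider_rate_cap_exceeded_message(body: &str) -> bool {
    let lower = body.to_ascii_lowercase();
    lower.contains("request too large")
        && (lower.contains("tokens per minute") || lower.contains("(tpm)"))
}

/// Stand-in for the real status gate (assumption: every 4xx/5xx reports).
fn should_report_provider_http_failure(status: u16) -> bool {
    status >= 400
}

/// Hypothetical outcome of classifying a non-2xx provider response.
#[derive(Debug, PartialEq, Eq)]
pub enum ReportDecision {
    /// Expected user-config state: demote instead of paging.
    ExpectedUserState,
    /// Unexpected failure: emit a Sentry event.
    Report,
    /// Nothing to report.
    Ignore,
}

/// Check the body matcher first so a 413/TPM rejection never reaches the status gate.
pub fn classify_provider_failure(status: u16, body: &str) -> ReportDecision {
    if is_provider_rate_cap_exceeded_message(body) {
        return ReportDecision::ExpectedUserState;
    }
    if should_report_provider_http_failure(status) {
        ReportDecision::Report
    } else {
        ReportDecision::Ignore
    }
}
```

With this ordering a bare 413, such as a managed-backend PAYLOAD_TOO_LARGE body, still falls through to the status gate and reports, while the direct-provider TPM body is demoted before the gate runs.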


match resolve_subconscious_route(config) {
    SubconsciousProviderRoute::LocalOllama { model } => format!("local:{model}"),
    SubconsciousProviderRoute::OpenHumanCloud => "cloud".to_string(),
    SubconsciousProviderRoute::Other(raw) => format!("other:{raw}"),


P2: Include credential changes in the halt signature

This signature keys only on the raw workload route. When a user fixes the TPM cap by pasting a higher-tier API key under the same slug (setCloudProviderKey stores provider:<slug> separately from Config) or by editing the provider row without changing subconscious_provider, the signature stays other:<same route>, so should_skip_for_rate_cap_halt keeps skipping and never re-probes until an app restart or a fake model/provider change. Include relevant provider-entry/credential versioning in the halt key, or clear the halt when AI provider credentials/settings are saved.
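A rough sketch of the first option, folding a credential revision into the halt key, might look like the following. Everything here is hypothetical: the PR does not show a credential revision counter, so `credential_revision`, `halt_signature`, and `halt_is_stale` are invented for illustration, and the route string is assumed to be the same value the existing signature produces.

```rust
/// Hypothetical halt key: the route string plus a credential revision counter.
/// Assumes the revision is bumped whenever the provider's API key or settings are saved.
pub fn halt_signature(route: &str, credential_revision: u64) -> String {
    format!("{route}#cred:{credential_revision}")
}

/// True when an armed halt no longer matches the current key and should clear.
pub fn halt_is_stale(armed: &str, route: &str, credential_revision: u64) -> bool {
    armed != halt_signature(route, credential_revision).as_str()
}
```

Under this scheme, pasting a higher-tier key under the same slug changes the key even though the route is unchanged, so the next tick re-probes instead of waiting for a restart.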



Development

Successfully merging this pull request may close these issues.

fix(subconscious): halt + demote on permanent provider rate-cap 413 (groq TPM flood)

1 participant