Budget mode for non-interactive compiles via provider batch/async APIs

> Open scope. I sketched the design but won't be building it — leaving notes here in case someone wants to pick it up.

## The problem

Compile spend dominates LLM cost. The hottest call sites — `extract`, `disambiguate`, `crossref`, `draft_page` — fire once per source or page and account for the bulk of token usage. All three major providers offer a batch/async submission mode at 50% of the standard rate (Gemini Batch API, OpenAI Batch API, Anthropic Message Batches API), with a 24-hour SLA ceiling — most batches return well before that, but it's not guaranteed.

Today every compile call is real-time. There's no way to opt into "I have 80 sources from a research dump, I don't care if it takes a few hours overnight, just halve the bill."

A Budget mode that routes eligible compile steps through whichever provider's batch endpoint exists would close this. The user-facing UX for the eligible paths already supports walking away.

## Why the existing UX fits

The whole onboarding flow is already fire-and-forget. Confirmed by reading the code:

- `POST /api/onboarding/finalize` triggers n8n via webhook then returns 200 immediately ([app/src/app/api/onboarding/finalize/route.ts:115](app/src/app/api/onboarding/finalize/route.ts#L115)). The 10s timeout in [app/src/lib/trigger-n8n.ts:34](app/src/lib/trigger-n8n.ts#L34) covers "did n8n receive the trigger," not compile completion.
- The progress page is a polling client only — 2s polls against `/api/compile/progress` ([app/src/app/onboarding/progress/page.tsx:128-181](app/src/app/onboarding/progress/page.tsx#L128-L181)). State rehydrates from the `compile_progress` row on every poll.
- The Dashboard button is always visible; comment literally reads `Dashboard — always visible so the user can leave` ([app/src/app/onboarding/progress/page.tsx:732](app/src/app/onboarding/progress/page.tsx#L732)).
- The dashboard banner key `kompl_active_compile` ([app/src/app/onboarding/progress/page.tsx:162](app/src/app/onboarding/progress/page.tsx#L162)) persists across navigation so the user sees compile state when they come back.
- After 15 min the page already says "Taking longer than expected — you can close this tab and come back later" ([app/src/app/onboarding/progress/page.tsx:850-852](app/src/app/onboarding/progress/page.tsx#L850-L852)).

A Budget-mode compile that takes hours is a worse-latency variant of an already-walk-away UX. The wait is opt-in.

## Eligible call sites

All n8n-triggered, all already non-interactive:

- Onboarding compile (`/api/onboarding/finalize` → `/webhook/session-compile`) — primary target, biggest spend
- Recompile after source update (`/api/sources/[source_id]/recompile`)
- Retry-failed sweeps (`/api/compile/retry-failed`)
- Lint pass (`lint-wiki.json` — Mon 11:30 + manual `/webhook/lint`)
- Weekly digest generation (`weekly-digest.json` — Mon 11:40)

**Not eligible:** chat answer synthesis (`synthesize_answer`), any user-facing synchronous LLM call. Those stay on the real-time path.

## Provider batch APIs to support

The shape is comparable across providers, but each has its own quirks:

| Provider | Endpoint | Discount | Completion |
|---|---|---|---|
| Gemini | [Batch API](https://ai.google.dev/gemini-api/docs/batch-api) | 50% | Poll only; supports `response_schema` (inherits #7 repetition bug) |
| OpenAI | [Batch API](https://developers.openai.com/api/docs/guides/batch) | 50% | Poll or webhook |
| Anthropic | [Message Batches API](https://platform.claude.com/docs/en/build-with-claude/batch-processing) | 50% | Poll only |

This Budget mode presupposes a provider abstraction. Each provider's adapter optionally implements a `submitBatch` method; providers without batch support fall back to real-time.

## Edge cases

The list I came up with — there are probably more, and some may collapse once the provider abstraction lands.

1. **Per-step eligibility.** `extract` and `disambiguate` are read-only and easy to batch. `draft_page` is per-page and large; batchable in principle but mid-batch failures lose more work. `crossref` depends on draft outputs being committed first — does Budget mode batch each step independently, or pipeline a single mega-batch?
2. **Mid-batch failure handling.** Provider returns 95% successes + 5% failures. Fall back to real-time for the failed slice? Mark them and surface in the existing retry-failed flow? Abort the whole batch?
3. **Cancellation semantics.** The progress page has a Cancel button ([app/src/app/onboarding/progress/page.tsx:715-728](app/src/app/onboarding/progress/page.tsx#L715-L728)). Batch APIs don't all support cancel — Gemini does, OpenAI does, Anthropic only via batch-cancel endpoint. Define cancel as "best-effort upstream cancel + locally discard results" or just "discard results when they arrive"?
4. **Polling cadence.** Real-time compile pings n8n step-by-step. Batch submission polls a job ID. 1 min vs 5 min vs 15 min trades latency for API quota burn. Some providers also have polling rate limits.
5. **Structured-output reliability in batch mode.** Structured-output bugs are not Gemini-specific:
   - Gemini 2.5 / 3.x — repetition loop on long technical inputs (#7)
   - Kimi K2.5 — same repetition-loop family
   - GPT-4.1 — `\n\n\n\n` padding bug via Responses API (different mechanism, same output-token bloat shape)
   - GPT-5.4 nano/mini — schema-drop on nested JSON, intermittent missing required keys
   - Claude / DeepSeek / Mistral — no documented bug as of May 2026

   The salvage-and-retry logic shipped for #7 should generalize into a per-provider robustness layer in the abstraction. Batch mode amplifies the cost of these bugs because there's no streaming option to abort early — a runaway response burns the full budget before the failure is visible.
6. **Cost-guard interaction.** Existing daily budget cap counts pyrate-limiter tokens. Batched calls bypass the limiter entirely. Need a separate accounting path or a batch-cost preflight.
7. **Mixed mode within a session.** If `extract` goes through batch but `draft_page` stays sync, the session sits in "queued for batch" → "running real-time" → "queued for batch" — `compile_progress` step state semantics need to support this without breaking the progress UI.
8. **UI affordance for Budget mode.** Settings toggle (sticky, applies to all future compiles)? Per-compile checkbox on the onboarding review page? Auto-suggest at >N sources? "Use Budget mode for this compile" on the dashboard banner?

## Design decisions any implementation has to commit to

- **Trigger.** Settings toggle vs per-compile flag vs auto-threshold (e.g. "≥30 sources auto-suggests Budget mode")?
- **Granularity.** Per-step batching, per-session batching, or pipeline-as-one-batch?
- **Failure fallback.** Batch errors → real-time fallback for failed slice, manual retry-failed, or abort?
- **Polling.** Cadence and where it lives — n8n schedule, Next.js setInterval, or per-session worker?
- **Cancel.** Best-effort upstream + local discard, or local-only?
- **Cost telemetry.** Add a separate `batch_usage` activity log or extend the existing per-call logging?
- **Provider parity.** Require all providers to implement `submitBatch`, or let providers opt in (and the UI hides Budget mode for providers that don't)?

## Related

- Closes/replaces #9 — Gemini Batch API enhancement, cancelled because of multi-provider direction.
- #7 deferred-followups list ("Batch API migration") — this issue is the multi-provider generalization of that bullet.
- Provider abstraction (presupposed, separate enhancement).

## Out of scope

- Real-time / chat / synchronous LLM paths.
- Cross-provider unified pricing math (let each adapter report its own).
- Auto-routing between providers based on cost/availability.
- Mid-batch streaming progress (batch is by definition not streaming).


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Budget mode for non-interactive compiles via provider batch/async APIs #55

The problem

Why the existing UX fits

Eligible call sites

Provider batch APIs to support

Edge cases

Design decisions any implementation has to commit to

Related

Out of scope

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Provider	Endpoint	Discount	Completion
Gemini	Batch API	50%	Poll only; supports `response_schema` (inherits #7 repetition bug)
OpenAI	Batch API	50%	Poll or webhook
Anthropic	Message Batches API	50%	Poll only

Budget mode for non-interactive compiles via provider batch/async APIs #55

Description

The problem

Why the existing UX fits

Eligible call sites

Provider batch APIs to support

Edge cases

Design decisions any implementation has to commit to

Related

Out of scope

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions