Reagan/hitl#443

Open
Cheggin wants to merge 17 commits into main from reagan/hitl

Conversation

Cheggin (Collaborator) commented May 27, 2026

Summary by cubic

Adds three human-in-the-loop blocks: login for credential entry, capture for reCAPTCHA tile selection, and iframe for embedding a live URL inline. This lets agents pause, collect input safely in chat, or let users complete a remote step, then resume browser automation.

  • New Features

    • Renderer: added LoginForm, CaptureBlock, and IframeBlock with transcript-based hydration.
    • Parser: htmlBlocks now recognizes login, capture, and iframe fences with strict JSON validation (LoginPayload, CapturePayload, IframePayload + parsers).
    • Engine prompts: added loginBlockGuidanceLines() and wired into browsercode, claude-code, and codex.
    • Chat wiring: ChatTurn dispatches all new blocks; ChatPane listens for chatv2:open-browser; CSP updated to allow frame-src 'self' https:.
    • Docs: added capture-block.md (per-tile-union recipe, cross-origin bframe via session.use(), verification) and iframe-block.md (when to use, limits).
    • Tests: unit tests for capture parsing/UI and iframe streaming/parser (CaptureBlock.spec.tsx, captureBlocks.test.ts, iframeBlocks.test.ts).
  • UI Improvements

    • CaptureBlock: reCAPTCHA-style banner (lead/bold/trailer, optional targetImage), NxM CSS tiling, clear submitted state; guarded submit when the sessions IPC bridge is missing.
    • IframeBlock: sandboxed frame with “I’m done”/“Skip” foot actions; read-only (no DOM access).
    • OptionList: restored outer card chrome and added streaming skeleton cards.
    • Scoped styles (captureBlock.css, loginForm.css, iframeBlock.css) and minor polish across blocks.

Written for commit d1fabba. Summary will update on new commits.

Cheggin added 11 commits May 26, 2026 17:34
The outer bordered container around the picker was removed in 9bdcd99;
bring it back so the option grid reads as a card on the chat surface
again. Also render two skeleton cards at the tail while the fence is
still streaming, mirroring AskForm's TRAILING_SKELETONS pattern, so the
UI doesn't appear frozen between option arrivals.

Match Google's actual challenge UI: small lead-in ("Select all images
with"), big bold subject ("a bus"), small trailer ("Click verify
once there are none left."). If the agent emits only `prompt` as a
full sentence, splitCapturePrompt() recovers the three parts via the
standard "... with X. ..." shape; an explicit `target` short-circuits
the split.

Also:
- accept an optional `targetImage` URL in the payload (renders as the
  reCAPTCHA example thumbnail on the right of the banner).
- hug the grid width (`width: fit-content; max-width: min(360px, 100%)`)
  so the container no longer stretches across the chat surface.
- bump the outer border to `--color-border-strong` so the card is
  visible against the chat background in light mode.
- flip the tile gutter color to `--color-bg-base` so the grout reads
  white in light mode / near-black in dark, matching Google.
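
The three-part split described in this commit message can be illustrated with a minimal sketch. The real `splitCapturePrompt()` is not shown on this page, so the regex, the `lead`/`target`/`trailer` field names, and the handling of an explicit `target` below are assumptions, not the PR's implementation.

```typescript
// Hypothetical sketch; the real splitCapturePrompt() may differ.
interface PromptParts {
  lead: string;    // small lead-in, e.g. "Select all images with"
  target: string;  // big bold subject, e.g. "a bus"
  trailer: string; // small trailer, e.g. "Click verify once there are none left."
}

function splitCapturePromptSketch(prompt: string, target?: string): PromptParts {
  const text = prompt.trim();
  // An explicit target short-circuits the split.
  // Assumption: the prompt is then shown whole as the lead-in.
  if (target !== undefined && target.trim() !== "") {
    return { lead: text, target: target.trim(), trailer: "" };
  }
  // Standard shape: "<lead ending in 'with'> <target>. <trailer>"
  const match = /^([\s\S]*?\bwith)\s+([\s\S]+?)\.\s*([\s\S]*)$/.exec(text);
  if (match === null) {
    // Unrecognized shape: fall back to showing the whole prompt as the subject.
    return { lead: "", target: text, trailer: "" };
  }
  return {
    lead: match[1].trim(),
    target: match[2].trim(),
    trailer: match[3].trim(),
  };
}
```

A prompt without the "... with X. ..." shape falls through to the fallback branch rather than guessing at a split.
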
Agents kept clipping with `.rc-imageselect-table-33`'s bounding rect,
which is up to 100px taller than the visible tile area (the floating
Verify toolbar lives absolutely-positioned inside that table). The
resulting screenshots had a misaligned bottom row and an unwanted
toolbar strip.

The reliable recipe is to take the union of `td.rc-imageselect-tile`
rects inside the bframe (same-origin with the parent) and pass that
to `Page.captureScreenshot`'s `clip` in CSS pixels. Document the
verification checklist (near-square ratio, no toolbar visible, 9 cells
edge-to-edge) and the common ways agents have gotten this wrong, plus
the bframe-attach fallback for the rare non-standard layout.
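
The tile-union geometry can be sketched as below. This is only the arithmetic, assuming the tile rects have already been read in CSS pixels; the `Rect` type, the helper names, and the 10% near-square tolerance are illustrative choices, not taken from capture-block.md.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Smallest rectangle that contains every tile rect.
function unionRects(rects: Rect[]): Rect {
  if (rects.length === 0) {
    throw new Error("no tile rects to union");
  }
  let left = Infinity;
  let top = Infinity;
  let right = -Infinity;
  let bottom = -Infinity;
  for (const r of rects) {
    left = Math.min(left, r.x);
    top = Math.min(top, r.y);
    right = Math.max(right, r.x + r.width);
    bottom = Math.max(bottom, r.y + r.height);
  }
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// Verification aid: a square tile grid should give a near-square clip.
// The tolerance is an assumed value.
function isNearSquare(rect: Rect, tolerance = 0.1): boolean {
  return Math.abs(rect.width / rect.height - 1) <= tolerance;
}

// Page.captureScreenshot's `clip` takes x, y, width, height and a scale.
function toClip(rect: Rect): Rect & { scale: number } {
  return { ...rect, scale: 1 };
}
```

A clip built from the table's bounding rect instead of the tile union would fail the `isNearSquare` check, which is the misalignment this commit describes.
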

gitguardian Bot commented May 27, 2026

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 33203192 | Triggered | Username Password | 4dbb658 | app/src/renderer/hub/chat-v2/htmlBlocks.ts |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.



@cubic-dev-ai cubic-dev-ai Bot left a comment

4 issues found across 16 files

Comment thread app/src/main/hl/stock/interaction-skills/capture-block.md Outdated
Comment thread app/src/main/hl/stock/interaction-skills/capture-block.md Outdated
Comment thread app/src/main/hl/stock/interaction-skills/capture-block.md Outdated
Comment thread app/src/renderer/hub/chat-v2/CaptureBlock.tsx Outdated
Cheggin added 2 commits May 27, 2026 00:17
The bframe is cross-origin on any third-party host that embeds reCAPTCHA
(parent is e.g. `2captcha.com` while the bframe is `google.com`), so
the previous `bf.contentDocument` recipe would throw a SecurityError on
those sites — only Google's own /recaptcha/api2/demo page happens to be
same-origin with the bframe.

Rewrite the canonical recipe to attach to the bframe target via
`Target.attachToTarget`, run the tile-union + prompt-text queries
inside that session, then translate inner-iframe coords back to page
coords using the outer <iframe> rect read from the parent page. Works
for both same-origin and cross-origin deployments.

Also drop two stale fallback sections that referred to constants
(32/120/336) and variables (`bf`) that never existed in this document
— leftovers from an earlier draft.
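
The coordinate translation step can be sketched as a small pure function. It assumes the inner rect was measured in the bframe's own viewport and the outer rect is the `<iframe>` element's rect in the parent page; the `Box` type, the function name, and the optional `inset` parameter for iframe border or padding are hypothetical.

```typescript
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Translate a rect measured inside an iframe into parent-page coordinates.
// Assumption: no CSS transform or zoom is applied to the iframe element.
function innerToPage(
  inner: Box,
  outerIframe: Box,
  inset: { left: number; top: number } = { left: 0, top: 0 },
): Box {
  return {
    x: outerIframe.x + inset.left + inner.x,
    y: outerIframe.y + inset.top + inner.y,
    // Size is unchanged; only the origin shifts.
    width: inner.width,
    height: inner.height,
  };
}
```
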
The previous `window.electronAPI?.sessions?.resume(...)` optional-chain
silently resolved to `undefined` when the preload bridge wasn't wired
in (in unit tests, in window types that drift, or if the IPC handler
was removed). `result?.error` was then `undefined`, so the submit
path took the success branch — recording a cache entry and locking the
picker into the "Sent to agent" state even though no message ever
left the renderer.

Resolve `resume` first; if it's missing, surface a 'sessions bridge
unavailable' error and roll the local state back so the user can retry.
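
The guard described above might look roughly like this. The `resume` signature, the result shape, and the `SubmitOutcome` type are assumptions made for illustration; only the "resolve `resume` first, then report 'sessions bridge unavailable'" behaviour comes from the commit message.

```typescript
// Assumed shapes; the real IPC bridge types are not shown on this page.
type ResumeResult = { error?: string } | undefined;
type ResumeFn = (sessionId: string, text: string) => Promise<ResumeResult>;

interface SubmitOutcome {
  ok: boolean;
  error?: string;
}

async function submitSelection(
  resume: ResumeFn | undefined,
  sessionId: string,
  text: string,
): Promise<SubmitOutcome> {
  // Resolve the bridge first: a missing function must not look like success.
  if (typeof resume !== "function") {
    return { ok: false, error: "sessions bridge unavailable" };
  }
  const result = await resume(sessionId, text);
  if (result?.error) {
    return { ok: false, error: result.error };
  }
  return { ok: true };
}
```

The caller would record the cache entry and lock the picker only when `ok` is true, and roll local state back otherwise so the user can retry.
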

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files (changes from recent commits).

Comment thread app/src/main/hl/stock/interaction-skills/capture-block.md Outdated
Cheggin added 4 commits May 27, 2026 00:32
The previous recipe called a `cdp('Runtime.evaluate', ..., sessionId)`
helper that doesn't exist in browser-harness-js — only the Python
harness exposes `cdp(...)`. The JS runner's documented pattern for
routing Runtime/DOM calls into an OOPIF is `session.use(targetId)`
followed by ordinary `session.Runtime.evaluate(...)`, as shown in
cross-origin-iframes.md.

Rewrite the recipe around `session.use()`: resolve the outer iframe
rect from the parent target, route into the bframe, run the tile-union
+ challenge-text query there, then route back to the parent target
(in a `finally` so a thrown error doesn't strand subsequent
`Page.captureScreenshot` calls on the wrong target).
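
The route-in, evaluate, route-back pattern can be sketched against a minimal stand-in interface. `SessionLike` is hypothetical: only `session.use(targetId)` and `session.Runtime.evaluate(...)` are named in the commit message, and whether `use()` is synchronous or returns a promise is an assumption (the `await` below tolerates both).

```typescript
// Hypothetical minimal shape of the harness session object.
interface SessionLike {
  use(targetId: string): Promise<void> | void;
  Runtime: {
    evaluate(params: {
      expression: string;
      returnByValue?: boolean;
    }): Promise<{ result: { value?: unknown } }>;
  };
}

async function evaluateInTarget(
  session: SessionLike,
  parentTargetId: string,
  frameTargetId: string,
  expression: string,
): Promise<unknown> {
  // Route subsequent Runtime/DOM calls into the out-of-process iframe.
  await session.use(frameTargetId);
  try {
    const { result } = await session.Runtime.evaluate({
      expression,
      returnByValue: true,
    });
    return result.value;
  } finally {
    // Always route back, even on error, so later Page.captureScreenshot
    // calls are not stranded on the iframe target.
    await session.use(parentTargetId);
  }
}
```
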
Lets the agent emit an iframe fence carrying { url, prompt?, width?,
height?, submitLabel? } to embed an arbitrary https URL inline in a
chat turn. The renderer mounts a sandboxed iframe and surfaces a foot
button the user clicks once they are done inside the frame.

Tiny flat payload, no streaming-partial path -- the picker cannot
render meaningfully until the URL is in hand. URL must be absolute
http(s); non-http(s) schemes (javascript:, data:, etc.) are rejected
at parse time. width and height are clamped to safe ranges.
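
A minimal sketch of the parse-time checks described above follows. The field names come from the commit message; the clamp limits, the regex-based URL check, and the `null`-on-failure contract are assumptions, and the PR's actual `IframePayload` parser may be stricter (for example, by using the `URL` constructor).

```typescript
interface IframePayloadSketch {
  url: string;
  prompt?: string;
  width?: number;
  height?: number;
  submitLabel?: string;
}

// Hypothetical limits; the PR only says values are clamped to safe ranges.
const WIDTH_RANGE: [number, number] = [240, 960];
const HEIGHT_RANGE: [number, number] = [160, 900];

function clamp(value: number, [min, max]: [number, number]): number {
  return Math.min(max, Math.max(min, value));
}

function parseIframePayloadSketch(raw: string): IframePayloadSketch | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (e) {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return null;
  }
  const { url, prompt, width, height, submitLabel } = data as Record<string, unknown>;
  // Simplified absolute http(s) check: rejects javascript:, data:, and relative URLs.
  if (typeof url !== "string" || !/^https?:\/\/[^\s\/?#]+/i.test(url)) {
    return null;
  }
  const payload: IframePayloadSketch = { url };
  if (typeof prompt === "string") payload.prompt = prompt;
  if (typeof submitLabel === "string") payload.submitLabel = submitLabel;
  if (typeof width === "number" && Number.isFinite(width)) {
    payload.width = clamp(width, WIDTH_RANGE);
  }
  if (typeof height === "number" && Number.isFinite(height)) {
    payload.height = clamp(height, HEIGHT_RANGE);
  }
  return payload;
}
```
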
Sandboxed iframe (allow-scripts allow-same-origin allow-forms
allow-popups) inside a chat-turn card matching the OptionList /
CaptureBlock shell. Foot button offers "I'm done" / "Skip" --
submission resumes the agent with "Iframe interaction complete." or
"Iframe skipped." as the next user turn.

The component cannot read the iframe's DOM (sandbox + cross-origin),
so the agent only learns that the user finished, not what they did.
That is a feature: the iframe runs in the renderer's session,
decoupled from whatever the agent is doing in its CDP-driven Chromium
instance.

Also widen hub.html's CSP from default-src 'self' to allow
frame-src 'self' https: -- without this the new iframe is blocked.

Unit tests cover the streaming + parser contract (7 cases) and the
hasStructuredBlock dispatch in ChatTurn is extended to route the new
iframe_block event variant.

Documents when the iframe fence is the right tool (public forms,
OAuth screens, demos) and -- more importantly -- when it is not.

The reCAPTCHA bframe section is based on a live experiment: the
bframe URL does load inside our renderer (Google sets no
X-Frame-Options and no frame-ancestors restriction), but it renders
completely blank because the challenge UI is populated by
postMessage from the matching anchor iframe on the original host.
Loading the URL standalone gets you Google's bootstrap script plus
an empty <div></div>.

For captcha challenges, the capture fence remains the right tool --
screenshot + proxied clicks keep the live agent session intact.

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 7 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="app/src/renderer/hub/chat-v2/IframeBlock.tsx">

<violation number="1" location="app/src/renderer/hub/chat-v2/IframeBlock.tsx:57">
P2: Cache key is too coarse (URL-only), so repeated iframe blocks with the same URL can be falsely restored as already submitted.</violation>
</file>

function IframeReady({ payload, sessionId, streaming, nextUserText }: ReadyProps): React.ReactElement {
  const { url, prompt, width, height, submitLabel } = payload;
  const cacheKey = useMemo(
    () => `iframe:${submissionKey(sessionId, [url])}`,

@cubic-dev-ai cubic-dev-ai Bot May 28, 2026

P2: Cache key is too coarse (URL-only), so repeated iframe blocks with the same URL can be falsely restored as already submitted.

<file context>
@@ -0,0 +1,209 @@
+function IframeReady({ payload, sessionId, streaming, nextUserText }: ReadyProps): React.ReactElement {
+  const { url, prompt, width, height, submitLabel } = payload;
+  const cacheKey = useMemo(
+    () => `iframe:${submissionKey(sessionId, [url])}`,
+    [sessionId, url],
+  );
</file context>
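
One possible way to address the coarse key, sketched under assumptions: `submissionKeySketch` is a stand-in for the project's `submissionKey()` (whose real implementation is not shown here), and `blockIndex` is a hypothetical per-block position that the component would need to receive. The idea is only that the key should vary between two iframe blocks that share a URL.

```typescript
// Stand-in for the project's submissionKey(); the real one may hash or
// format its parts differently.
function submissionKeySketch(sessionId: string, parts: string[]): string {
  return [sessionId, ...parts].map((p) => encodeURIComponent(p)).join("|");
}

interface IframeKeyInput {
  url: string;
  prompt?: string;
  submitLabel?: string;
}

// Key on the block's position and its visible fields, not just the URL,
// so a repeated block is not restored as already submitted.
function iframeCacheKey(
  sessionId: string,
  payload: IframeKeyInput,
  blockIndex: number, // hypothetical: position of the block in the transcript
): string {
  const parts = [
    String(blockIndex),
    payload.url,
    payload.prompt ?? "",
    payload.submitLabel ?? "",
  ];
  return `iframe:${submissionKeySketch(sessionId, parts)}`;
}
```

In the component, the `useMemo` dependency list would need to include whatever extra inputs end up in the key.
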
