docs: cross-model adversarial review trial (Antigravity/Gemini vs Claude)#16
Conversation
Set up a manual, global process for using a second, different model (Google Antigravity / Gemini 3) to adversarially review work Claude produced — code reviews and design decisions. Complements the same-model Claude PR reviewer (ADR 0021): a different model has different blind spots, so it catches a class of defect the author's own model shares. Deliberately a TRIAL, not a ratified practice — absent from the numbered practices and the decision guide, no merge gate, no adoption obligation. Ships the adversary prompt (skeptical persona, refute-don't-bless, with code-review and design-decision modes) and a trial-log template so the graduate-or-kill decision rests on evidence. Documents the automation rollout path (run-gemini-cli / agy headless, mirroring ADR 0021) as an explicit ADR-gated next step, not part of the trial. Per the methodology's keep-artifacts-honest rule, the README layout table and CHANGELOG [Unreleased] record the addition, marked trial / no-adoption. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Claude finished @vpciii's task in 1m 48s —— View job Cross-model adversarial review trial — review
This is a clean, well-disciplined docs PR. It reads like it was written against this repo's own rules rather than generic best practice, and it holds up. No blocking issues. What's right (against this project's standards):
Two low-severity notes (neither blocks):
One question worth the author answering (not a blocker): introducing a new top-level |
Express the practice in terms of roles (author model / adversary model) rather than specific vendors. The adversary prompt and guide no longer name Claude or Gemini in their body; a single "Current roster" table is the one place that binds the roles to today's models. Swapping or upgrading a model is now a one-table edit — the prompt, the run steps, and the role language stay untouched. This keeps the global methodology model-agnostic: new or updated models drop in without rewrites. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Make the graduation section explicit: the existing same-model reviewer (ADR 0021) is vendor-bound in CI (the prompt is already model-neutral, the binding is not), and the graduating ADR will generalize BOTH the author-model and adversary-model reviewers together on one pattern — a shared model-neutral review policy plus thin, swappable per-provider bindings. Deliberately deferred so the live, cross-repo CI path isn't refactored twice. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Roles swapped for the first time: Gemini authored opn-mcp PR #16 (the timezone fix), Claude adversary-reviewed it cold and un-steered. Verdict NO STRONG OBJECTION — the fix was correct, complete, and single-source- respecting, so the adversary raised no false positives. Partial validation: shows the practice doesn't false-positive in reverse, but doesn't yet demonstrate reverse-direction defect-catching (the change had no catchable defect). Still untested: a real flaw in a Gemini-authored change, and a third model. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Sets up a manual, global process for using a second, different model — Google Antigravity (Gemini 3) — to adversarially review work Claude produced, for both code reviews and design decisions. It complements the same-model Claude PR reviewer (ADR 0021): a different model has different blind spots, so it catches a class of defect the author's own model structurally shares.
Per your call: trial-first (no ADR yet) and portable prompt + practice only (no Antigravity config wiring).
What's here
experiments/adversarial-review/adversary-prompt.md— the skeptic persona you paste into Antigravity/Gemini. Refute-don't-bless framing, grounded in the project's own artifacts, with separate Code review and Design decision blocks and a BLOCK / PUSH BACK / NO STRONG OBJECTION verdict.experiments/adversarial-review/README.md— the trial guide: how to run it manually (Antigravity IDE oragyCLI), a trial-log template so graduate-or-kill rests on evidence (signal vs. same-model+CI / false-positive rate / cost), and the graduation trigger → ADR with the automation rollout path (run-gemini-cli/agyheadless) as an explicit ADR-gated next step.Deliberately a trial, not a ratified practice
experiments/dir, notmethodology.md's numbered practices or the decision guide.Repo map kept honest: README layout row + CHANGELOG
[Unreleased], both marked trial / no-adoption.🤖 Generated with Claude Code