Releases: maxmilian/loop-engineering

v0.1.1 — proxy-gaming coverage + weak-model validation

24 Jun 03:38

Maintenance + validation release. Drop-in compatible with v0.1.0.

Highlights

🛡️ New coverage: agents gaming their own success proxy

Principle 3 and the review checklist now explicitly cover reward hacking / Goodhart's law — an agent making its success signal go green without doing the work (deleting/weakening tests, editing the validator or CI config, continue-on-error, hardcoding output). Guidance added: keep the verifier read-only to the agent, reject changes that touch or weaken the checker, and ask "how would a lazy agent make this green without doing the work?"
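As a rough illustration of "reject changes that touch or weaken the checker", the sketch below flags any changed path that falls under a protected verifier location. The pattern list and the `flag_checker_edits` helper are hypothetical and not part of the skill; a real setup would list whatever files define "green" for that project.

```python
from fnmatch import fnmatchcase

# Hypothetical protected locations: the files that decide whether the loop is "green".
PROTECTED_PATTERNS = ["tests/*", ".github/workflows/*", "scripts/validate.py"]


def flag_checker_edits(changed_paths, patterns=PROTECTED_PATTERNS):
    """Return the changed paths that touch the verifier, in input order.

    fnmatchcase's "*" also matches "/" so "tests/*" covers nested test files.
    """
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


# Example change set proposed by an agent (made-up paths).
changed = ["src/worker.py", "tests/test_worker.py", ".github/workflows/ci.yml"]
flagged = flag_checker_edits(changed)
if flagged:
    print("Needs human review, touches the checker:", flagged)
```

A path filter like this only catches edits to the checker itself. Hardcoded output or a weakened assertion inside an allowed file still needs the "how would a lazy agent make this green?" review question.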

This gap was surfaced by the eval suite (both configs missed it on a weak model) and is regression-verified: on Haiku the proxy-gaming point went from 0/2 to 6/6 runs after the fix.

📊 Weak-model validation (the real signal)

On a frontier model the benchmark is at the ceiling (the model already knows this stuff). On a weaker model (Haiku class) the skill delivers a measured +16 points (74% → 90% pass rate) on an 8-case subset, concentrated on the verify / human-gate / stopping-condition family. Full methodology and honest results are in evals/RESULTS.md (6 iterations).
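The "+16 points" figure is an absolute change in percentage points, not a relative improvement. A small worked example of the difference, using only the two pass rates quoted above (the function names are illustrative):

```python
def point_gain(before_pct, after_pct):
    """Absolute change, in percentage points."""
    return after_pct - before_pct


def relative_gain_pct(before_pct, after_pct):
    """Change relative to the starting pass rate, as a percentage."""
    return (after_pct - before_pct) / before_pct * 100


print(point_gain(74, 90))                   # 16
print(round(relative_gain_pct(74, 90), 1))  # 21.6
```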

📚 Expanded case library

evals/evals.json has grown to 16 graded cases spanning all four loop patterns (heartbeat / cron / hook / goal) plus long-horizon context, design / review / diagnose, including an adversarial-framing subset.

📦 Cleaner bundle

The .skill bundle is now built from a clean export — no .git internals — and ships all five README languages (en / 繁中 / 简中 / 日本語 / 한국어), references, assets, and LICENSE.

Install

Download loop-engineering.skill below, or:

```bash
git clone https://github.com/maxmilian/loop-engineering ~/.claude/skills/loop-engineering
```

Works with Claude Code, Codex, Copilot CLI, Gemini CLI (see README).

v0.1.0 — Loop Engineering

24 Jun 02:45

First tagged release of Loop Engineering — a portable skill that gives a coding agent a battle-tested framework for designing & reviewing autonomous / semi-autonomous agent loops.

What it does

  • Design mode — build a new self-running agent / loop / background worker.
  • Review mode — audit an existing loop's stopping conditions, guardrails, verification, and escalation paths.
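To make the review targets concrete, here is a minimal sketch of a loop with an explicit iteration budget, a verification gate, and an escalation path. It is an illustration under assumed names (`run_loop`, `LoopResult`, the default limits), not the skill's own implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    status: str       # "done", "escalated", or "budget_exhausted"
    iterations: int


def run_loop(step: Callable[[], None],
             verify: Callable[[], bool],
             max_iterations: int = 5,
             max_consecutive_failures: int = 3) -> LoopResult:
    """Run step() until verify() passes, the budget runs out, or errors pile up."""
    failures = 0
    for i in range(1, max_iterations + 1):
        try:
            step()
        except Exception:
            failures += 1
            # Guardrail: repeated errors go to a human instead of retrying forever.
            if failures >= max_consecutive_failures:
                return LoopResult("escalated", i)
            continue
        failures = 0
        # Verification: an independent check, not the step's own claim of success.
        if verify():
            return LoopResult("done", i)
    # Stopping condition: hard cap on iterations.
    return LoopResult("budget_exhausted", max_iterations)


# Toy usage: the "work" is incrementing a counter until it reaches 3.
state = {"n": 0}

def step():
    state["n"] += 1

result = run_loop(step, verify=lambda: state["n"] >= 3)
print(result)  # LoopResult(status='done', iterations=3)
```

Each of the three exits maps to one thing a review would look for: a bounded budget, a verifier the step cannot influence, and a defined hand-off when the loop is stuck.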

Distilled from 12 sources (Anthropic's context-engineering guidance, the Ralph loop / RPI methodology, Claude Code's agent-loop docs, and the 2026 "loop engineering" writing) into seven load-bearing principles plus reference material.

Benchmarked against a no-skill baseline on deliberately tricky cases: pass rate 87% → 100%, with lower variance. Eval set + per-iteration results are in evals/.

Install

Claude Code (clone):
```bash
git clone https://github.com/maxmilian/loop-engineering ~/.claude/skills/loop-engineering
```

Prebuilt bundle: download `loop-engineering.skill` below and install via your plugin / skill manager. Works across Claude Code / Codex / Copilot / Gemini — a skill is just a folder with a `SKILL.md`. See the README for per-tool paths.
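Since a skill is just a folder containing a `SKILL.md`, a quick sanity check after installing could look like the sketch below. The `looks_like_skill` helper is hypothetical; the path is the Claude Code clone target from the command above, and other tools use different locations (see the README).

```python
from pathlib import Path


def looks_like_skill(folder):
    """True if the folder contains a SKILL.md file at its top level."""
    return (Path(folder) / "SKILL.md").is_file()


# Claude Code clone target used in the install command.
skill_dir = Path.home() / ".claude" / "skills" / "loop-engineering"
print(looks_like_skill(skill_dir))
```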

MIT licensed.