Releases: maxmilian/loop-engineering
v0.1.1 — proxy-gaming coverage + weak-model validation
Maintenance + validation release. Drop-in compatible with v0.1.0.
Highlights
🛡️ New coverage: agents gaming their own success proxy
Principle 3 and the review checklist now explicitly cover reward hacking / Goodhart's law: an agent making its success signal go green without doing the work, for example by deleting or weakening tests, editing the validator or CI config, setting continue-on-error, or hardcoding output. Guidance added: keep the verifier read-only to the agent, reject changes that touch or weaken the checker, and ask "how would a lazy agent make this green without doing the work?"
This gap was surfaced by the eval suite (both configs missed it on a weak model) and is regression-verified: on Haiku, the proxy-gaming point went from 0/2 to 6/6 runs after the fix.
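To make the "reject changes that touch or weaken the checker" guidance concrete, here is a minimal sketch of a pre-merge guard. It is an illustration only, not code shipped with the skill: the protected path patterns and the function names `touched_verifier_files` and `review_change` are hypothetical, and a real setup would take the changed paths from its own diff tooling.

```python
from fnmatch import fnmatchcase

# Hypothetical protected paths: the files that make up the verifier.
# Adjust these to wherever your tests, validator, and CI config live.
PROTECTED_PATTERNS = ["tests/*", ".github/workflows/*", "scripts/validate.py"]


def touched_verifier_files(changed_paths, patterns=PROTECTED_PATTERNS):
    """Return the changed paths that belong to the verifier."""
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


def review_change(changed_paths):
    """Reject a change set that modifies the checker; accept it otherwise."""
    blocked = touched_verifier_files(changed_paths)
    if blocked:
        return False, "rejected: agent touched verifier files: " + ", ".join(blocked)
    return True, "accepted: verifier untouched"
```

Calling `review_change(["src/app.py", "tests/test_app.py"])` returns a rejection that names `tests/test_app.py`, so a human can decide whether the test change is legitimate. A path-based guard like this only catches edits to the checker itself; it would not detect hardcoded output, which still needs review.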
📊 Weak-model validation (the real signal)
On a frontier model the benchmark is at the ceiling (the model already knows this stuff). On a weaker model (Haiku class) the skill delivers a measured gain of 16 percentage points (74% → 90% pass rate on an 8-case subset), concentrated on the verify / human-gate / stopping-condition family. Full methodology and honest results are in evals/RESULTS.md (6 iterations).
📚 Expanded case library
evals/evals.json has grown to 16 graded cases spanning all four loop patterns (heartbeat / cron / hook / goal) plus long-horizon context, covering design / review / diagnose, and including an adversarial-framing subset.
📦 Cleaner bundle
The .skill bundle is now built from a clean export — no .git internals — and ships all five README languages (en / 繁中 / 简中 / 日本語 / 한국어), references, assets, and LICENSE.
Install
Download loop-engineering.skill below, or:
git clone https://github.com/maxmilian/loop-engineering ~/.claude/skills/loop-engineering

Works with Claude Code, Codex, Copilot CLI, Gemini CLI (see README).
v0.1.0 — Loop Engineering
First tagged release of Loop Engineering — a portable skill that gives a coding agent a battle-tested framework for designing & reviewing autonomous / semi-autonomous agent loops.
What it does
- Design mode — build a new self-running agent / loop / background worker.
- Review mode — audit an existing loop's stopping conditions, guardrails, verification, and escalation paths.
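As a rough illustration of what a review looks for, the sketch below shows a loop with an explicit success check, an iteration budget, and an escalation path. It is not taken from the skill: `run_loop`, its parameters, and the status strings are hypothetical names chosen for this example.

```python
def run_loop(step, verify, max_iterations=5, max_failures=2):
    """Run step() until verify() passes, failures pile up, or the budget runs out.

    step(iteration) does one unit of work and returns a result.
    verify(result) is the independent check that decides success.
    """
    failures = 0
    for iteration in range(1, max_iterations + 1):
        result = step(iteration)
        if verify(result):
            # Stopping condition 1: verified success.
            return {"status": "done", "iterations": iteration}
        failures += 1
        if failures >= max_failures:
            # Escalation path: hand off to a human instead of retrying forever.
            return {"status": "escalate", "iterations": iteration}
    # Stopping condition 2: the iteration budget is spent.
    return {"status": "budget_exhausted", "iterations": max_iterations}
```

For example, `run_loop(lambda i: i, lambda r: r == 2)` stops with status `"done"` on the second iteration, while a `verify` that never passes ends in `"escalate"` once `max_failures` is reached.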
Distilled from 12 sources (Anthropic's context-engineering guidance, the Ralph loop / RPI methodology, Claude Code's agent-loop docs, and the 2026 "loop engineering" writing) into seven load-bearing principles plus reference material.
Benchmarked against a no-skill baseline on deliberately tricky cases: pass rate 87% → 100%, with lower variance. Eval set + per-iteration results are in `evals/`.
Install
Claude Code (clone):
```bash
git clone https://github.com/maxmilian/loop-engineering ~/.claude/skills/loop-engineering
```
Prebuilt bundle: download `loop-engineering.skill` below and install via your plugin / skill manager. Works across Claude Code / Codex / Copilot / Gemini — a skill is just a folder with a `SKILL.md`. See the README for per-tool paths.
MIT licensed.