diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..6b4b19b --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,89 @@ +# AGENTS.md: the operating layer + +> Tool-agnostic front door. Any agent runtime (Claude Code, Codex, Gemini, a +> human) reads this first to learn how work is done in this repo. It carries the +> portable operate-contract: what to read, how to run a unit of work, when work +> is done, and when to stop and ask. + +## Enforcement boundary (read this first) + +**This file is a contract, not a guardrail.** Enforcement is Claude-Code-only: the +hooks (safety-gate, push-to-main blocker, anti-rationalization Stop hook, the +verification pipeline) are what actually block bad outcomes, and they run only +under Claude Code. Under any other runtime (Codex, Gemini, a bare LLM) +`AGENTS.md` is **advisory only**: it tells an agent what to do, but nothing +enforces it. Do not assume the guardrails are portable. They are not, until the +v3.x multi-runtime agent-hook work lands. See `docs/PHILOSOPHY.md` (honesty rule: +never over-claim portable enforcement) and `CLAUDE.md` for the CC-specific layer +(hooks, slash commands, plugin). + +The rest of this file is four portable zones. The goal-crafter +(`commands/assign.md`) projects them into a six-section `/goal` (see "How a goal +is composed" at the end). + +--- + +## 1. Read in this order + +Orient before you touch anything. Read top to bottom; stop when you have enough. + +1. **AGENTS.md** (this file) - how work is done here; the operate-contract. +2. **CLAUDE.md** - the Claude-Code layer: stack, structure, rules, hooks, commands, plugin. +3. **docs/specs/SPEC-NNN-.md** - the active spec; the shared contract for the cycle. Read its `## Verification` and `## After state` before implementing. +4. **docs/architecture.md** / **WORKFLOW.md** - reference, not required per task. Read `docs/architecture.md` for how the pieces fit; read `WORKFLOW.md` for the lanes and the gate at each phase boundary. + +## 2. Task loop + +How to do one unit of work. The smallest verifiable increment, verified, committed. + +1. **Size the lane.** Pick `tiny` / `normal` / `full` / `bug` / `backfill` per `WORKFLOW.md`. When in doubt between two lanes, take the heavier one. +2. **Read the spec and its acceptance criteria.** For a spec-driven task: the active spec's task row, its AC, its `## Verification`, and its `## After state`. No spec (tiny lane): the one obvious edit. +3. **Implement the smallest verifiable increment.** One logical change. No speculative features, no premature abstraction; clarity over cleverness. +4. **Verify.** Run the spec's `## Verification` command (or the lane's check). Do not claim a result you did not run. +5. **Commit.** Conventional commit, one logical change. No spec/ticket IDs in the subject line. + +If you cannot make progress, see zone 4 (Pause if) and stop with a named blocker note. Do not churn. + +## 3. Done means + +A task or goal is done only when **its acceptance criteria are met AND the +verifier actually ran the check**, not when you claim they pass. Self-reported +"done" is not proof. + +Concretely, done means: **acceptance criteria met, the check actually ran (not +just asserted), review recorded + report written, and the final response says +what changed and what was not attempted.** If you could not run the check, report +that plainly; the anti-rationalization hook is the backstop for premature +completion under Claude Code, but the honesty obligation is yours under any +runtime. + +## 4. Pause if (ask a human) + +Stop and ask a human before acting on any of these. These are decisions with +direction or irreversible cost that a goal loop must not make on its own. + +- **Architecture direction** - a change to how the pieces fit, a new component, an interface or data-model shape. +- **Source-of-truth hierarchy** - which file or section is canonical when two disagree (for example, moving the operate-contract between `AGENTS.md`, `CLAUDE.md`, and `WORKFLOW.md`). +- **Validation removal** - weakening, deleting, or bypassing a test, an assertion, a hook, or any guardrail. +- **Risk-classification change** - moving work to a lighter lane, or narrowing a `full`-lane trigger (auth, authz, hooks, data model, data loss, audit/security, external provider, API contract, migration). +- **Privacy / security** - secrets, credentials, access scope, anything that touches what data leaves the repo or who can reach it. + +When you pause: write the named blocker, state the decision you are not making and why, and stop. + +--- + +## How a goal is composed (for `commands/assign.md`) + +The goal-crafter projects these four zones into a six-section `/goal`. The mapping +is a **composition, not 1:1**: two of the six sections come from the active spec, +not from this file. Keep the four zone names stable; renaming one without updating +`commands/assign.md` breaks the projection. + +| `/goal` section | Source | +|---|---| +| Context-to-read | AGENTS.md zone 1 (Read in this order) | +| Constraints | CLAUDE.md / AGENTS.md rules | +| Operating rules | AGENTS.md zone 2 (Task loop) | +| Validation loop | the active spec's `## Verification` | +| Done-when | AGENTS.md zone 3 (Done means) + the active spec's `## After state` | +| Pause-if | AGENTS.md zone 4 (Pause if) | diff --git a/CHANGELOG.md b/CHANGELOG.md index 9b9bc88..519ac3a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,12 +4,22 @@ All notable changes to dwarves-kit are documented here. ## [Unreleased] +## [1.6.0] - 2026-05-22 + ### Changed - **Dropped hand-maintained component counts and stripped spec IDs from the WORKFLOW contract.** The exact `N hooks / N commands / N agents / N skill` strings are removed from every live surface (`.claude-plugin/{plugin,marketplace}.json`, `README.md`, `MANUAL.md`, `CLAUDE.md`, the `docs/architecture.md` component table, `tool.toml`) and replaced with qualitative phrasing; the `92`/`238` test-suite-total comments in `CLAUDE.md` are dropped too. Counts rot silently: this swept up live drift the guards never caught (`architecture.md` said 19 commands, `tool.toml` said 12 hooks / 18 commands / 9 agents). With nothing left to keep in sync, both count tests in `tests/test-meta.sh` are removed (the ID-013 component-count guard added earlier this cycle and the older command-count parity test); meta suite 238 -> 213. `WORKFLOW.md` no longer cites `SPEC-NNN`/`ADR-NNN` inline (the `docs/specs/SPEC-NNN-.md` filename pattern stays): rules are stated by concept with one provenance pointer to `docs/specs/` + `docs/decisions/`, and the count-sweep chore paragraph is replaced by a version-only note. SPEC-004 + SPEC-005 `Status:` reconciled VALIDATED -> SHIPPED (shipped alongside SPEC-006). ### Added +- **Release-hygiene guard: a warn on a phantom version cut (SPEC-028).** Closes the recurring messy-release state the SPEC-018 + SPEC-027 retros both flagged (`VERSION`/`plugin.json` cut to a version that was never tagged, with `[Unreleased]` piling on top). Two warn-only surfaces detect the **phantom cut** (`VERSION` names a version with no matching `git tag`), plus a soft heads-up when `[Unreleased]` is non-empty above it: a `/user:ship` Step-4a warn (at the version/tag decision) and a `commands/kit-health.md` check (run before tagging, repo-scoped, degrades to a no-op outside the repo). Warn-only, never blocks (Approach A; PHILOSOPHY reserves hard blocks for safety, and an untagged cut is a legitimate release transient, so a hard `tests/test-meta.sh` tag assertion was rejected). Both surfaces share one pinned check shape (whitespace-strip + `git tag -l "v$VER"` + a `git rev-parse --git-dir` degrade guard + the same `[Unreleased]` non-empty awk), pinned by 2 `tests/test-meta.sh` assertions (meta suite to 256). The verification pipeline earned its keep here: task-verifier caught one DEC-005 guard drift, the integration-checker caught a second, and `/user:review` caught a third (the soft accumulation-signal drift), all on the guard's own two copies (DEC-005/DEC-006). Source: SPEC-028 / ID-026. +- **Mid-flight spec amend: a declared path to add scope to a building spec without restarting the lane (SPEC-027).** Closes the `BUILDING -> SPECIFYING -> BUILDING` gap (operating-layer-vision Scenario 7): when a build reveals scope that must be added now ("also do Y"), the operator approves, then the spec is amended in place at a task checkpoint instead of being silently mutated or the lane restarted. Convention, not a new command or hook (Approach A; PHILOSOPHY "earn the abstraction"). Four invariants, canonical in `WORKFLOW.md` "## Mid-flight amend": no lane restart (`Status:` stays VALIDATED, only the delta re-validated), add-only (completed `- [x]` tasks frozen), recorded + operator-approved at a checkpoint (an optional on-demand `## Amendments` provenance section in the spec), resume with `/user:next` (not a fresh `/user:execute`). Projected into `docs/operating-layer-vision.md` §3.3 (the transition row) + §5 (gap closed), `commands/execute.md` (the no-silent-mutation anti-pattern now points at the amend path), `commands/spec.md` (the `## Amendments` template section), `docs/PLAYBOOK.md` Scenario 7 + `docs/ORCHESTRATION.md` 5.4 (operator projections), and pinned by 4 `tests/test-meta.sh` assertions (meta suite to 254). `/user:review` (HIGH) added the operator-approval gate the canonical rule had omitted (DEC-008). Source: SPEC-027 / ID-023. +- **Kit-root `AGENTS.md`: a tool-agnostic operate-contract front door (SPEC-024).** A portable, four-zone operating layer (the same shape any agent runner can read) that fronts the kit's behavioral contract; `CLAUDE.md` and `WORKFLOW.md` now point at it so the cycle and house rules have one neutral entry point instead of being CLAUDE-only. Ships a downstream template at `examples/hello-spec/AGENTS.md` so a consuming project starts with the same front door. Source: SPEC-024. +- **`backfill` brownfield lane (SPEC-024)**: an intake lane for adopting the kit into an existing repo (bring the operate-contract, specs convention, and guardrails onto code that predates them), recorded in `WORKFLOW.md` alongside the existing intake lanes. +- **Six-section goal projection in `/user:assign`**: the goal draft `/user:assign` writes now projects the backlog item across six fixed sections (Context-to-read / Constraints / Operating rules / Validation loop / Done-when / Pause-if) sourced from the `AGENTS.md` zones + the active spec, so a handed-off goal carries fuller context into its lane. +- **`## After state` section in the `/user:spec` template**: specs now scaffold the observable post-implementation end state, complementing the existing Verification / Open-questions stop-criteria. +- **doc-impact-map rows for `AGENTS.md` and the top-level files**: the WORKFLOW doc-impact map (reviewed by `/user:ship` + `/user:retro`) now lists `AGENTS.md` and a "new top-level file" trigger so a change that touches them is flagged for a doc update. +- **Regression coverage for the install settings-merge against a pre-existing third-party hook (test/guard, not a fix)**: `tests/test-meta.sh` gains a block asserting `install.sh`'s `settings.json` merge preserves an existing third-party hook entry. This is added coverage, not a bug fix: the current installer already unions third-party hooks correctly, so no installer change was needed. Plus review-driven anti-drift assertions (CLAUDE.md/WORKFLOW.md carry no read-order restatement) and hello-spec AGENTS.md zone pins. Meta suite to 241 asserts. Source: SPEC-024. - **`/user:ui-design` command + the downstream UI-design loop (SPEC-020)**: writes a structured `## UI design` brief (aesthetic-direction preamble + layout + states matrix + named viewports + a11y bars + 3-tier token ladder + voice) into the active spec (else the pre-spec brief), delegates generation to the external `frontend-design` skill (the kit ships no renderer), critiques via `/user:visual-team`, and runs a bounded auto-revise loop (E6 `` injection-wrap, E7 unconditional accumulated-feedback re-send, terminate on SOLID / RECONSIDER / max-2 cap). Opt-in, report-only, downstream-facing (the PHILOSOPHY carve-out + kit-health allow-note now name both `/user:visual-team` and `/user:ui-design`). The kit is now 20 commands. Brief enriched per the 2026-05-21 deep scan (`docs/research/2026-05-21-ui-design-loop-deep-scan.md`): aesthetic-direction from `frontend-design`'s real input, token ladder + states matrix + a11y bars + voice adapted from `nextlevelbuilder/ui-ux-pro-max-skill` (its renderer / fonts / `.cjs`+`.py` tooling rejected per bash-over-binaries), loop shapes from gstack. Dogfooded through `/user:spec-validate` twice; the re-dogfood caught an unimplementable numeric stop (visual-team emits no combined score) and folded it to a SOLID-verdict stop. Credits: gstack (loop shapes), `frontend-design` (generator), `ui-ux-pro-max-skill` (brief sub-shapes). Source: SPEC-020. - **`/user:absorb` command + the absorption ritual (SPEC-004)**: a maintainer-only, proposal-only external-absorption audit that generalizes SPEC-002/SPEC-014's one-shot surveys into a recurring ritual. `docs/ABSORPTION.md` carries it: two lanes (Credits drift re-audit + a seed-rescan of the SPEC-014 survey set, scanning the interest areas workflow/agents/QA/UI), the adoption rubric (>=10), the gate, and the **human merge gate** (discovery + scoring + drafting are automatic; adopting a source or adding it to Credits is maintainer-approved, preserving "synthesize, don't originate"). `/user:absorb` writes a dated, ranked + capped, proposal-only report under `docs/absorption/` (HEAD-SHA baseline for since-last-run, an overflow appendix so a real ADOPT is never dropped, a `git status` self-check); QA/UI candidates needing binaries route to "recommend external". The kit is now 19 commands. Think+Design narrowed lane B to a seed-rescan (web-search discovery deferred as tool-weak). Also corrected the plugin manifests' stale hook/agent counts (now 14 hooks / 11 agents). Source: SPEC-004 + `docs/ABSORPTION.md`; DATA-not-instructions guard from ADR-0008. - **Reviewer 5 (Solution-Design & Extensibility Critic)** in `/user:spec-validate`: flags shallow or non-extensible designs, with a calibration clause (no false-positive storm) and a legacy-grace clause for specs predating the richer template. Source: SPEC-008; forked from `superpowers:brainstorming` ("design for isolation and clarity") + its spec-document-reviewer calibration. Not a runtime dependency. @@ -44,8 +54,6 @@ All notable changes to dwarves-kit are documented here. ## [1.6.0] - 2026-05-20 -Orchestration layer (SPEC-003) plus upstream-audit absorption and lineage hygiene (SPEC-002). - ### Added - **`WORKFLOW.md`** (repo root): the agent-facing workflow contract. Names each lifecycle phase, routes work by risk tier (tiny / normal / full), and points at the existing guardrail that enforces each boundary. Delivered via the `CLAUDE.md` pointer (auto-loaded each session); it suggests and routes, it does not block. Downstream template ships at `examples/hello-spec/WORKFLOW.md` (`.planning/` path convention). Source: SPEC-003; harness-experimental intake model + the AGENTS.md pattern. diff --git a/CLAUDE.md b/CLAUDE.md index e93ec6a..05306f7 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -57,7 +57,9 @@ The kit unified the spec location onto `docs/specs/SPEC-NNN-.md` for both ## Workflow -The kit eats its own dog food. The full lifecycle, the risk-tier lanes, and the gate at each phase boundary live in one place: the [`WORKFLOW.md`](WORKFLOW.md) contract. Read it after this file. For kit-on-kit work, spec drafts live in `docs/specs/` (see Spec location above), not `.planning/`. +The operate-contract (read-order, task loop, done-definition, the "Pause if" list) is canonical in [`AGENTS.md`](AGENTS.md), the tool-agnostic front door. Read it first; this file is the Claude-Code layer on top of it. Do not restate the operate-contract here; point at `AGENTS.md`. + +The kit eats its own dog food. The full lifecycle, the risk-tier lanes, and the gate at each phase boundary live in one place: the [`WORKFLOW.md`](WORKFLOW.md) contract. Read it after `AGENTS.md` and this file. For kit-on-kit work, spec drafts live in `docs/specs/` (see Spec location above), not `.planning/`. `/user:kit-health` is the maintainer-only self-assessment against PHILOSOPHY.md. Run it before tagging. @@ -65,4 +67,4 @@ Hooks fire on Claude Code events automatically; do not call them manually. Debug ## Template for downstream projects -If you are scaffolding a new project that will use dwarves-kit, do not copy THIS file. Use `examples/hello-spec/CLAUDE.md` as the template; it shows the full shape (Project, Tech Stack, Commands, Repository Structure, Code Quality Rules, Workflow, Spec Location) with realistic placeholder content. +If you are scaffolding a new project that will use dwarves-kit, do not copy THIS file. Use `examples/hello-spec/CLAUDE.md` as the template; it shows the full shape (Project, Tech Stack, Commands, Repository Structure, Code Quality Rules, Workflow, Spec Location) with realistic placeholder content. The downstream front door is `examples/hello-spec/AGENTS.md` (the tool-agnostic operate-contract); copy it alongside, with `CLAUDE.md` as the Claude-Code layer on top. diff --git a/MANUAL.md b/MANUAL.md index b53ba12..5dca3d5 100644 --- a/MANUAL.md +++ b/MANUAL.md @@ -60,10 +60,10 @@ Opt-in lane between `/user:spec-validate` and `/user:execute`. Reads the active ### `/user:assign` **Phase:** orchestrate (backlog item -> goal draft -> lane) -**Reads:** `_meta/BACKLOG.md` Active queue (by `ID-NNN`), the item's Lane column -**Writes:** `.claude/goals/.md` (the SPEC-005 draft contract; never `.claude/last-goal.md`) -**When to invoke:** you picked an item from "what's left?" and want it scoped into a goal and routed into the right lane -**Common gotcha:** it is a mutator-dispatcher: it sets up the goal and hands off, it does NOT execute. It detects the goal-loop activator (built-in `/goal`, `ralph-loop`, or `goal-craft`) and degrades to a plain draft file if none is installed. Idempotent per id. Source: SPEC-006 + ADR-0011. +**Reads:** `$ARGUMENTS` = either an `ID-NNN` (today's path) OR **freeform intent** (anything not matching `^ID-[0-9]+$`, e.g. "apply SDD to X"); `_meta/BACKLOG.md` Active queue, the item's Lane column, `AGENTS.md` zones (the projection source for the six-section goal) + the active spec's `## Verification` / `## After state`. Freeform delegates the crystallize interview to `/user:think`. +**Writes:** `.claude/goals/.md` (the SPEC-005 draft contract; never `.claude/last-goal.md`), a six-section operating directive (Context-to-read / Constraints / Operating rules / Validation loop / Done-when / Pause-if). On the **freeform path** it first writes a new sanitized `_meta/BACKLOG.md` row with a freshly allocated ID (row-before-draft, approve-before-allocate). +**When to invoke:** you picked an `ID-NNN` from "what's left?", OR you have a freeform feature idea / vague brief with no ID yet, and want it scoped into a goal and routed into the right lane. +**Common gotcha:** it is a mutator-dispatcher: it sets up the goal and hands off, it does NOT execute. The freeform path **delegates** the interview to `/user:think` (it does not embed one). It detects the goal-loop activator (built-in `/goal`, `ralph-loop`, or `goal-craft`) and degrades to a plain draft file if none is installed. Idempotent per id (and per slug for freeform). Source: SPEC-006 + ADR-0011; freeform front door SPEC-026. ### `/user:spec` @@ -72,7 +72,7 @@ Opt-in lane between `/user:spec-validate` and `/user:execute`. Reads the active **Writes:** `docs/specs/SPEC-NNN-.md` (Status: DRAFT), `docs/research/{stack,features,architecture,pitfalls}.md` **When to invoke:** after `/think`, or directly if the work is well-scoped already **Common gotcha:** the research agents are parallel-dispatched via Task tool. If your Claude Code is older than v2.0.60, they fall back to inline research and the run is slower. -**Template sections:** the generated spec scaffolds Solution depth (approaches / chosen + why / extensibility, SPEC-008), plus an optional `### Interfaces (I/O contract)` under Technical Design and an optional `## Failure modes` table (SPEC-009). Both optional sections are lane-scoped; Reviewers 2 and 5 check them when present. It also pins `## Verification` (the command(s) that prove the spec done) and `## Open questions` (the blocker landing zone a `/goal` loop appends to), so a validated spec is natively pointer-`/goal`-ready (SPEC-012 P1). +**Template sections:** the generated spec scaffolds Solution depth (approaches / chosen + why / extensibility, SPEC-008), plus an optional `### Interfaces (I/O contract)` under Technical Design and an optional `## Failure modes` table (SPEC-009). Both optional sections are lane-scoped; Reviewers 2 and 5 check them when present. It also pins `## Verification` (the command(s) that prove the spec done) and `## Open questions` (the blocker landing zone a `/goal` loop appends to), so a validated spec is natively pointer-`/goal`-ready (SPEC-012 P1). An optional, on-demand `## Amendments` section (added only when a mid-flight amend happens, never an empty scaffold) records add-scope provenance during a build (SPEC-027). ### `/user:spec-validate` @@ -90,6 +90,7 @@ Opt-in lane between `/user:spec-validate` and `/user:execute`. Reads the active **Dispatches:** worker subagent per task, then task-verifier, then fix-agent on FAIL:fixable (retry max 2) **When to invoke:** when handing off to a contractor OR running the kit on yourself end-to-end **Common gotcha:** verification adds ~2x token cost per task. Worth it for the FAIL:fixable catch rate; budget accordingly. Each worker first expands its task into bite-sized verify-each-step increments (TDD when a unit test fits; grep/bash/test-suite verify for doc and config tasks) before coding. +**Mid-flight amend:** if a build reveals scope that must be added now ("also do Y"), do not silently edit the spec or restart the lane. With your approval, amend at a task checkpoint (append `- [ ]` tasks, record an `## Amendments` entry, Status stays VALIDATED) and resume with `/user:next`. The canonical rule is WORKFLOW.md "## Mid-flight amend"; the operator card is PLAYBOOK Scenario 7 (SPEC-027). ### `/user:next` @@ -97,7 +98,7 @@ Opt-in lane between `/user:spec-validate` and `/user:execute`. Reads the active **Reads:** `docs/specs/SPEC-NNN-.md` **Writes:** code, tests; you drive the verification yourself **When to invoke:** when you want hands-on control or the next task needs subtle judgment that the verification pipeline might over-correct on -**Common gotcha:** picks the next unchecked task only. To skip a task or pick a specific one, edit SPEC.md task ordering first. +**Common gotcha:** picks the next unchecked task only. To skip a task or pick a specific one, edit SPEC.md task ordering first. This unchecked-only behavior is also why `/user:next` (not a fresh `/user:execute`) is the way to resume after a mid-flight amend: it runs the newly appended tasks and skips the done rows (SPEC-027). ### `/user:debug` @@ -139,6 +140,7 @@ Opt-in lane between `/user:spec-validate` and `/user:execute`. Reads the active **Writes:** bumped `VERSION`, new `CHANGELOG.md` entry, git tag, PR via `gh` **When to invoke:** review is green and docs are synced **Common gotcha:** blocks if any REVIEW*.md verdict is FIX-REQUIRED. Use `/review-team` and `responding-to-review` to triage before re-running ship. +**Release-hygiene warn (Step 4a):** at the version step it warns (never blocks) on a phantom cut, `VERSION` naming a version with no matching git tag, with a heads-up when `[Unreleased]` is accumulating above it. Warn-only; tag `v` or confirm intentional, then continue (SPEC-028). ### `/user:retro` @@ -159,9 +161,9 @@ Opt-in lane between `/user:spec-validate` and `/user:execute`. Reads the active ### `/user:kit-health` **Phase:** self-assessment against PHILOSOPHY.md -**Reads:** the kit itself (file counts, hook performance, settings validity, source citations) +**Reads:** the kit itself (file counts, hook performance, settings validity, source citations), plus a repo-scoped release-hygiene check (a phantom cut: `VERSION` names a version with no matching git tag; degrades to a no-op outside the repo) **Writes:** verdict in chat (`SHIP / FIX-REQUIRED / REJECT`) -**When to invoke:** maintainer-only, before tagging a release of the kit +**When to invoke:** maintainer-only, before tagging a release of the kit (the release-hygiene check is exactly the "before tagging" guard) **Common gotcha:** the rejection-first verdict will REJECT on real violations. Do not soften the criteria; address them. ## Hooks (no invocation) diff --git a/README.md b/README.md index ec08296..9df70f4 100644 --- a/README.md +++ b/README.md @@ -156,6 +156,8 @@ Don't run both install paths on the same machine -- hooks would register twice. /user:retro Retrospective (10 min, after shipping) ``` +Work is sized by risk lane before it starts (tiny / normal / full / bug, plus a `backfill` lane for brownfield: review an existing codebase and write the operating-layer docs without changing app behavior). The lanes, the gate at each phase boundary, and the operate-contract the agent follows live in [`AGENTS.md`](AGENTS.md) and [`WORKFLOW.md`](WORKFLOW.md). + ## Debug mode Set `DWARVES_KIT_DEBUG=1` to see what every hook is doing: @@ -216,10 +218,15 @@ These tools complement the kit but are installed separately: ``` dwarves-kit/ tool.toml Kit metadata (name, version, language=bash, deps) + AGENTS.md Tool-agnostic operate-contract front door (any runtime reads it first) + WORKFLOW.md The cycle, the risk-tier lanes, the gate at each boundary MANUAL.md Operator reference for the commands + docs/ORCHESTRATION.md The flow/loop view: lanes, loops, triggers, stop conditions (ASCII diagrams) + docs/PLAYBOOK.md The interaction view: what you say -> what happens (operator scenarios) + docs/operating-layer-vision.md Design-first vision + the SDLC state machine (north-star for the operating-layer specs) RUNBOOK.md Hook misbehavior diagnosis + recovery README.md / CONTRIBUTING.md / CHANGELOG.md / VERSION / LICENSE - CLAUDE.md Project template + CLAUDE.md Project template; the Claude-Code layer on top of AGENTS.md install.sh / settings.json Bash install path .claude-plugin/ Plugin install path (plugin.json, marketplace.json) .github/workflows/test.yml CI: macOS + Ubuntu test matrix diff --git a/WORKFLOW.md b/WORKFLOW.md index b2a890c..bcf0bb7 100644 --- a/WORKFLOW.md +++ b/WORKFLOW.md @@ -5,11 +5,13 @@ > It suggests and routes; it does not block. The only hard stops are the > safety-gate hook, the push-to-main blocker, the anti-rationalization Stop > hook, and the verification pipeline. +> For the visual flow/loop view (every flow + alt-flow, its trigger, and its +> stop condition, with ASCII diagrams), see `docs/ORCHESTRATION.md`. -## Required reading (in order) -1. CLAUDE.md - project context: stack, structure, rules -2. docs/specs/SPEC-NNN-.md - the active spec; the shared contract for the cycle -3. docs/architecture.md - how the pieces fit (reference; not required per task) +## Required reading +`AGENTS.md` is the front door and owns the read-order; it is the single source. +Read `AGENTS.md` zone 1 ("Read in this order") for the full ordered list, then +return here. This file does not restate the list, so the two cannot drift. ## Size the work first (risk-tiered intake) Pick a lane before you start. Smaller work skips ceremony. @@ -20,6 +22,7 @@ Pick a lane before you start. Smaller work skips ceremony. | normal | one bounded feature or fix | /spec, /execute, /review, /ship | | full | touches auth, authz, hooks, data model, data loss, audit/security, an external provider, an API contract, a migration, or weakens validation | /think, /spec, /spec-validate, /execute, /review-team, /docs, /ship, /retro | | bug | a defect, regression, or failing test (not a new feature) | /debug (root cause before any fix), then /review | +| backfill | brownfield: review an existing codebase and write the operating-layer docs (AGENTS.md / CLAUDE.md / specs) | review the code, write the docs. Doc-output only; no app-behavior change, no app-code edits. /spec optional. | When in doubt between two lanes, take the heavier one. Anything in the full-lane trigger list uses the full lane unless you explicitly narrow the scope and say why. @@ -59,22 +62,57 @@ session start for whatever goal-loop activator is present, hand off to the lane's first command. Mutator; does NOT execute, never writes last-goal.md. - -> the lane runs tiny | normal | full (see the cycle table above) + -> the lane runs tiny | normal | full | bug | backfill (see the lane table above) normal/full -> /user:spec -> /user:spec-validate -> /user:execute (verify pipeline) (opt-in: /user:devs-team + /user:visual-team before spec; /user:test-plan before execute; /user:ui-design for downstream UI work, after /user:design) -> /user:review -> /user:docs -> /user:ship -> /user:retro + backfill -> review the codebase, write AGENTS.md / CLAUDE.md / specs, then + /user:review (optional). No /user:execute; no app-code edits. -> on ship /user:ship reviews the completeness log; ID-NNN drops off the queue (CHANGELOG is the canonical shipped record). ``` -**Detector/mutator split.** `/user:start` and `/user:next` only read and render; `/user:assign` is the only mutator. **Activator-agnostic.** `/user:assign` writes only the `.claude/goals/.md` draft and surfaces its body; activation (starting the loop) is done by whatever primitive is present (the built-in `/goal`, the `ralph-loop` plugin, or the `goal-craft` skill). The kit NEVER writes `.claude/last-goal.md`; if no activator exists, the draft is a plain reusable file. **"Even the goal loop follows WORKFLOW"** is delivered honestly: the safety subset is hard-enforced by existing hooks (anti-rationalization, the verification pipeline, the push-to-main blocker); decision/doc completeness is warned + logged to `~/.claude/dwarves-kit/logs/completeness.log` and reviewed at `/user:ship` + `/user:retro`, not hard-blocked mid-loop (PHILOSOPHY rejects hard-gating process completeness). +**Freeform front door.** `/user:assign` accepts freeform intent, not only an `ID-NNN`. Given freeform, it delegates the interview to `/user:think`, pauses for approval, then allocates the ID + BACKLOG row before routing as usual; the ID-first path is unchanged. **Detector/mutator split.** `/user:start` and `/user:next` only read and render; `/user:assign` is the only mutator. **Activator-agnostic.** `/user:assign` writes only the `.claude/goals/.md` draft and surfaces its body; activation (starting the loop) is done by whatever primitive is present (the built-in `/goal`, the `ralph-loop` plugin, or the `goal-craft` skill). The kit NEVER writes `.claude/last-goal.md`; if no activator exists, the draft is a plain reusable file. **"Even the goal loop follows WORKFLOW"** is delivered honestly: the safety subset is hard-enforced by existing hooks (anti-rationalization, the verification pipeline, the push-to-main blocker); decision/doc completeness is warned + logged to `~/.claude/dwarves-kit/logs/completeness.log` and reviewed at `/user:ship` + `/user:retro`, not hard-blocked mid-loop (PHILOSOPHY rejects hard-gating process completeness). + +## Mid-flight amend +Canonical rule. You are mid-`/user:execute` on a `VALIDATED` spec and the work +reveals scope that must be added now ("also do Y"). Amend the spec in place; +do not restart the lane and do not silently mutate it. Other docs +(PLAYBOOK, ORCHESTRATION, `commands/execute.md`) point here; they do not restate +this rule. The state-model row (`BUILDING -> SPECIFYING -> BUILDING`) lives in +`docs/operating-layer-vision.md` §3.3; this section carries the operational rule. + +The amend is governed by four invariants: + +- **No lane restart.** `Status:` stays `VALIDATED` across an amend; only the + DELTA is (re-)validated (full lane: `/spec-validate` on the new tasks; normal + lane: advisory). Dropping back to `DRAFT` would be a lane restart, the exact + thing this path removes. +- **Completed work is frozen (add-only).** An amend may only ADD scope (new + `- [ ]` tasks, new acceptance criteria, new after-state bullets). It must not + rewrite an already-done (`- [x]`) task's contract; the `- [x]` rows are + byte-for-byte unchanged. Rewriting a done task is the heavier re-open / re-spec + path, not an amend. +- **Recorded at a checkpoint, operator-approved, not mid-worker.** The amend + happens between tasks: the in-flight task is verified and committed first (or no + task is in flight). Adding scope is an operator decision, never the loop's: an + autonomous `/user:execute` pauses for the operator to approve the added scope + before the amend lands (this is AGENTS.md zone 4 "Pause if", a scope / risk / + architecture change). Then record it as an entry in the spec's `## Amendments` + section (optional, on-demand; see `commands/spec.md`), one line per amend: + `AMEND-NNN: date | what | why | at which checkpoint | new tasks | re-validated`. +- **Resume leads with `/user:next`, not a fresh `/user:execute`.** `/next` picks + the next undone `- [ ]` task and skips `- [x]` done rows, so resume runs only + the amended tasks. `/execute` re-parses and re-presents the whole plan, so it + is the wrong door after an amend. ## Completion contract -A task is done only when its acceptance criteria are met and the verifier has -actually run the tests, not when you claim they pass. If you cannot run the -check, report that plainly; the anti-rationalization hook is the backstop for -premature completion. Self-reported "done" is not proof; the task-verifier is. +The done-definition is canonical in `AGENTS.md` zone 3 ("Done means"); do not +restate it here. In the kit, the task-verifier is what proves "done" (self-reported +"done" is not proof), and the anti-rationalization hook is the backstop for +premature completion. The clauses below add kit-specific completeness checks on top +of that done-definition. ### Completeness clauses (warn + log, reviewed at ship) Two self-check clauses run during Build/Reflect. Both WARN and LOG to `~/.claude/dwarves-kit/logs/completeness.log` (the `spec-drift-guard` logging shape); neither hard-blocks. `/user:ship` and `/user:retro` review that log at the gate. Hard blocks stay reserved for the safety subset (PHILOSOPHY rejects hard-gating process completeness). @@ -99,9 +137,13 @@ Per change-type, the companion docs that must move with it. This covers the enum | a new `docs/decisions/` ADR | README + `docs/architecture.md` cross-refs | | a new `docs/specs/SPEC-NNN` | `_meta/BACKLOG.md` status, the spec's `Status:` header | | **a new top-level dir under the kit root** | **this doc-impact map (WORKFLOW.md)**, README "Project structure", `docs/architecture.md` | +| **a new top-level file under the kit root** | **this doc-impact map (WORKFLOW.md)**, README "Project structure", `docs/architecture.md` | +| `AGENTS.md` (kit root) | `CLAUDE.md` + `WORKFLOW.md` pointers (must not drift), `examples/hello-spec/AGENTS.md` (downstream template), `commands/assign.md` (the six-section projection reads its zones), `tests/test-meta.sh` | | any shipped change (normal/full) | `CHANGELOG.md`, `VERSION`, `.claude-plugin/plugin.json` version, `tool.toml` version, `docs/retro/v.md` | -The bolded row is self-maintaining: adding a new top-level dir must update this map. +The bolded rows are self-maintaining: adding a new top-level dir or file must update this map. + +The `backfill` lane (see the lane table) produces operating-layer docs rather than touching a source path: a backfill run writes `AGENTS.md`, `CLAUDE.md`, and any specs for the reviewed codebase, so its companion docs are those it writes. **Version surfaces.** The version string is duplicated and must stay in sync: it lives in `VERSION` (the source of truth), `.claude-plugin/plugin.json`, and `tool.toml`. Bumping the version means updating those; `marketplace.json` inherits it via `"source": "."` and needs no bump. The kit does NOT keep component counts (`N hooks`, `N commands`, etc.) in prose: describe the component set qualitatively, never as a hand-maintained number that silently rots. diff --git a/_meta/BACKLOG.md b/_meta/BACKLOG.md index 5da6a94..7fda0af 100644 --- a/_meta/BACKLOG.md +++ b/_meta/BACKLOG.md @@ -34,9 +34,32 @@ Lane = the WORKFLOW.md risk tier (`tiny` / `normal` / `full`). | ID-002 | Absorb skills/hooks developed in ops-toolkit into the kit (internal lane) | item e | SPEC-007 (PARKED; see its Parked note) | full | parked | | ID-003 | Deep orchestration scan of the copied repos vs our WORKFLOW (report + absorb plan) | item 2 | `docs/research/2026-05-20-orchestration-deep-scan.md` | normal | shipped (report delivered; recs folded into SPEC-006) | | ID-012 | Goal-loop fidelity (one theme, two parts). P1: stop-criteria in the `/spec` template (pin `## Verification` + `## Open questions` so specs are pointer-`/goal`-ready) -- build now, tiny lane. P2: QA gate around the autonomous loop (verify-into-loops + SPEC-006 completeness clauses + Reviewer 5) -- held until the pointer-`/goal` pattern has real runs | maintainer Q3 2026-05-20 + goal-readiness convergence 2026-05-21 | SPEC-012 | normal (P1) / full (P2) | P1 shipped (see CHANGELOG); P2 held | +| ID-015 | AGENTS.md operating layer: tool-agnostic entrypoint + brownfield backfill lane + six-section goal projection + observable After-state spec section (ships ADR-0013) | ADR-0013 (hoangnb24/harness-experimental study, 2026-05-21) | SPEC-024 | full | shipped (CHANGELOG [Unreleased], no version bump; PR #7) | +| ID-016 | Promote the "new SPEC -> BACKLOG status row" doc-impact check to a guard (a hook or a kit-health line); it was missed across multiple cycles | SPEC-025 retro 2026-05-21 (recurrence clears the PHILOSOPHY section 5 bar) | TBD | normal | queued | +| ID-017 | Reconcile retro-file naming across the `retro` skill body + description, CLAUDE.md, and WORKFLOW.md to the repo's actual `RETRO-YYYY-MM-DD-.md` convention | SPEC-025 retro 2026-05-21 | TBD | tiny | queued | +| ID-018 | install.sh prints blind `cp` tips for AGENTS.md and CLAUDE.md; harden both to copy-if-absent at the command level (`cp -n` or a guard) so the printed command matches its "if absent" prose, or record why prose-only suffices | SPEC-024 review 2026-05-21 (security lens) | TBD | tiny | queued | +| ID-019 | Demo `examples/hello-spec/docs/specs/SPEC-001-version-flag.md` lacks a `## After state` section; add one so the demo is a complete exemplar of the post-SPEC-024 spec template | SPEC-024 review 2026-05-21 (test-coverage lens) | TBD | tiny | queued | +| ID-020 | Verification-pipeline absence-checks: teach task-verifier + integration-checker to assert that removed/replaced content is GONE for "replace, don't duplicate" / "remove X" tasks, not just that the new artifact exists. A replace-task left both copies and passed BOTH verifiers this cycle; only the independent review caught it | SPEC-024 retro 2026-05-21 (pipeline blind spot) | TBD | normal | queued | +| ID-021 | Reduce execute-worker shell/hook friction: pre-warn the recurring gotchas in `commands/execute.md`'s worker template (fish `noclobber` -> `>|`; no heredoc commit `-m` -> `git commit -F`; no `rm` -> `mv`), and investigate the commit-format/commit-msg heredoc mis-parse (it read a multi-line `-m` body as a 459-char subject) | SPEC-024 retro 2026-05-21 (worker friction) | TBD | normal | queued | +| ID-022 | Freeform front door: extend `/user:assign` to accept freeform intent (not only `ID-NNN`), auto-allocating an ID + BACKLOG row so "apply SDD to X" / a vague brief is a native one-shot intake (unparks the SPEC-024-deferred griller entry; preserves ID-first traceability) | PLAYBOOK.md scenarios 2026-05-22 | SPEC-026 | normal | validated | +| ID-023 | Mid-flight spec amend: a path to add scope to a VALIDATED / in-build spec without restarting the lane (state machine transition BUILDING -> SPECIFYING) | operating-layer-vision gap analysis 2026-05-22 | SPEC-027 | normal | shipped (CHANGELOG [Unreleased]; local branch feat/mid-flight-amend, no tag/PR) | +| ID-024 | Context-switch affordance across specs (worktree-per-spec) + an explicit ABANDONED terminal (state hygiene; vs the existing `parked`) | operating-layer-vision gap analysis 2026-05-22 | TBD | normal | queued | +| ID-025 | Re-open-shipped convention: a follow-up spec spun from a SHIPPED one (state machine transition SHIPPED -> TRIAGING) | operating-layer-vision gap analysis 2026-05-22 | TBD | tiny | queued | +| ID-026 | Release-hygiene guard: a kit-health line or hook that flags VERSION cut-but-untagged AND `[Unreleased]` spanning more than one spec (catch the messy-release state before it compounds) | SPEC-027 retro 2026-05-22 (recurrence of the SPEC-018 finding; clears the PHILOSOPHY section-5 bar) | SPEC-028 | normal | shipped (CHANGELOG [Unreleased]; local branch feat/release-hygiene-guard, no tag/PR; the guard self-fired on its own ship) | +| ID-027 | spec-validate autonomy-gate lens: for a spec whose behavior runs inside an autonomous loop (`/execute`, `/goal`), check whether it lets the loop make a scope / architecture / risk decision without a human gate | SPEC-027 retro 2026-05-22 (the cycle's HIGH was caught at review, should have been catchable at validate) | TBD | normal | queued | +| ID-028 | execute.md: acknowledge disjoint-file parallel worker dispatch (sequential is the safe default only for shared-file or dependent tasks; independent non-overlapping tasks may parallelize) | SPEC-027 retro 2026-05-22 | TBD | tiny | queued | +| ID-029 | Carve out "identical-by-contract" from the no-premature-abstraction (3x) rule: when two snippet copies MUST stay byte-identical (a shared check across surfaces), single-source or pin the exact block even at two uses, do not wait for the third | SPEC-028 retro 2026-05-22 (DEC-003's two inlined copies drifted three times in one cycle) | TBD | normal | queued | +| ID-030 | execute.md worker-dispatch guidance: when two tasks must produce identical logic, give both workers the exact same block, not one exact + one narrative (the asymmetry that seeded the SPEC-028 drift) | SPEC-028 retro 2026-05-22 | TBD | tiny | queued | Dependency notes: - ID-012 P1 (spec stop-criteria) shipped (SPEC-012, normal lane; see CHANGELOG); P2 (loop QA gate) is held until the pointer-`/goal` pattern has real runs; it will build on the now-shipped SPEC-006 completeness clauses + the existing verification pipeline. - ID-002 (internal absorption lane, SPEC-007) is parked; independent of the orchestration chain. +- ID-016 / ID-017 came out of the SPEC-025 retro; both are kit-internal hygiene, independent of the SPEC-024 chain. +- ID-018 / ID-019 came out of the SPEC-024 /review-team pass (deferred LOW findings); both are tiny-lane polish, independent of the SPEC-024 ship. +- ID-020 / ID-021 came out of the SPEC-024 retro; ID-020 (verifier absence-checks) relates to ID-016 (both are guard/verifier hardening) and is the highest-signal kit finding of the cycle. +- ID-022 came out of writing `docs/PLAYBOOK.md` (scenarios S2/S5): the freeform-intent gap. It unparks the SPEC-024-deferred griller entry; SPEC-026 drafts it. Independent of the other chains. +- ID-023 / ID-024 / ID-025 came out of `docs/operating-layer-vision.md` (the SDLC state-machine gap analysis): the transitions with no clean path today. ID-022 is the highest-priority of the operating-layer set; these three follow it. All trace to the vision doc as their north-star. ID-023 shipped (SPEC-027); ID-024 / ID-025 remain queued. +- ID-026 / ID-027 / ID-028 came out of the SPEC-027 retro. ID-026 (release-hygiene guard) is a two-cycle recurrence (SPEC-018 + SPEC-027) and relates to the ID-016 / ID-020 guard-promotion theme; it is the highest-signal of the three. ID-027 (validate autonomy-gate lens) would have caught the SPEC-027 review HIGH earlier. ID-028 is tiny-lane polish. ID-026 shipped (SPEC-028); ID-027 / ID-028 remain queued. +- ID-029 / ID-030 came out of the SPEC-028 retro (DEC-003's two inlined check copies drifted three times in one cycle, each caught at a different verification layer). ID-029 (identical-by-contract abstraction carve-out) is the substantive insight and relates to the kit's code-quality rules; ID-030 (worker-prompt symmetry) is tiny-lane polish and relates to ID-028. The third retro action item, resolving the release-state phantom cut (integrate the agents-md -> mid-flight -> release-hygiene stack and tag), is deliberately NOT a backlog row: it is a maintainer integration decision, recorded in the SPEC-028 retro. ## Schema diff --git a/commands/assign.md b/commands/assign.md index 3be3c95..fe9b406 100644 --- a/commands/assign.md +++ b/commands/assign.md @@ -6,23 +6,51 @@ You are a goal dispatcher. Turn a committed backlog item into a runnable goal dr ## Process -### Step 1: Resolve the item +### Step 1: Resolve the argument (two shapes) -Read `$ARGUMENTS` for the `ID-NNN`. Find that row in the `_meta/BACKLOG.md` Active queue (the Schema section there defines the columns). If the id is absent, say so, list the open queue ids, and stop. +`$ARGUMENTS` is one of two shapes. Trim it, then branch: + +- **ID shape**: the trimmed argument matches `^ID-[0-9]+$`. Take the **ID-first path** below (Step 1b onward), unchanged. This is the path that has always existed. +- **Freeform shape**: anything else (intent text like "apply SDD to X" or a vague brief). Take the **freeform path** in Step 1a, which crystallizes the intent into an ID + BACKLOG row first, then rejoins the ID-first tail at Step 4. + +A future third intake shape (e.g. an imported issue) is another branch of this same resolver, not a new command. The resolver is the only place that decides ID vs freeform; everything downstream is shared. + +### Step 1a: Freeform path (delegate, gate, sanitize, allocate, write) + +When the argument is freeform intent (does NOT match `^ID-[0-9]+$`), run these ordered steps. `/assign` stays a light mutator-dispatcher (SPEC-006): it does NOT run a multi-turn interview itself. Two invariants govern this path, and both come before any file is written: + +- **approve-before-allocate**: pause for human approval of the crystallized objective BEFORE allocating an ID. A vague brief never auto-creates a row (DEC-002). +- **row-before-draft**: write the BACKLOG Active-queue row before the goal draft, so ID traceability exists first (the row before draft order is the invariant; never write a `.claude/goals/` draft for an un-rowed intent). + +1. **Delegate crystallize to `/user:think`.** Hand the freeform intent to `/user:think` (the existing idea-griller). It runs the interview and returns a crystallized objective + a lane. `/assign` does NOT embed that interview (DEC-003); it only consumes `/think`'s result. If the intent is too vague to name an outcome, `/think` loops; do not allocate anything until it converges. +2. **Approval gate (approve-before-allocate).** Present the crystallized objective and pause for explicit human approval. Until the human approves, allocate nothing and write nothing. This is the gate that keeps half-baked rows out of the queue. +3. **Dedup by slug.** Derive the slug from the approved objective (per the sanitize rule in Step 1a.4). If a `.claude/goals/.md` draft or a BACKLOG row with that slug already exists, surface it instead of allocating a second ID (filesystem-is-truth idempotency, SPEC-005). On a near-match slug, ask rather than silently merge or duplicate. +4. **Sanitize (DEC-004).** Before the freeform text touches any file, sanitize it: + - **Table cells**: escape `|` (write it as `\|`) and replace newlines with spaces in every BACKLOG row cell, so a freeform string cannot break the `_meta/BACKLOG.md` pipe table. + - **Slug**: reduce the derived slug to `[a-z0-9-]+` only. Lowercase, replace spaces with `-`, then strip `/`, `..`, and anything outside `[a-z0-9-]`, so the slug cannot traverse out of `.claude/goals/`. The kebab convention is unchanged; it is just hardened to `[a-z0-9-]+`. +5. **Atomic allocate + write the row (DEC-005, row-before-draft).** Allocate the next `ID-NNN` by **re-reading the current max ID** in `_meta/BACKLOG.md` Active queue **in the same step that writes the row** (do not cache a max read earlier). The new ID is `max + 1`, zero-padded. Append a sanitized Active-queue row with `Status: queued` and `Source: freeform intake ()`, filling Title / Target artifact / Lane from the crystallized objective + `/think`'s lane. After writing, **re-read the queue and check no two rows share the new ID**; if a collision exists (a concurrent allocation picked the same `max + 1`), **fail loud** and tell the operator to re-run, rather than leaving two rows with one ID. This mirrors the existing SPEC/ADR dup-number guard. +6. **Rejoin the ID-first tail.** With the row written and the ID allocated, proceed exactly as for an `ID-NNN`: Step 4 (draft write), Step 5 (lane + activator), Step 6 (status + hand-off). The freeform path adds no new tail; it reuses the ID path's. + +### Step 1b: Resolve the item (ID-first path) + +Read `$ARGUMENTS` for the `ID-NNN`. Find that row in the `_meta/BACKLOG.md` Active queue (the Schema section there defines the columns). If the id is absent, say so, list the open queue ids, and stop. (An argument that looks like an ID but is not in the queue is a typo'd ID, not a freeform intent: report "unknown id", do NOT silently create a freeform row from it.) ### Step 2: Idempotency check If a `.claude/goals/.md` already exists for this id (one draft per id), re-surface the existing draft instead of creating a duplicate, and skip to Step 5. Mirrors SPEC-005 edge 6: the filesystem is the source of truth. -### Step 3: Goal-craft the breakdown +### Step 3: Project the six-section operating directive + +From the item's Title + Target artifact, craft the goal as a **six-section operating directive**, not a one-line contract. Each section *projects* from an `AGENTS.md` zone or a section of the active spec; this is the projection contract, and it must match `AGENTS.md`'s "How a goal is composed" table (keep the two in sync, or the `AGENTS.md`↔`assign.md` drift failure-mode fires). The mapping is a composition, not 1:1: two of the six sections come from the active spec, not from `AGENTS.md`. -From the item's Title + Target artifact, craft a goal with: -- **Objective**: the outcome that makes the item done, artifact-shaped and verifiable. -- **Scope fence**: the files/dirs in play, plus an explicit `Not:` list of adjacent things to leave alone. -- **Termination-on-blocker**: if blocked, write a named blocker note and stop (no churn). -- **Verification**: the command(s) that prove it done. If the Target artifact is a SPEC with a `## Verification` section, point the goal at that doc (a pointer-`/goal`); otherwise name the real check. +1. **Context-to-read** <- `AGENTS.md` zone 1 ("Read in this order"). Carry the ordered read list (AGENTS.md, CLAUDE.md, the active `docs/specs/SPEC-NNN-.md`, then reference docs) so the loop orients before touching anything. +2. **Constraints** <- the `AGENTS.md` / `CLAUDE.md` rules plus the item's scope fence: the files/dirs in play and an explicit `Not:` list of adjacent things to leave alone. This is where the old scope fence lives now. +3. **Operating rules** <- `AGENTS.md` zone 2 ("Task loop"): size the lane, read the spec + its AC, implement the smallest verifiable increment, verify, commit (one logical change, no spec/ticket IDs in the subject). +4. **Validation loop** <- the active spec's `## Verification`. If the Target artifact is a SPEC, point the goal at that doc's `## Verification` (a pointer-`/goal`); otherwise name the real check command. Do not write "tests pass"; name the command. +5. **Done-when** <- `AGENTS.md` zone 3 ("Done means") + the active spec's `## After state` (the observable bullets). The goal is done only when its AC are met, the check actually ran, review is recorded + a report written, and the final response says what changed and what was not attempted. Quote the spec's `## After state` observable bullets here verbatim as the done-picture. If the spec lacks an `## After state` section, fall back to `AGENTS.md` "Done means" alone. +6. **Pause-if** <- `AGENTS.md` zone 4 ("Pause if"): stop and ask a human (with a named blocker note, no churn) on architecture direction, source-of-truth hierarchy, validation removal, risk-classification change, or privacy/security. This is where the old termination-on-blocker lives now. -If the repo is spec-driven and the lane is normal/full, the goal is **spec-first**: its opening move is the lane's first command (`/user:spec`), not building code. This matches the `goal-craft` skill's spec-driven-repo rule. +If the repo is spec-driven and the lane is normal/full, the directive is **spec-first**: its opening move is the lane's first command (`/user:spec`), not building code. This matches the `goal-craft` skill's spec-driven-repo rule. ### Step 4: Write the draft (the SPEC-005 contract / ADR-0011) @@ -36,7 +64,7 @@ target_spec: status: drafted created: --- - + ``` `.claude/` is gitignored (per-machine drafts). The filesystem (`ls .claude/goals/*.md`) is authoritative; if a `.claude/goals/INDEX.md` cache exists, rebuild its row from the files. Do NOT write `.claude/last-goal.md`: the built-in `/goal` owns that slot (ADR-0011). @@ -61,7 +89,11 @@ created: - **Re-run for the same id**: re-surface the existing draft; do not duplicate or double-advance status (idempotent). - **No activator installed**: the draft still works as a plain file; only one-step activation is lost. - **Queued item with no spec, normal/full lane**: hand off to `/user:spec` first. +- **Freeform intent too vague**: `/user:think` loops until the objective is named; no ID is allocated until the approval gate passes (no half-baked rows). +- **Duplicate freeform intent**: dedup by slug after crystallize; surface the existing row/draft instead of allocating a second ID, and ask on a near-match rather than silently merge. +- **Concurrent freeform allocation**: both sessions re-read max in the write step; the post-write equal-ID collision check fails loud and the operator re-runs. +- **`|` or newline in freeform intent**: sanitized (cells escape `|`, newlines become spaces) before the row is written, so the BACKLOG pipe table stays well-formed. ## What this command does NOT do -It does not execute the task, does not write `.claude/last-goal.md`, and does not hard-gate. It is the mutator that sets up a goal; the lane's commands do the work and `/user:start`/`/user:next` only render. Source: SPEC-006; dispatcher pattern from `commands/next.md` + CCGS `/start`; goal breakdown from the `goal-craft` skill; draft store from ADR-0011. +It does not execute the task, does not write `.claude/last-goal.md`, and does not hard-gate. On the freeform path it does NOT embed a multi-turn interview either: the crystallize step is delegated to `/user:think`, and `/assign` keeps only allocate + route (DEC-003). It is the mutator that sets up a goal; the lane's commands do the work and `/user:start`/`/user:next` only render. Source: SPEC-006; SPEC-026 (the freeform front door + its four invariants: row-before-draft, approve-before-allocate, sanitize, atomic-allocate); dispatcher pattern from `commands/next.md` + CCGS `/start`; goal breakdown from the `goal-craft` skill; draft store from ADR-0011. diff --git a/commands/execute.md b/commands/execute.md index 04a4a52..84b85d8 100644 --- a/commands/execute.md +++ b/commands/execute.md @@ -233,8 +233,8 @@ After all phases complete: - **Worker fails to complete**: Run task-verifier anyway on whatever exists. The verifier determines if partial work is salvageable (FAIL:fixable) or needs human input (FAIL:escalate). - **Tests break during execution**: task-verifier catches this. If fixable, fix-agent handles it. If not, escalate. -- **Spec ambiguity discovered**: Stop and ask user to clarify. Do not guess. Do not dispatch fix-agent for spec problems. -- **Task is too large**: Split it into subtasks, confirm with user, then dispatch. +- **Spec ambiguity discovered**: If it is a genuine contradiction (the spec disagrees with itself), stop and ask the user to clarify. Do not guess. Do not dispatch fix-agent for spec problems. If instead the work reveals scope that must be ADDED now ("also do Y"), that is the declared mid-flight amend path, not an ambiguity: confirm the added scope with the user first (adding scope is not the loop's call), then amend at a checkpoint (append `- [ ]` tasks, record an `## Amendments` entry) and resume with `/user:next` (see WORKFLOW.md "## Mid-flight amend"). +- **Task is too large**: Split it into subtasks. If the split stays within the task's declared scope, confirm with user, then dispatch. If splitting means ADDING scope beyond the spec, confirm the added scope with the user, then route it through the mid-flight amend path (amend at a checkpoint, then resume with `/user:next`; see WORKFLOW.md "## Mid-flight amend"). - **fix-agent reports it cannot fix an issue**: Escalate immediately. Don't retry with the same fix-agent. ## Anti-patterns to avoid @@ -243,6 +243,6 @@ After all phases complete: - Do NOT skip verification. Every task goes through task-verifier, even if the worker says "all criteria met." - Do NOT skip the phase checkpoint. The user must approve before the next phase. - Do NOT auto-fix failing tests without the verification pipeline. -- Do NOT modify the spec without asking. +- Do NOT silently mutate the spec mid-build. An amend is not a silent edit: when the work reveals scope that must be added now, take the declared mid-flight amend path (pause at a task checkpoint, append new `- [ ]` tasks, record an `## Amendments` entry, resume with `/user:next`). See WORKFLOW.md "## Mid-flight amend". A silent rewrite of done (`- [x]`) tasks is still forbidden. - Do NOT retry FAIL:escalate verdicts. They need human judgment by definition. - Do NOT dispatch fix-agent for more than 2 issues at once. If the verifier found 5+, the task needs re-implementation, not patching. Escalate. diff --git a/commands/kit-health.md b/commands/kit-health.md index 667d215..9365292 100644 --- a/commands/kit-health.md +++ b/commands/kit-health.md @@ -78,6 +78,25 @@ echo "Source citations in README: $CREDITS" # 10. TODOs/FIXMEs in hook scripts TODOS=$(grep -r "TODO\|FIXME" ~/.claude/dwarves-kit/hooks/ 2>/dev/null | wc -l | tr -d ' ') echo "TODOs/FIXMEs in hooks: $TODOS" + +# 11. Release hygiene: a phantom version cut (VERSION names an untagged version) +# Repo-scoped: reads the repo's .git + VERSION via the current working dir (git +# tags + CHANGELOG live in the repo, not the installed copy under ~/.claude). The +# guard degrades to a no-op outside a git repo / without VERSION, never errors. +if [ -f VERSION ] && git rev-parse --git-dir >/dev/null 2>&1; then + VER=$(tr -d '[:space:]' < VERSION) + if [ -n "$VER" ] && [ -z "$(git tag -l "v$VER")" ]; then + echo " [WARN] release hygiene: VERSION is $VER but tag v$VER does not exist (phantom cut)" + # Accumulation context: [Unreleased] NON-empty => work piling above an untagged cut (same awk as ship.md, DEC-006). + if [ -f CHANGELOG.md ] && awk '/## \[Unreleased\]/{f=1;next} /^## /{f=0} f && NF{print}' CHANGELOG.md | grep -q .; then + echo " and CHANGELOG [Unreleased] is accumulating above it" + fi + else + echo " release hygiene: ok (v$VER tagged, or clean)" + fi +else + echo " release hygiene: skipped (not in the kit repo / no VERSION)" +fi ``` ### Step 2: Present health report diff --git a/commands/ship.md b/commands/ship.md index 654adeb..110cb91 100644 --- a/commands/ship.md +++ b/commands/ship.md @@ -55,6 +55,28 @@ If a version file exists: If no version file exists: skip this step silently. +### Step 4a: Release-hygiene warn (warn, not block) + +At the version/tag decision, check for a **phantom cut**: `VERSION` names a version that has no matching git tag. Run the check, then REPORT to the maintainer; do NOT block the ship. Mirror the Step 1b "warn, not block" voice: hard blocks stay reserved for the REVIEW.md DO-NOT-SHIP verdict and the safety gates. + +Use this exact check shape (kit-health's check uses the same shape; they must not drift): + +```bash +# Graceful degrade: no VERSION or not in a git repo => silent no-op, never error or block. +if [ -f VERSION ] && git rev-parse --git-dir >/dev/null 2>&1; then + VER=$(tr -d '[:space:]' < VERSION) # strip whitespace so a trailing newline cannot break the pattern + if [ -n "$VER" ] && [ -z "$(git tag -l "v$VER")" ]; then + echo "WARN release-hygiene: phantom cut. VERSION names v$VER but no matching git tag exists." + # Accumulation context: [Unreleased] non-empty => work piling above an untagged cut. + if [ -f CHANGELOG.md ] && awk '/## \[Unreleased\]/{f=1;next} /^## /{f=0} f && NF{print}' CHANGELOG.md | grep -q .; then + echo " work is accumulating above an untagged cut v$VER." + fi + fi +fi +``` + +If the phantom cut fires, surface it as a heads-up at the version step (remember to tag `v$VER`), then continue. During a real release this fires between the version-bump commit and the tag, where the warn is the correct "remember to tag" nudge. Source: SPEC-028 (DEC-001 warn-only, DEC-005 shared shape). + ### Step 5: Generate changelog entry If `CHANGELOG.md` exists (or the project follows a changelog convention): diff --git a/commands/spec.md b/commands/spec.md index f8b442d..8723add 100644 --- a/commands/spec.md +++ b/commands/spec.md @@ -119,6 +119,12 @@ Each task must be atomic: implementable in one session, fits in 50% of a context ### Phase 3: Polish - [ ] TASK-004: [description] — [acceptance criteria] +## After state +The definition-of-done picture. Each bullet is false now and true after, and each is checkable by a human or a command. This feeds `## Acceptance Criteria` below and projects into the goal's `Done-when`. +Rule: observable, not narrated. If a bullet cannot be verified by reading a file, running a command, or seeing a state, it is fluff and gets cut (PHILOSOPHY: every file justifies its existence). Pair the after-state with a "(Today: ...)" current-state note where it sharpens the contrast. +- [ ] [observable end state]. (Today: [current state].) +- [ ] [observable end state, checkable by ``]. + ## Acceptance Criteria (global) - [ ] All tasks pass their individual acceptance criteria - [ ] Tests cover happy path + edge cases listed below @@ -143,6 +149,10 @@ Optional; expected for full-lane specs that touch an external provider, data los ## Decision Log - DEC-001: [decision], [rationale], [alternatives rejected] +## Amendments +Optional; added only when a mid-flight amend happens (like `## Failure modes` / `## Open questions`, never an empty scaffold in a fresh spec). A running provenance log of mid-build scope additions. `WORKFLOW.md` owns the amend rule (when you may amend, the checkpoint guard, resume); this section is just the recorded entry. Entry shape: +- AMEND-NNN: [date] | [what scope was added] | why: [reason] | at [TASK-NNN] checkpoint | new tasks: [TASK-NNN..TASK-NNN] | re-validated: [delta-only (advisory / full lane)] + ## Open questions (none; a /goal loop appends here if it hits a decision this spec does not cover, then stops) ``` diff --git a/docs/ORCHESTRATION.md b/docs/ORCHESTRATION.md new file mode 100644 index 0000000..90c59b4 --- /dev/null +++ b/docs/ORCHESTRATION.md @@ -0,0 +1,408 @@ +# ORCHESTRATION.md: the flow/loop view + +> Reference manual for the dwarves-kit **orchestration layer**: the machinery that +> moves one unit of work from intake to shipped. This is the flow-and-loop view. +> For per-command operator detail read `MANUAL.md`; for the rules contract read +> `WORKFLOW.md`; for component fit read `docs/architecture.md`; for why-decisions +> read `docs/PHILOSOPHY.md` + `docs/decisions/`. This doc cross-links those; it +> does not restate them. + +## At a glance + +| Kind | Count | Members | +|---|---|---| +| Backbone | 1 | the spine (intake -> shipped) | +| Primary intake lanes | 5 | `tiny`, `normal`, `full`, `bug`, `backfill` | +| Bounded loops (engines) | 3 | goal loop, debug loop, execute verification pipeline | +| Alternate / branch flows | 7 | retry, escalate, ambiguous-spec, no-activator, idempotent re-run, DO-NOT-SHIP gate, completeness warn+log | +| Opt-in side-flows | 8 | `/user:design`, `/user:devs-team`, `/user:visual-team`, `/user:ui-design`, `/user:test-plan`, `/user:review-team`, `/user:absorb`, `/user:kit-health` | +| Hard stops (the only blockers) | 4 | safety-gate, push-to-main blocker, anti-rationalization, verification pipeline | + +Everything else **suggests and routes; it does not block**. The four hard stops are +the only places the kit refuses to proceed. Keep that split in mind throughout. + +--- + +## 1. The state model (what the flows move between) + +Three stores. Each flow reads and/or writes these; nothing is re-entered between phases. + +```text + _meta/BACKLOG.md docs/specs/SPEC-NNN-.md .claude/goals/.md + ┌─────────────────┐ ┌──────────────────────────┐ ┌──────────────────────┐ + │ the Active queue│ │ the contract │ │ ephemeral goal drafts│ + │ ID-NNN rows │ ──────▶ │ Status: DRAFT │ ◀──▶ │ (gitignored, │ + │ status: │ assign │ -> VALIDATED │ │ per-machine) │ + │ queued/speccing/│ │ -> SHIPPED │ │ one draft per ID │ + │ validated/ │ │ tasks, AC, Verification, │ └──────────────────────┘ + │ executing/ │ │ After state, Open Qs │ the built-in /goal owns + │ shipped (parked)│ └──────────────────────────┘ .claude/last-goal.md; + └─────────────────┘ the kit NEVER writes it +``` + +**Detector vs mutator** (load-bearing): `/user:start` and `/user:next` only **read and +render** the queue + drafts. `/user:assign` is the **only mutator**: it writes a goal +draft, flips a backlog status, and hands off. No other entry point mutates state on intake. +`/user:assign` accepts either an `ID-NNN` or **freeform intent** (the freeform front door, +SPEC-026): given freeform it delegates the crystallize interview to `/user:think`, then +allocates the ID + BACKLOG row (approve-before-allocate, sanitized) before routing as usual. + +--- + +## 2. The spine (master lifecycle flowchart) + +How a committed backlog item becomes shipped work, end to end. This is the backbone; +every lane is a longer or shorter walk along it. + +```text + session start + │ + ▼ + /user:start ........... DETECT: render the BACKLOG Active queue + active goal drafts + │ (read-only; suggests the next command) + ▼ + /user:assign .. MUTATE: goal-craft a draft (.claude/goals/.md); + │ pick the lane from the item, detect the goal-loop activator, + │ flip status (queued -> speccing | executing), hand off. + │ Does NOT execute. Never writes last-goal.md. + ▼ + ┌─────────────────── pick a lane (Section 3) ───────────────────┐ + │ tiny normal full bug backfill │ + └───────┬──────┬─────────────┬──────────────┬─────────────┬─────┘ + │ │ │ │ │ + │ ▼ ▼ ▼ ▼ + │ /user:spec /user:think /user:debug review code, + │ │ /user:spec │ write AGENTS.md + │ │ /user:spec-validate│ /CLAUDE.md/specs + │ │ │ │ (no app code) + │ ▼ ▼ │ │ + │ /user:execute (verification pipeline, Section 5.3) │ + │ │ │ │ │ + │ ▼ ▼ ▼ │ + │ /user:review /user:review-team (root cause │ + │ │ │ recorded, │ + │ ▼ ▼ fix verified, │ + │ (normal) /user:docs human-confirm)│ + │ │ │ │ + │ │ ▼ │ + └──────┴────────▶ /user:ship ◀────────────────────┘ + │ (ship gate: blocks on DO NOT SHIP; + │ push-to-main blocker; flips spec -> SHIPPED; + │ ID-NNN drops off the queue) + ▼ + /user:retro (full lane) -> docs/retro/RETRO-YYYY-MM-DD-.md +``` + +Opt-in beats (Section 6) slot in along this path: `/user:design` between think and spec; +`/user:devs-team` + `/user:visual-team` before the spec hardens; `/user:test-plan` before +execute; `/user:ui-design` for downstream UI work. + +--- + +## 3. Primary intake lanes (5) + +Pick a lane **before** you start. Smaller work skips ceremony. When in doubt between two, +take the heavier one. + +| # | Lane | Trigger (when) | Path | Stop / exit | +|---|---|---|---|---| +| 1 | `tiny` | typo, copy, comment, one obvious edit | edit -> verify -> done. No spec. | the edit verifies | +| 2 | `normal` | one bounded feature or fix | `/spec` -> `/execute` -> `/review` -> `/ship` | shipped, ID off queue | +| 3 | `full` | touches auth/authz, hooks, data model, data loss, audit/security, an external provider, an API contract, a migration, or weakens validation | `/think` -> `/spec` -> `/spec-validate` -> `/execute` -> `/review-team` -> `/docs` -> `/ship` -> `/retro` | shipped + retro written | +| 4 | `bug` | a defect, regression, or failing test (not a new feature) | `/debug` (root cause first) -> `/review` | root cause recorded, fix verified, human-confirmed | +| 5 | `backfill` | brownfield: adopt the kit onto existing code | review the code, write the operating-layer docs (AGENTS.md / CLAUDE.md / specs). `/spec` optional. | docs written; **no app-behavior change, no app-code edits** | + +```text + is it a defect / regression / failing test ? + │ yes │ no + ▼ ▼ + bug new work on an existing repo + with no operate-layer docs ? + │ yes │ no + ▼ ▼ + backfill how big / how risky ? + ├─ trivial edit ....... tiny + ├─ one bounded change . normal + └─ risk-list match .... full +``` + +The `full` trigger list is a hard tripwire: anything on it uses `full` unless you explicitly +narrow the scope and say why. + +--- + +## 4. The bounded-loop principle + +The kit ships **bounded in-session loops** and declines **unbounded outer loops**. A bounded +loop continues *within* the current session under a model-evaluated stop condition plus the +safety subset; an unbounded loop spawns *new* sessions without one (that is autonomous-runtime +territory: GSD v2 / OMC, out of scope). All three engines below are bounded. + +--- + +## 5. The three bounded loops (engines) + +### 5.1 Goal loop + +A continuation that keeps the current session working a single objective until a verifiable +stop holds. Wired from the backlog by `/user:assign`, activated by whatever loop primitive is +present. + +- **Trigger**: an objective handed to an activator: the built-in `/goal`, the `ralph-loop` + plugin, or the `goal-craft` skill. `/user:assign` writes the draft and surfaces its body; + it does **not** start the loop itself (activator-agnostic). +- **Enforcer**: the **anti-rationalization Stop hook** (blocks premature "done"), plus the + rest of the safety subset (verification pipeline, push-to-main blocker). +- **Stop condition**: the objective's `## Verification` command(s) pass AND the done-definition + holds (AGENTS.md "Done means" + the spec's `## After state`). On a blocker it cannot resolve, + it appends a named note to the spec's `## Open questions` and stops (no churn). +- **Branches**: no-activator (degrades to a plain reusable draft file, Section 7.4); + blocker-hit (write Open-questions note, stop). + +```text + activator starts the objective + │ + ▼ + ┌───▶ do the next increment ──▶ run ## Verification + │ ▲ │ + │ │ pass? │ + │ │ ┌─────────┴─────────┐ + │ │ no │ │ yes + │ │ ▼ ▼ + │ │ anti-rationalization ALL done? ──no──┐ + │ │ blocks "done"; │ yes │ + │ └───── keep working ◀─────────┘ │ + │ │ + │ hit a blocker you can't resolve? ▼ + └── write a note to spec ## Open questions ─────────────▶ STOP +``` + +### 5.2 Debug loop (`/user:debug`, the `bug` lane) + +A systematic four-phase loop under one iron law. Off-cycle: a bug-lane entry point, not a +linear phase. + +- **Trigger**: `/user:debug` on a defect, regression, or failing test. +- **Iron law**: **NO FIX WITHOUT A RECORDED ROOT CAUSE.** Evidence accrues in an append-only + ledger `.claude/debug/.md` whose `## Root cause` heading is the contract. +- **Enforcer**: the **guess-fix guard** (a gated mode of the anti-rationalization hook): it + blocks a fix/done claim while an open debug ledger still has an empty `## Root cause`. Silent + in non-debug sessions. +- **Stop condition**: root cause recorded + fix verified + **human-confirmed**. +- **Branches**: regression -> `git bisect`; failing-test-first -> routed into the execute + verification pipeline (5.3); the **3-fix architecture wall** (after 3 failed fixes, stop and + reconsider the design rather than keep patching). + +```text + /user:debug + │ + ▼ + Phase 1: Root cause ───▶ Phase 2: Pattern ───▶ Phase 3: Hypothesis ───▶ Phase 4: Implementation + │ (ledger (reproduce, (predict, then (apply the fix) + │ ## Root cause) narrow; bisect test the guess) │ + │ if regression) │ ▼ + │ │ verified? ──no──┐ + guess-fix guard: a fix/done claim is BLOCKED │ │ yes │ + while ## Root cause is empty ◀──────────────────────────┘ ▼ │ + human-confirm │ + 3 failed fixes in a row ──▶ STOP: architecture wall (reconsider design) │ │ + ▼ │ + DONE ◀───────┘ (loop Phase 3-4) +``` + +### 5.3 Execute verification pipeline (the build engine) + +The core build loop: `/user:execute` dispatches one worker per task, verifies each in a fresh +context, retries fixable failures, and checks cross-task wiring at the end. Self-reported +"done" from a worker is never proof; the verifier is. + +- **Trigger**: `/user:execute` on a `VALIDATED`/`APPROVED` spec on a feature branch. +- **Enforcer**: the **verification pipeline itself** is a hard stop (it gates each task). +- **Stop condition**: every task PASS **and** the integration-checker PASS (multi-task specs). +- **Branches**: `PASS` (advance), `FAIL:fixable` (retry via fix-agent, **max 2**), `FAIL:escalate` + or retries exhausted (stop -> human). Single-task specs skip the integration check. + +```text + /user:execute (record pre-build base ref) + │ + ▼ + ┌── for each task in phase ──────────────────────────────────────────┐ + │ worker subagent (fresh context) ──▶ task-verifier (read-only) │ + │ │ │ + │ ┌───────────────────┼───────────────────┐ │ + │ PASS FAIL:fixable FAIL:escalate + │ │ │ │ │ + │ │ ▼ │ │ + │ │ fix-agent (scoped) │ │ + │ │ re-verify ▲ │ │ + │ │ retry < 2 ─┘ │ │ + │ │ retries == 2 ──────────────▶│ │ + │ ▼ ▼ │ + │ mark task done ESCALATE to human + └──────────┬─────────────────────────────────────────────────────────┘ + │ all tasks PASS + ▼ + phase checkpoint (human: continue / review / stop) + │ + ▼ + integration-checker (read-only, diffs whole build from base ref) + │ + ┌─────┼───────────────┐ + PASS FAIL:fixable FAIL:escalate + │ │ (fix-agent, │ + │ │ reuse cap) ▼ + ▼ ▼ ESCALATE + build complete ◀── re-check +``` + +### 5.4 Mid-flight amend micro-loop (BUILDING -> checkpoint -> amend -> resume) + +A side excursion off the execute pipeline, not a fourth engine: when work mid-build reveals +scope that must be added now ("also do Y"), the operator amends the active `VALIDATED` spec in +place and resumes, **without** restarting the lane. The rule (the checkpoint guard, the +add-only + frozen-completed-tasks + Status-stays-VALIDATED invariants, resume-via-`/next`) is +canonical in **`WORKFLOW.md` "## Mid-flight amend"**; the state-model row +(`BUILDING -> SPECIFYING -> BUILDING`) lives in `docs/operating-layer-vision.md` §3.3. This +section only draws the loop; it does not restate the four invariants. + +- **Trigger**: operator says "also do Y" / "amend the spec" mid-`/user:execute`. +- **Guard**: amend only at a task checkpoint (the in-flight task verified + committed first, + or none in flight); completed `- [x]` tasks are frozen. +- **Resume**: `/user:next` (picks the next undone `- [ ]` row, skips done rows), **not** a fresh + `/user:execute` (which re-presents the whole plan). See `WORKFLOW.md` for why. + +```text + BUILDING (mid /user:execute, spec is VALIDATED) + │ trigger: "also do Y" + ▼ + reach a task checkpoint ──────────────────────────────┐ + (in-flight task verified + committed; - [x] frozen) │ not at a checkpoint yet? + │ │ finish the in-flight task first + ▼ └──────────────────────────────┘ + SPECIFYING (amend, not restart) + - append new - [ ] TASK rows; delta After-state / AC / Verification + - record an ## Amendments entry + - re-validate the DELTA only (full: /spec-validate; normal: advisory) + │ Status STAYS VALIDATED (no drop to DRAFT) + ▼ + /user:next ──▶ BUILDING (resume; runs only the amended tasks) +``` + +--- + +## 6. Opt-in side-flows (8) + +Advisory, never blocking. They enrich a lane but are not required by any. + +| # | Flow | Trigger | Writes to | Stop | +|---|---|---|---|---| +| 1 | `/user:design` | between `/think` and `/spec`, when the solution needs working out | `docs/specs/DECISION-BRIEF.md` (folded into the spec by `/spec`) | solution agreed per section | +| 2 | `/user:devs-team` | before the spec hardens; 5 engineering lenses | `## Design critique` in the active spec (else the brief) | verdict recorded | +| 3 | `/user:visual-team` | a visual/UI design exists (downstream) | `## Visual critique` in the active spec (else brief, else inline) | verdict recorded | +| 4 | `/user:ui-design` | downstream UI work, after `/design` | `## UI design` in the spec; generates via `frontend-design`; critiques via `/visual-team` | SOLID/RECONSIDER verdict or max-2 revise | +| 5 | `/user:test-plan` | before `/execute`; derive a coverage matrix | `## Test plan` in the spec (consumed by `/execute`) | matrix written | +| 6 | `/user:review-team` | PR-grade review; 3 lenses (security/architecture/test-coverage) in parallel | `REVIEW.md` (+ `TODOS.md`) | SHIP / FIX THEN SHIP / DO NOT SHIP | +| 7 | `/user:absorb` | maintainer-only external-absorption audit | dated report under `docs/absorption/` | proposal-only report (human merge gate) | +| 8 | `/user:kit-health` | maintainer self-assessment vs PHILOSOPHY, before tagging | report (stdout) | assessment rendered | + +All of these write **into the active spec** when the output binds to a spec (replace-not-stack), +so a later reader and an earlier writer never split across two specs. + +--- + +## 7. Alternate / branch flows (7) + +The edges that fire when the happy path does not hold. + +### 7.1 Retry (fixable failure) +A `task-verifier` (or `integration-checker`) `FAIL:fixable` dispatches a scoped **fix-agent**, +then re-verifies. Cap: **2 retries**. Rationale: 1-2 cycles catch import/assertion/off-by-one +bugs; 3+ means a design problem, not a code bug. + +### 7.2 Escalate (unfixable or exhausted) +`FAIL:escalate`, or retries hitting the cap, **stops the loop and hands to the human** with the +full context (task, every verifier report, every fix attempt). Escalate verdicts are never +auto-retried; they need judgment by definition. + +### 7.3 Ambiguous spec +Spec detection is branch-aware. When more than one non-`SHIPPED`/`PARKED` spec is active and the +branch slug does not disambiguate, the detectors emit `spec:ambiguous(...)` and **ask** rather +than silently pick. Resolve by branch or by naming the spec. + +### 7.4 No activator installed +`/user:assign` detects the goal-loop activator (`/goal` -> `ralph-loop` -> `goal-craft`). If none +is installed it **degrades gracefully**: the draft is left as a plain reusable file you paste +wherever. Only one-step activation is lost; the draft still works. + +### 7.5 Idempotent re-run +Re-running `/user:assign` for an ID that already has a `.claude/goals/.md` **re-surfaces the +existing draft** instead of duplicating it or double-advancing status. The filesystem is the +source of truth. + +### 7.6 DO-NOT-SHIP gate +`/user:ship` reads `REVIEW.md` first. `DO NOT SHIP` -> **stop**, fix first. `FIX THEN SHIP` -> +apply fixes, then ship. No `REVIEW.md` -> **warn and ask** (run `/review`, run `/review-team`, +or ship anyway); never silently skipped. + +### 7.7 Completeness warn + log (not a block) +Two self-checks run during Build/Reflect and **warn + log** to +`~/.claude/dwarves-kit/logs/completeness.log` without blocking: **decision-translation** (each +Build-decision is referenced by a task/AC) and **doc-update** (a change touching X moves its +companion docs per the WORKFLOW doc-impact map). `/user:ship` and `/user:retro` review that log +at the gate. Hard blocks stay reserved for the safety subset. + +--- + +## 8. The four hard stops (the only blockers) + +Everything above suggests or warns. These four refuse to proceed, because the cost of the +mistake is irreversible: + +| Hard stop | Fires on | Mechanism | +|---|---|---| +| safety-gate | destructive Bash (`rm -rf`, `DROP TABLE`, `git reset --hard`, `kubectl delete`; build-artifact allowlist exempt) | PreToolUse hook, exit 2 | +| push-to-main blocker | a push to `main`/`master`/protected | PreToolUse hook, exit 2 | +| anti-rationalization | premature "done" claim; phantom-impl stub in the diff; guess-fix while `## Root cause` empty | Stop hook | +| verification pipeline | a task whose acceptance criteria are unmet or whose tests did not actually run | `/execute` gate (worker -> verifier -> fix -> escalate) | + +--- + +## 9. Quick reference + +### 9.1 Trigger -> flow -> stop -> enforcer + +| Trigger | Starts | Stop condition | Enforcer | +|---|---|---|---| +| `/user:start` | render queue + drafts | output rendered | none (detector) | +| `/user:assign ` | goal draft + lane routing (freeform: delegate crystallize to `/user:think`, then allocate ID + BACKLOG row) | draft written, status flipped, handed off | none (mutator; idempotent; freeform gated by approve-before-allocate) | +| `/user:think` | decision brief | brief written (if BUILD) | advisory | +| `/user:spec` | spec scaffold | spec exists, `Status: DRAFT` | spec-drift-guard hook | +| `/user:spec-validate` | 5-lens adversarial review | `Status: VALIDATED` | advisory (full lane) | +| `/user:execute` | verification pipeline | all tasks + integration PASS | verification pipeline (hard) | +| `/user:debug` | 4-phase debug loop | root cause + fix verified + human-confirmed | iron law + guess-fix guard | +| `/user:review[-team]` | review | verdict recorded in `REVIEW.md` | advisory | +| `/user:docs` | doc sync + doc-verifier | docs match code | advisory | +| `/user:ship` | ship pipeline | tagged/PR; spec `SHIPPED`; ID off queue | ship gate + push-to-main blocker (hard) | +| `/user:retro` | retrospective | `docs/retro/RETRO--.md` written | advisory | +| a `/goal` activator | goal loop | `## Verification` passes + done-definition | anti-rationalization (hard) | + +### 9.2 Stop-condition index (where each loop terminates) + +- **Goal loop**: `## Verification` command(s) pass and the done-definition holds; or a blocker + note is written to `## Open questions` and it stops. +- **Debug loop**: `## Root cause` recorded, fix verified, human-confirmed; or the 3-fix + architecture wall halts it for a design rethink. +- **Execute pipeline**: every task PASS and integration-checker PASS; or `FAIL:escalate` / + 2 exhausted retries -> human. +- **ui-design loop**: a SOLID or RECONSIDER verdict, or the max-2 revise cap. +- **Lanes**: `tiny` when the edit verifies; `normal`/`full` when shipped (ID off queue); + `bug` at human-confirm; `backfill` when the docs are written with no app-behavior change. + +--- + +## See also +- `WORKFLOW.md` - the rules contract this manual visualizes (the cycle table, the lane table, the doc-impact map). +- `MANUAL.md` - per-command operator detail (reads/writes/gotchas for each `/user:*`). +- `docs/architecture.md` - component fit and the state model. +- `docs/PHILOSOPHY.md` - why these boundaries exist (the bounded/unbounded loop note, the hard-stop reservation). +- `AGENTS.md` - the tool-agnostic operate-contract the goal loop projects from. diff --git a/docs/PLAYBOOK.md b/docs/PLAYBOOK.md new file mode 100644 index 0000000..74a6e9e --- /dev/null +++ b/docs/PLAYBOOK.md @@ -0,0 +1,329 @@ +# PLAYBOOK.md: operator scenarios (what you say -> what happens) + +> How to drive the kit from natural language. Scenario -> trigger phrase -> +> response -> orchestration hook. This is the interaction view. For the flow/loop +> internals read `docs/ORCHESTRATION.md`; for per-command detail read `MANUAL.md`; +> for the rules contract read `WORKFLOW.md`. +> +> Worked examples use the maintainer's own phrasings; the behavior is operator-generic. + +## The one thing to understand first: three layers, only one is automatic + +Your sentence does not mechanically trigger a flow. There are three layers and only +the first fires on its own: + +| Layer | Fires how | Examples | +|---|---|---| +| **Hooks** | **Automatic**, on Claude Code events | `context-readiness` (SessionStart suggestion), `safety-gate`, `anti-rationalization`, `spec-drift-guard`, `push-to-main` | +| **`/user:*` commands** | **Invoked** - you type `/user:x`, OR Claude reads your intent and runs it | `/user:start`, `/user:assign`, `/user:spec`, `/user:execute`, `/user:ship` | +| **Skills** | **Invoked** - Claude recognizes the situation and loads the skill | `goal-craft`, `superpowers:brainstorming`, `content-spec` | + +So when you say "apply SDD," no hook fires on the word "SDD." What happens is **Claude +interprets** the phrase and **invokes** the right command/skill. The kit's hooks then act +as guardrails around whatever runs. Keep this split in mind for every scenario below. + +A second load-bearing fact: the kit's orchestration is **BACKLOG-ID-first**. `/user:assign` +takes an `ID-NNN` **or** freeform intent ("apply SDD to this feature", a vague brief). When you +hand it freeform, `/user:assign` runs the freeform front door natively (Section 8): it delegates +the interview to `/user:think`, waits for your approval, then allocates the ID + BACKLOG row +before routing into the lane. The ID stays canonical; it is just minted on the fly. This shipped +in SPEC-026 / ID-022 (Section 11). + +--- + +## Scenario 1: session start, "what's next / what's left" + +**Context.** Fresh session, you want orientation before doing anything. + +**What you say.** "what's next", "what's left to do", "where were we", or `/user:start` +(`--brief` for one line, `--full` for the task checklist + recent commits). + +**Automatic vs interpreted.** +- *Automatic*: the `context-readiness` hook already injected a one-line `next:` suggestion into + Claude's context when the session started. You may see Claude reference it unprompted. +- *Interpreted*: Claude reads "what's next" and runs the `/user:start` behavior (a **detector**): + render the `_meta/BACKLOG.md` Active queue and the active `.claude/goals/` drafts. Read-only. + +**Resulting flow.** Detector only, no mutation: +```text + "what's next" -> /user:start -> renders: + - BACKLOG Active queue (open ID-NNN + status: queued/speccing/validated/executing) + - active goal drafts in .claude/goals/ + - the suggested next command +``` + +**Decision / approval points.** None (nothing is changed). You then choose the next move: +`/user:assign ID-NNN` to start an item, or `/user:next` to pick up the next undone task of the +already-active spec. + +**How to continue.** Say `assign ID-007` (start that item) or `next` (continue the active spec). +If the queue is empty or you have a new idea, jump to Scenario 2 or 5. + +--- + +## Scenario 2: mid-brainstorm, "apply SDD framework on this feature X" + +**Context.** You are brainstorming a feature. There is **no BACKLOG ID** for it yet. + +**What you say.** "apply SDD to feature X", "let's spec this", "run the spec-driven flow on X". + +**Automatic vs interpreted.** +- *Not a keyword trigger.* No hook watches for "SDD". Nothing auto-fires. +- *Interpreted*: Claude maps "SDD" to **the spec-driven lane** (normal or full) and proposes it. + "SDD" is a keyword **for Claude to interpret**, not a mechanical trigger. + +**Resulting flow.** Because there is no ID, Claude invokes `/user:assign` with the freeform +intent. `/user:assign` runs the **native freeform front door** (Section 8): it delegates the +interview to `/user:think`, then allocates the ID + BACKLOG row, then routes into the lane. By +default Claude does **not** auto-run the whole flow; it sets up and starts the first step, +checking the lane + scope with you: +```text + "apply SDD to X" + -> Claude picks the lane (normal vs full) and confirms scope with you + -> /user:assign "apply SDD to X" (freeform front door, Section 8): + delegate crystallize to /user:think -> [you approve] -> allocate ID + BACKLOG row + -> /user:spec (-> /user:spec-validate on full) + -> /user:execute (verification pipeline) + -> /user:review[-team] -> /user:docs -> /user:ship -> /user:retro +``` + +**Decision / approval points.** Lane + scope confirmation before the spec; the approve-before-allocate +gate inside the freeform front door; spec approval; the spec-validate verdict (full lane); execute +phase checkpoints. Claude stops at each unless you pre-authorize autonomy (Scenario 3). + +**Honest caveat.** "Apply SDD" does not auto-launch the full orchestration off a keyword. Claude +still **interprets** the phrase and **invokes** `/user:assign` with it; the front door is a real +invoked command (SPEC-026, Section 11), not an auto-firing keyword. From there the hooks act as +guardrails. + +--- + +## Scenario 3: "run the full flow, do not interrupt, your call" + +**Context.** You trust the call and want autonomy end to end. + +**What you say (pick the autonomy level explicitly).** +- "run the full lane autonomously; only stop at hard stops" -> autonomous up to the outward-facing step. +- "run it all the way to a PR, your call" -> fully autonomous to merge-ready (push + PR included). +- Or set a goal loop: `/goal ` then "run it" (the bounded in-session loop). + +**Automatic vs interpreted.** +- *Interpreted*: Claude drives the lane end to end without pausing at the **advisory** checkpoints. +- *Automatic*: in a `/goal` loop the **anti-rationalization Stop hook** keeps the session working + until the stop condition holds; the **4 hard stops** still gate every step. + +**Resulting flow.** +```text + /spec -> /spec-validate -> /execute (auto worker->verifier->fix<=2->integration; escalate on fail) + -> /review[-team] -> /docs -> /ship + (advisory phase checkpoints are skipped; the loop runs continuously) +``` + +**Where it STILL stops (always, even autonomous).** +- The 4 hard stops: `safety-gate` (destructive Bash), `push-to-main` blocker, + `anti-rationalization` (premature/false "done"), the verification pipeline (a task that fails). +- Outward-facing irreversible steps: the push and the PR. Claude confirms these **unless** you + said "all the way to a PR." That is the one place "your call" still asks once, by policy. + +**Decision / approval points.** Only the above. Everything advisory is auto-advanced. + +**The exact phrase to grant maximum autonomy:** *"Run the full lane autonomously, including the +push and PR; only stop at the safety hard-stops or a real blocker."* + +--- + +## Scenario 4: an iteration-heavy phase (discuss solution, revisit design) + +**Context.** You want to iterate on one phase (usually solution design) before committing to a spec. +This is the **opposite** of Scenario 3: maximal checkpoints, no autonomy. + +**What you say.** "let's discuss the solution", "iterate on the design", "get back on the design +for X", "explore approaches for X", or `/user:design`. + +**Automatic vs interpreted.** +- *Interpreted*: Claude invokes `/user:design` (the opt-in interactive solution-design beat) and/or + `/user:devs-team` (5-lens engineering critique). For open-ended exploration, `superpowers:brainstorming`. + +**Resulting flow (a human-in-the-loop loop).** +```text + /user:think (optional, if the idea needs challenging: 6 forcing questions) + -> /user:design ── proposes 2-3 approaches, ONE question at a time, + │ holds for your approval PER SECTION, appends the + │ agreed Solution to docs/specs/DECISION-BRIEF.md + │ ◀──┐ + └───────────────────────────────────────────────────────┘ + -> (optional) /user:devs-team ── critique appended to the spec/brief + -> /user:spec ── folds the DECISION-BRIEF Solution into the spec +``` + +**Stop condition.** The loop continues until **you approve** the solution section by section. +It never auto-advances; every section pauses for you. `/user:design` is explicitly the +"ran without my feedback" antidote. + +**Decision / approval points.** Per section in `/design`; the critique verdict (SOLID / REVISE / +RECONSIDER); spec approval. To leave the loop: "the design is good, write the spec." + +--- + +## Scenario 5: a vague / ambiguous goal + +**Context.** You have a fuzzy idea, not yet crystallized into something buildable. + +**What you say.** "I have a rough idea about X", "help me figure out what to build for X", +"here's a vague brief: ...", or `/goal ` (then `goal-craft` sharpens it). + +**Automatic vs interpreted.** +- *Interpreted*: Claude **interviews / grills** you. There is **no "goal-griller" skill**; the + grilling is `/user:think` (6 forcing questions) and/or `superpowers:brainstorming` (intent + + requirements exploration). The `goal-craft` skill **sharpens** a fuzzy intent into an + outcome-shaped `/goal` (verification, scope fence, termination-on-blocker). + +**Resulting flow.** +```text + vague brief + -> /user:assign "" (native freeform front door, Section 8): + delegate crystallize to /user:think and/or superpowers:brainstorming + (challenge the idea, surface requirements, name the real outcome) + -> crystallize into a clear objective + -> [YOU APPROVE the crystallized objective] (approve-before-allocate gate) + -> allocate ID + BACKLOG row -> route into the lane (Scenario 2's flow) +``` + +**Will Claude auto-hook it into the SDD orchestration after you approve?** Yes, **on your "go"** - +`/user:assign`'s freeform path allocates the ID + BACKLOG row and starts the lane's first command. +This is the kit's native front door now, not Claude bridging by hand; the front door is still an +**invoked command** (you, or Claude on your behalf, invoke `/user:assign`), not an auto-firing +keyword. Claude will ask **how autonomous** to be from there (ties to Scenario 3). + +**Decision / approval points.** Claude will **not** write the spec or start the lane until you +approve the crystallized objective. That approval gate is deliberate: a vague brief turned +straight into a spec is how scope drift starts. + +--- + +## Scenario 7: mid-build, "also do Y" (a mid-flight scope change) + +**Context.** You are mid-`/user:execute` on a `VALIDATED` spec (state BUILDING). Partway through, +the work reveals scope that must be added now. You do **not** want to restart the lane or throw +away the tasks already done. + +**What you say.** "also do Y", "while you're in here, add Z", "amend the spec to cover Y". + +**Automatic vs interpreted.** +- *Not a keyword trigger.* No hook watches for "also do". Nothing auto-fires. +- *Interpreted*: Claude recognizes this as a **mid-flight amend** and runs the declared + amend micro-loop (BUILDING -> SPECIFYING -> BUILDING) instead of silently editing the spec or + starting over. The full rule (when you may amend, the checkpoint guard, the recorded entry, + how to resume) is canonical in **`WORKFLOW.md` "## Mid-flight amend"**; this card is only the + what-you-say -> what-happens projection of it. + +**Resulting flow.** Amend in place at a checkpoint, then resume; the spec stays `VALIDATED`: +```text + "also do Y" (mid /user:execute, spec is VALIDATED) + -> reach a task checkpoint (finish + verify + commit the in-flight task first) + -> amend the spec: append new - [ ] TASK rows + delta the After-state / AC / Verification + (completed - [x] tasks are NOT touched), record an ## Amendments entry + -> re-validate the DELTA only (full lane: /spec-validate on the new tasks; normal: advisory) + -> /user:next resumes, picking the next undone - [ ] task (skips the done rows) +``` + +**Decision / approval points.** You confirm the added scope before the amend lands; the +delta-only re-validation verdict (full lane). The Status never drops to `DRAFT` (that would be a +lane restart), and an amend only **adds** scope: rewriting an already-done task's contract is a +heavier re-open decision, not an amend. See `WORKFLOW.md` "## Mid-flight amend" for the four +invariants in full. + +**How to continue.** Say `next` to resume on the amended tasks. To add yet more scope later, +repeat: each amend appends a fresh `## Amendments` line. To leave the build instead, `ship` it. + +--- + +## 8. The freeform -> ID front door (what `/user:assign` does internally) + +Scenarios 2 and 5 are freeform (no ID). The orchestration is ID-first, so `/user:assign` mints the +ID for you. Hand it freeform intent instead of an `ID-NNN` and its freeform path runs: + +```text + /user:assign "" + 1. delegate crystallize -> /user:think (the idea-griller) runs the interview and + returns a crystallized objective + a lane. /assign does NOT + embed the interview; it consumes /think's result. + 2. approve -> pause for your approval of the crystallized objective + (approve-before-allocate: a vague brief never auto-creates a row) + 3. sanitize + allocate -> sanitize the intent (escape `|`/newlines for the table cells; + reduce the slug to [a-z0-9-]+), re-read the current max ID and + write the next ID-NNN row into _meta/BACKLOG.md Active queue in + the same step (Title, Source: freeform intake (date), Target + artifact, Lane, Status: queued), with a loud equal-ID collision + check (atomic-allocate) + 4. rejoin the ID tail -> write the goal draft, pick the lane, route (Scenario 2's flow), + exactly as for an ID-NNN argument +``` + +This is the **native freeform front door** (`/user:assign` runs it; you do not bookkeep by hand). +It preserves ID-first traceability: even an ad-hoc idea gets a BACKLOG row and an ID before any +draft is written, so nothing ships untracked. It is still an **invoked command**, not an +auto-firing keyword: Claude interprets "apply SDD to X" and invokes `/user:assign` with it; the +kit does not watch for the phrase. Shipped in SPEC-026 / ID-022 (Section 11). + +--- + +## 9. Pre-authorization phrases (autonomy dial) + +How much to say to set the autonomy level. Say one of these and Claude calibrates: + +| You say | Autonomy | Claude stops at | +|---|---|---| +| "propose it, don't run anything" | none | after planning; waits for go | +| "run it, check with me at each phase" | low (default) | every advisory phase checkpoint | +| "run the lane, only stop at hard stops" | high | the 4 hard stops + the push/PR (outward-facing) | +| "run it all the way to a PR, your call" | max | only the 4 hard stops + a real blocker | + +The 4 hard stops (`safety-gate`, `push-to-main`, `anti-rationalization`, the verification pipeline) +are **never** waived by any autonomy level. + +--- + +## 10. Cheat-sheet: what you say -> what happens + +| You say | Claude invokes | Fires automatically | Stops at | +|---|---|---|---| +| "what's next / what's left" | `/user:start` (detector) | context-readiness suggestion | nothing (read-only) | +| "assign ID-007" / "start ID-007" | `/user:assign ID-007` | (none) | hands off to the lane | +| "apply SDD to X" (no ID) | `/user:assign ""` (freeform front door) -> `/user:spec` lane | spec-drift-guard once a spec exists | approve-before-allocate gate, lane + scope confirm, then per-phase | +| "discuss / iterate the design" | `/user:design` (+ `/user:devs-team`) | (none) | every section (human-in-loop) | +| "vague idea about X" | `/user:think` / brainstorming | (none) | your approval of the objective | +| "run the full lane, your call" | the lane, autonomously | anti-rationalization (in a /goal loop) | hard stops + push/PR | +| "fix this bug / it regressed" | `/user:debug` (bug lane) | guess-fix guard | root cause + human-confirm | +| "review this" / "ship it" | `/user:review[-team]` / `/user:ship` | ship gate, push-to-main | DO-NOT-SHIP verdict; the push/PR | + +--- + +## 11. The freeform front door (shipped) + +Scenarios 2 and 5 used to expose a real gap: freeform intent ("apply SDD to X", a vague brief) had +no auto-path, so Claude bridged it by hand every run. SPEC-024 deferred the "freeform griller entry" +to keep BACKLOG-ID-first canonical; SPEC-026 / ID-022 then closed the gap and **shipped**. + +**Shipped**: `/user:assign` (the one mutator) now accepts **freeform intent** in addition to +`ID-NNN`. The freeform path delegates crystallization to `/user:think`, pauses for your approval, +sanitizes the input, atomically allocates the next ID, writes the BACKLOG row, then proceeds +exactly as ID-first, so "apply SDD to X" is a genuine one-shot front door without losing ID +traceability. + +- Spec: `docs/specs/SPEC-026-freeform-front-door.md` (VALIDATED, shipped). +- Backlog: ID-022. +- Command: `commands/assign.md` (the resolver + freeform path). + +The four invariants the front door upholds: delegate-to-`/user:think`, approve-before-allocate, +sanitize, atomic-allocate. The native path is Section 8. It stays an invoked command, not an +auto-firing keyword. + +--- + +## See also +- `docs/operating-layer-vision.md` - the design-first vision + the SDLC state machine this playbook projects (the formal model behind these scenarios). +- `docs/ORCHESTRATION.md` - the flow/loop view (lanes, loops, triggers, stop conditions, ASCII diagrams). +- `WORKFLOW.md` - the rules contract (the cycle, the lanes, the gates). +- `MANUAL.md` - per-command operator detail. +- `AGENTS.md` - the operate-contract the goal loop projects from. diff --git a/docs/architecture.md b/docs/architecture.md index fba6b98..97aeb79 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -18,7 +18,7 @@ The kit is intentionally flat. Component dirs sit at the top of the repo, not ne ## Data flow through `docs/specs/SPEC-NNN-.md` -> The imperative companion to the descriptive map below is [`WORKFLOW.md`](../WORKFLOW.md) (repo root): the same lifecycle phrased as the contract an agent follows, with risk-tier lanes and the gate at each boundary. +> The front door to all of this is [`AGENTS.md`](../AGENTS.md) (repo root): the tool-agnostic operate-contract that any runtime reads first; `CLAUDE.md` and `WORKFLOW.md` point at it rather than restate it. The imperative companion to the descriptive map below is [`WORKFLOW.md`](../WORKFLOW.md) (repo root): the same lifecycle phrased as the contract an agent follows, with risk-tier lanes (including the `backfill` brownfield lane: review an existing codebase and write the operating-layer docs, doc-output only, no app-behavior change) and the gate at each boundary. `docs/specs/SPEC-NNN-.md` is the shared contract for the full lifecycle. It is the single source of truth that crosses command boundaries: diff --git a/docs/decisions/0013-agents-md-operating-layer.md b/docs/decisions/0013-agents-md-operating-layer.md index d870b75..2f565c9 100644 --- a/docs/decisions/0013-agents-md-operating-layer.md +++ b/docs/decisions/0013-agents-md-operating-layer.md @@ -1,6 +1,8 @@ # ADR-0013: AGENTS.md as the tool-agnostic operating-layer entrypoint -## Status: proposed (2026-05-21). +## Status: accepted (2026-05-21). Scope approved 2026-05-21; ships as backlog item ID-015 / SPEC-024 (full lane, all six decisions). Entry stays BACKLOG-ID-first; the freeform "griller" entry (a casual intent with no ID) is deferred per that scope call. + +**Addendum (2026-05-21):** SPEC-024 scope also adds an observable **After-state** section: a definition-of-done picture made of statements that are false-now / true-after and each checkable by a human or command. It lands as a `## After state` spec-template section (feeding `## Acceptance Criteria`) and is projected into the six-section goal's `Done-when`. The matching anatomy piece is already added to the personal `goal-craft` skill so it applies beyond this repo. Load-bearing rule: observable, not narrated, or it is fluff and gets cut (PHILOSOPHY: every file justifies itself). ## Context A study of `hoangnb24/harness-experimental` (OpenAI "harness engineering" framing) showed an operator writing a rich, multi-section `/goal` (Context to read first / Constraints / Operating rules / Validation loop / Done when / Pause if) and running it reliably against a brownfield repo. That altitude is not prompt skill. Each section is a pointer to a *named artifact* the harness installs into the consuming repo: an ordered source-of-truth list, a feature-intake/risk doc, a test matrix, a done-definition, and an "ask before" list, all anchored by a single agent entrypoint, `AGENTS.md`. The operator references structure the repo already carries; they do not invent it in the prompt. diff --git a/docs/operating-layer-vision.md b/docs/operating-layer-vision.md new file mode 100644 index 0000000..533d7f5 --- /dev/null +++ b/docs/operating-layer-vision.md @@ -0,0 +1,198 @@ +# Operating-layer vision + SDLC state machine + +> Design-first north-star for the kit's natural-language operating layer. Vision and +> model, not implementation. The implementing specs (SPEC-026 +) trace here. Operator +> behavior derives from this; `docs/PLAYBOOK.md` is the operator-facing projection of +> the state machine below, `docs/ORCHESTRATION.md` is the flow/loop view, `WORKFLOW.md` +> is the rules contract, `docs/PHILOSOPHY.md` is the why. + +## 1. Vision + +Today the kit is driven through an **interpret-and-bridge** layer: the operator types +intent in chat, Claude maps it to a `/user:*` command or skill, and the hooks act as +guardrails. That works, but the operator cannot *see* the machine: there is no explicit +notion of "what state am I in, what transitions are available from here, what guards each +one, and how do I trigger it." The mapping lives only in Claude's interpretation, so it is +inconsistent run to run and invisible to the operator. + +**The vision:** make the operating layer a **legible state machine** the operator can drive +from natural language. At any moment the operator (and Claude) can answer four questions +without guessing: + +1. **Where am I?** (the current state) +2. **Where can I go?** (the available transitions) +3. **What does each cost / require?** (the guard + the stop condition) +4. **How do I trigger it?** (the phrase or command) + +The machine does not replace the interpret layer; it gives that layer a **declared model** +to interpret against, so the same intent produces the same transition every time, and the +operator can learn the map instead of re-deriving it. + +## 2. Principles (inherited + new) + +Inherited from `docs/PHILOSOPHY.md` and `WORKFLOW.md`: +- **Guardrails over guidance.** Transitions are guarded by hooks where the cost is irreversible; everything else suggests and routes. +- **BACKLOG-ID-first.** Every unit of work has an ID before it ships; freeform intent is bridged to an ID (SPEC-026 makes that bridge native). +- **Bounded loops.** Sub-machines (build, debug) terminate on a model-evaluated stop, never an unbounded outer driver. +- **Detector vs mutator.** Reading state never changes it; exactly one mutator advances it. + +New UX invariants this vision adds: +- **State is always answerable.** Claude can always name the current state and the legal next transitions. +- **Guards are explicit, not implicit.** A transition that cannot fire says *why* (the guard that blocks it), never silently stalls. +- **The 4 hard stops are transition guards, not surprises.** They are drawn into the machine, so an operator running autonomously knows exactly the edges that will pause for them. + +## 3. The SDLC state machine + +### 3.1 States + +| State | Meaning | Entry | Exit | +|---|---|---|---| +| `IDLE` | no active unit of work | session start; an item shipped/abandoned | intake | +| `TRIAGING` | intake: intent -> lane + (eventually) an ID | `/assign`, `/think`, "apply SDD", a vague brief | lane chosen | +| `DESIGNING` | solution exploration (iterative) | full lane, or "let's design" | solution approved | +| `SPECIFYING` | the spec is being written | `/spec` | spec `DRAFT` exists | +| `VALIDATING` | adversarial spec review | `/spec-validate` | `VALIDATED` or NEEDS REVISION | +| `BUILDING` | execution sub-machine (worker -> verifier -> fix -> integration) | `/execute`, `/next` | all tasks + integration PASS | +| `REVIEWING` | code review | `/review`, `/review-team` | verdict recorded | +| `DOCUMENTING` | doc sync + doc-verifier | `/docs` | docs match code | +| `SHIPPING` | ship pipeline | `/ship` | tagged/PR; spec `SHIPPED` | +| `REFLECTING` | retrospective | `/retro` | retro written | +| `DEBUGGING` | off-cycle debug sub-machine (iron law) | `/debug` | root cause + fix + human-confirm | +| `BLOCKED` | meta-state: parked, awaiting a human | "park", "I'm stuck", a hard stop, escalate | unblock / abandon | +| `SHIPPED` | terminal: the item is done | `/ship` completes | (re-open -> TRIAGING) | +| `ABANDONED` | terminal: the item is dropped | "kill it" | none | + +### 3.2 Master diagram + +```text + ┌────────────────────────── re-open ("follow-up") ──────────────────────────┐ + ▼ │ + ┌──────┐ intake ┌──────────┐ lane=full ┌───────────┐ approved ┌────────────┐ │ + │ IDLE │ ──────────▶ │ TRIAGING │ ─────────────▶ │ DESIGNING │ ──────────▶ │ SPECIFYING │ │ + └──────┘ └────┬─────┘ │ ⇄ iterate │ └─────┬──────┘ │ + ▲ │ lane=normal └───────────┘ │ DRAFT │ + │ shipped │ (skip design) ▼ │ + │ ├──────────────────────────────────────────────▶ VALIDATING │ + │ │ lane=tiny: edit->verify->done (no spec) │ VALIDATED │ + │ │ lane=bug ─────────────▶ DEBUGGING │ ▲ NEEDS │ + │ │ lane=backfill: docs only, no app code │ │ REVISION │ + │ ▼ ▼ │ │ + │ ┌──────────────────────────────────────────────────────────▶ BUILDING ────────────┘ + │ │ guard: all tasks PASS + integration PASS │ ⇄ retry (fix<=2) + │ │ guard (hard): verification pipeline, anti-rationalization │ escalate + │ ▼ ▼ + │ REVIEWING ◀── FIX THEN SHIP / DO NOT SHIP (loop back to SPECIFYING/BUILDING) + │ │ SHIP / fixes applied + │ ▼ + │ DOCUMENTING ──▶ SHIPPING ──▶ REFLECTING ──▶ (SHIPPED) ──▶ IDLE + │ │ guard (hard): DO-NOT-SHIP verdict, push-to-main blocker + │ │ + (any state) ──"park"/"stuck"/escalate──▶ BLOCKED ──resume──▶ (prior state) + (any state) ──"kill it"──▶ ABANDONED (terminal) + (any state) ──bug found──▶ DEBUGGING ──root cause+fix+confirm──▶ (prior state) +``` + +### 3.3 Transition table (the contract) + +| From | Trigger (phrase / command) | Guard | To | +|---|---|---|---| +| IDLE | "what's next" then pick; `/assign ID`; "apply SDD X"; vague brief | none | TRIAGING | +| TRIAGING | lane = full | scope confirmed | DESIGNING | +| TRIAGING | lane = normal | scope confirmed | SPECIFYING | +| TRIAGING | lane = tiny | trivial edit | BUILDING (no spec) | +| TRIAGING | lane = bug | a defect | DEBUGGING | +| DESIGNING | "iterate", redirect | per-section approval pending | DESIGNING | +| DESIGNING | "design is good, write the spec" | solution approved | SPECIFYING | +| SPECIFYING | `/spec` done | spec `DRAFT` exists | VALIDATING (full) / BUILDING (normal) | +| VALIDATING | `/spec-validate` verdict | VALIDATED | BUILDING | +| VALIDATING | NEEDS REVISION | revisions required | SPECIFYING | +| BUILDING | task FAIL:fixable | retries < 2 | BUILDING (fix-agent) | +| BUILDING | task FAIL:escalate / retries == 2 | unfixable | BLOCKED | +| BUILDING | "also do Y" / "amend the spec" | at a task checkpoint; completed tasks frozen; Status stays VALIDATED | SPECIFYING (amend, not restart) | +| SPECIFYING | resume via `/next` | amend recorded | BUILDING (resume) | +| BUILDING | all tasks done | **all PASS + integration PASS** (hard) | REVIEWING | +| REVIEWING | verdict SHIP / FIX-applied | not DO-NOT-SHIP | DOCUMENTING | +| REVIEWING | FIX THEN SHIP / DO NOT SHIP | findings open | SPECIFYING / BUILDING | +| DOCUMENTING | `/docs` done | doc-verifier PASS | SHIPPING | +| SHIPPING | `/ship` | **not DO-NOT-SHIP, not push-to-main** (hard) | REFLECTING | +| REFLECTING | `/retro` done | retro written | SHIPPED -> IDLE | +| any | "park" / "I'm stuck" | a blocker exists | BLOCKED | +| any | "kill it, not worth it" | operator confirms | ABANDONED | +| any | a bug surfaces | a defect | DEBUGGING | +| SHIPPED | "the shipped X needs a follow-up" | none | TRIAGING (new spec) | + +### 3.4 Sub-machines + +- **BUILDING** expands to: `worker -> task-verifier -> {PASS | FAIL:fixable -> fix-agent (<=2) | FAIL:escalate} -> integration-checker`. See `docs/ORCHESTRATION.md` 5.3. +- **DEBUGGING** expands to: `Phase 1 Root cause -> Phase 2 Pattern -> Phase 3 Hypothesis -> Phase 4 Implementation`, under the iron law (no fix without a recorded root cause), guarded by the guess-fix guard. See `docs/ORCHESTRATION.md` 5.2. + +### 3.5 Hard stops as guards (the only blockers) + +| Hard stop | Guards the transition | Effect | +|---|---|---| +| safety-gate | any transition running destructive Bash | blocks the command | +| push-to-main | SHIPPING -> (the push) | blocks the push | +| anti-rationalization | BUILDING -> REVIEWING; any -> "done" | blocks premature/false done | +| verification pipeline | BUILDING -> REVIEWING | blocks if a task fails | + +Everything else is advisory: it suggests the transition, it does not block it. + +## 4. Scenario catalog (all 15, mapped to transitions) + +The five from the original playbook, plus ten that complete the SDLC. Each is "trigger -> from-state -> guard -> to-state". + +| # | Scenario | Trigger | From -> To | Notes | +|---|---|---|---|---| +| 1 | What's next / left | "what's next" | IDLE -> IDLE (detector) | renders queue; no transition until you pick | +| 2 | Apply SDD to a feature | "apply SDD to X" | IDLE -> TRIAGING -> SPECIFYING | freeform; bridged to an ID (SPEC-026) | +| 3 | Autonomous full flow | "run the lane, your call" | TRIAGING -> ... -> SHIPPED | advisory checkpoints skipped; hard stops remain | +| 4 | Iterate the design | "discuss / revisit design" | DESIGNING ⇄ DESIGNING | human-in-loop, per-section | +| 5 | Vague brief | "rough idea about X" | IDLE -> TRIAGING | interview (`/think` + brainstorming) before any row | +| 6 | Resume after interruption | "where were we" | (any) -> same | session-state restore + `/start`; mostly supported | +| 7 | Mid-flight scope change | "also do Y" | BUILDING -> SPECIFYING | **gap**: amend the active spec mid-build | +| 8 | Blocked / park | "park this", "I'm stuck" | (any) -> BLOCKED | supported (parked status + Open questions) | +| 9 | Context switch across specs | "switch to the other feature" | (any) -> (other worktree) | **gap**: worktree-per-spec is the model, no switch command | +| 10 | Urgent hotfix | "prod is down, fix X now" | IDLE -> DEBUGGING | bug lane exists; **minor gap**: a declared fast path | +| 11 | Review someone else's work | "review this PR" | IDLE -> REVIEWING | supported (`/review` on any diff); note the base ref | +| 12 | Abandon | "kill this" | (any) -> ABANDONED | **minor gap**: no explicit abandon terminal (vs park) | +| 13 | Re-open shipped | "follow-up on shipped X" | SHIPPED -> TRIAGING | **gap**: convention for a follow-up spec | +| 14 | Status during a run | "how's it going" | (running) -> same | report progress; no state change | +| 15 | Knowledge capture | "save this learning" | (any) -> same | side-effect (`/learned` + skills); no state change | + +## 5. Gap analysis (what has no clean path today -> implementing specs) + +Most transitions already have a path. The real gaps, in priority order: + +| Gap | Scenario | Why it is a gap | Proposed | +|---|---|---|---| +| Freeform front door | 2, 5 | `/assign` is ID-only; freeform is bridged by hand | **SPEC-026 / ID-022** (drafted) | +| Mid-flight spec amend | 7 | no path to amend a `VALIDATED`/building spec without restarting | **SPEC-027 / ID-023** (validated) | +| Context switch across specs | 9 | worktree-per-spec is the model but no switch affordance | **ID-024** (new) | +| Re-open shipped | 13 | a `SHIPPED` spec has no follow-up convention | **ID-025** (new) | +| Abandon terminal | 12 | only `parked` exists; no explicit drop | folded into ID-024 (state hygiene) or a tiny lane | + +Scenarios 6, 8, 10, 11, 14, 15 are supported or near-supported today; the doc records them so the machine is complete, no new spec required. + +## 6. How this gets built (the kickoff path, design-first) + +The kit dogfoods itself. The order: + +```text + 1. THIS doc (vision + state machine) <- design-first, done now + 2. /user:assign ID-022 -> SPEC-026 (freeform front door): /spec-validate (drafted) -> /execute + 3. /user:assign ID-023 -> mid-flight spec amend (spec it, then build) + 4. /user:assign ID-024 -> context-switch affordance (+ abandon terminal) + 5. /user:assign ID-025 -> re-open-shipped convention + 6. align docs/PLAYBOOK.md to the shipped state machine (regenerate the scenario cards from it) +``` + +Each row runs the normal/full lane (Section 3). The state machine in this doc is the +acceptance reference: a transition is "done" when an operator can trigger it by phrase and +the guard/stop behaves as the table says. + +## See also +- `docs/PLAYBOOK.md` - the operator-facing projection (what you say -> what happens). +- `docs/ORCHESTRATION.md` - the flow/loop view (the sub-machines in detail). +- `WORKFLOW.md` - the rules contract. +- `docs/PHILOSOPHY.md` - the why behind the guards and the bounded loops. +- `docs/specs/SPEC-026-freeform-front-door.md` - the first implementing spec. diff --git a/docs/retro/RETRO-2026-05-21-agents-md-operating-layer.md b/docs/retro/RETRO-2026-05-21-agents-md-operating-layer.md new file mode 100644 index 0000000..1e4b4b5 --- /dev/null +++ b/docs/retro/RETRO-2026-05-21-agents-md-operating-layer.md @@ -0,0 +1,36 @@ +# Retro: AGENTS.md operating-layer cycle +Date: 2026-05-21 +Sprint: single-session cycle (spec-validate -> execute -> review-team -> docs -> ship) +Spec: SPEC-024 (PR #7; CHANGELOG [Unreleased], no version bump) + +## Metrics +- Tasks planned: 10, completed: 10, deferred: 0 (2 LOW review findings -> backlog ID-018/019) +- Commits: 16 (build 15 + ship-state 1), atomic, all passed the commit-format hook +- Files changed: 18 (+453 / -57) +- Tests: test-meta 216 -> 241 (+25 asserts), test-hooks 92; both exit 0 throughout +- Verification pipeline: 10/10 task-verifier PASS, 0 retries, 0 escalations; integration-checker PASS (10/10 wired, 6/6 chains) +- Completeness log: clean +- Key commits: 7c5015c (AGENTS.md), b7bb3ee (six-section projection), 6ec7880 (test asserts + install-merge guard), 3ba7022 (review fixes) + +## What worked +- **Adversarial spec-validate paid for itself before any code.** It caught two real defects in the draft: (1) the phantom `install.sh` jq "fix" claimed done with no diff (DEC-004), and (2) an unsatisfiable TASK-001 AC (four AGENTS.md zones described, six demanded, "1:1 mapping" false by the spec's own diagram; DEC-005). Both would have produced a broken or dishonest build. Fixing the spec was cheap; fixing it post-execute would not have been. +- **The honest "test + fix-if-surfaced" reframe of DEC-004 resolved cleanly.** Because the spec stopped claiming a fix and made TASK-009 own "add the test, apply a fix only if it fails", execute proved no bug existed and shipped a regression guard instead of a false changelog line. Honesty in the spec produced an honest changelog. +- **Grep-based acceptance criteria made every task mechanically verifiable.** 10/10 PASS, 0 retries. Doc/config/bash tasks with `grep -q` ACs gave the task-verifier real teeth without a unit-test framework. +- **The doc-impact map did its job at /docs.** It surfaced two companions the build missed (MANUAL `/user:assign` Reads/Writes; CLAUDE downstream-template note); the doc-verifier then confirmed the fixes (9 claims, 0 contradictions). + +## What hurt +- **The execute pipeline AND the integration-checker both rubber-stamped the WORKFLOW read-order restatement.** TASK-002 was "replace, don't duplicate", but its worker left a numbered 1-4 read-list AND a "do not restate" pointer (the exact anti-pattern). The task-verifier passed it (checked the pointer existed + AGENTS.md-first; never checked the old list was gone). The integration-checker's seam-2 "no duplication" check also passed it (saw the "do not restate" sentence and stopped). Only the independent /review-team architecture lens caught it. Root cause: presence checks ("the pointer exists") do not catch absence-invariant failures ("the duplicate is gone"). Negative invariants need explicit negative assertions. +- **Recurring shell/hook friction made every worker rediscover the same three workarounds.** fish `noclobber` aborted `>` redirects to existing temp files; the commit-format/commit-msg path mis-parsed heredoc `-m` messages as a 459-char subject; the safety-gate blocked `rm -f` of temp files. Each worker burned cycles relearning: use `>|`, `git commit -F`/Write-tool message files, and `mv` instead of `rm`. I started pre-warning these in later worker prompts, which helped. +- **The orchestrator edited an acceptance criterion mid-execute.** When DEC-004 resolved as no-bug, I changed TASK-010's CHANGELOG expectation from "fix" to "coverage" during the build. Necessary and later recorded as a spec amendment, but mutating the contract mid-flight (rather than at a checkpoint) is a process smell the architecture review flagged. +- **Retro-file naming is still three-way inconsistent** (this file's `RETRO-YYYY-MM-DD-.md`, a stray `RETRO-2026-05-21.md`, an old `v1.3-v1.5.md`; the retro skill body says `RETRO-[date].md`, WORKFLOW says `v.md`). This is ID-017, still open. + +## Action items +- [ ] Teach the task-verifier + integration-checker to check ABSENCE for replace/remove tasks (assert the replaced/removed content is gone, not just that the new pointer exists). The specific anti-drift test now exists; the general verifier gap does not. -- relates to ID-016 -- owner: Han -- next kit cycle +- [ ] Pre-warn the shell/hook gotchas in the `commands/execute.md` worker template (fish `noclobber` -> `>|`; no heredoc commit `-m` -> `git commit -F`/Write; no `rm` -> `mv`). One block, saves every future worker the rediscovery. -- new backlog item -- owner: Han -- next kit cycle +- [ ] Resolve ID-017 (retro-file naming): pick `RETRO-YYYY-MM-DD-.md`, reconcile the retro skill body + CLAUDE.md + WORKFLOW.md. -- already queued (ID-017) -- owner: Han +- [ ] ID-018 (install tip `cp -n`) and ID-019 (demo SPEC-001 `## After state`) -- already queued from review -- owner: Han + +## Kit feedback +- **Investigate the commit-message heredoc mis-parse.** Workers reported the commit-format/commit-msg hook reading a multi-line heredoc `-m` body as a single 459-char subject and rejecting it. If real, the hook should read only the first line as the subject. Worth a look (possible new backlog item); not yet confirmed as a kit bug vs heredoc misuse. +- The commit-format hook correctly blocked spec-ID/phase-marker subjects (it caught my own "phase 1" subject). Working as intended. +- The verification pipeline's blind spot (presence-not-absence) is the highest-signal kit finding this cycle: a "replace" task that left both copies passed two independent verifiers. diff --git a/docs/retro/RETRO-2026-05-22-mid-flight-spec-amend.md b/docs/retro/RETRO-2026-05-22-mid-flight-spec-amend.md new file mode 100644 index 0000000..a5fc18c --- /dev/null +++ b/docs/retro/RETRO-2026-05-22-mid-flight-spec-amend.md @@ -0,0 +1,34 @@ +# Retro: SPEC-027 mid-flight spec amend (ID-023) +Date: 2026-05-22 +Sprint: single-session dogfood, assign -> spec -> spec-validate -> execute -> review -> docs -> ship + +## Metrics +- Tasks planned: 6, completed: 6, deferred: 0 (the 6 TASK rows; the 8 open `- [ ]` are After-state / global-AC observable checkboxes, not tasks). +- Commits: 10 (9 build + 1 ship bookkeeping), atomic, conventional, no spec IDs in subjects. +- Files changed: 11 (+177 / -14). No new command, no new hook (Approach A: convention). +- Verification: 6/6 task-verifier PASS, 0 retries, 0 escalations; integration-checker PASS; doc-verifier PASS (14/14 claims). meta 254/254 (+4), hooks 92/92. +- Completeness log: clean. Doc-impact: every applicable companion moved (MANUAL + test-meta; no new command, so no README-table / plugin / marketplace impact). + +## Key commits +- `17e411a` vision transition row; `704092b` canonical WORKFLOW rule; `7c6eaae` execute reword; `e2a9b5c` meta-test pin; `1be1b54` the review-driven approval-gate fix (DEC-008). + +## What worked +- **`/spec-validate` caught a real correctness defect before any build.** The draft claimed resume "via `/user:execute` (or `/user:next`)"; verifying against `commands/next.md` vs `commands/execute.md` showed only `/next` skips `[x]` done rows. Fixed as DEC-006 in the spec, so the build inherited the correct rule. The adversarial pass earned its keep on the kit's own spec, not just downstream ones. +- **`/review` caught a HIGH that all the verifiers passed.** task-verifier (x6) and integration-checker both PASSED the build, yet the independent review found the amend path had no operator-approval gate in the canonical rule or the autonomous orchestrator, while the PLAYBOOK card already promised one (a cross-surface inconsistency + an unguarded autonomous self-amend). This is the SPEC-024 retro's ID-020 finding recurring: review catches a class the verifiers structurally cannot (a missing guard is not a failed assertion). +- **Source-of-truth discipline held across 5 independent workers.** DEC-005 (canonical rule in WORKFLOW.md, every other surface points at it) survived 5 separate worker subagents writing 6 surfaces; the integration-checker confirmed zero four-copies drift. The "point, do not restate" instruction in each worker prompt was the load-bearing control. +- **Disjoint-file parallel dispatch in Phase 2 was clean and faster.** TASK-003/004/005 touched execute.md / spec.md / (PLAYBOOK+ORCHESTRATION) with no overlap; dispatched in parallel, no conflicts, each verified independently. + +## What hurt +- **Release-hygiene tangle (recurring).** `VERSION`/`plugin.json` = 1.6.0 but latest tag = v1.5.1 (cut-but-untagged), `[Unreleased]` mixes unmerged PR #7 (SPEC-024) with this cycle, branch 42 commits ahead of `master`. The `/ship` gate had to stop and ask because the state was ambiguous. The SPEC-018 (placement-ui-design) retro flagged this exact failure mode; its recurrence clears the PHILOSOPHY section-5 bar for a guard. +- **The kit prescribes sequential worker dispatch.** `commands/execute.md` says "Execute them one at a time (sequential dispatch; parallel dispatch is a future upgrade)." Phase 2's three tasks were disjoint-file independent, so sequential was pure latency tax; the dogfood deviated to parallel. The instruction does not acknowledge the safe disjoint-file case. +- **The spec underspecified the human-approval gate.** SPEC-027's "Key invariants" listed four and omitted operator-approval; the PLAYBOOK worker reasonably added "you confirm the added scope," which is what created the cross-surface inconsistency review then caught. The autonomy-gate angle (does this let an autonomous loop make a scope/architecture decision unattended?) was not in any `/spec-validate` lens's focus for a docs/convention spec, so validate missed it and review had to. +- **Auto-format left punctuation artifacts.** The slop-cleaner / auto-format converted the spec's TASK-line ` — AC:` em dashes to `., AC:`, producing slightly clumsy "style., AC:" reads. Cosmetic, in the spec doc only, but the em-dash replacement does not always pick graceful punctuation. + +## Action items +- [ ] **Release-hygiene guard (recurring, clears the PHILOSOPHY bar).** A `kit-health` line or a hook that flags `VERSION` cut-but-untagged AND `[Unreleased]` spanning more than one spec. Relates to the ID-016 guard-promotion theme. -> propose BACKLOG. +- [ ] **execute.md: acknowledge disjoint-file parallel dispatch.** Note that independent, non-overlapping-file tasks may be dispatched in parallel; sequential is the safe default only for shared-file or dependent tasks. -> propose BACKLOG (tiny) or a one-line execute.md edit. +- [ ] **spec-validate: an autonomy/approval-gate lens.** For any spec whose behavior runs inside an autonomous loop (`/execute`, `/goal`), a check: "does this let the loop make a scope / architecture / risk decision without a human gate?" Would have caught this cycle's HIGH at validate, not review. -> propose BACKLOG. +- [ ] **LOW: PLAYBOOK scenario-numbering reconcile** (already in TODOS.md): no Scenario 6 card; sections 8-11 use bare numbers on the scenario axis. -> propose BACKLOG (tiny). + +## Kit feedback +The dogfood is positive evidence for the layered verification design: each layer caught a different bug class. `/spec-validate` caught a factual-correctness claim (the resume command); `/review` caught a missing guard + cross-surface inconsistency that the per-task and integration verifiers structurally cannot see; `doc-verifier` confirmed the docs matched the shipped code. The "review catches what verifiers miss" pattern recurred (ID-020 from the SPEC-024 retro), strengthening that backlog item. The one process smell the kit's own machinery did NOT prevent was release hygiene, which is now a two-cycle recurrence and the highest-signal kit finding here. diff --git a/docs/retro/RETRO-2026-05-22-release-hygiene-guard.md b/docs/retro/RETRO-2026-05-22-release-hygiene-guard.md new file mode 100644 index 0000000..04f7efb --- /dev/null +++ b/docs/retro/RETRO-2026-05-22-release-hygiene-guard.md @@ -0,0 +1,33 @@ +# Retro: SPEC-028 release-hygiene guard (ID-026) +Date: 2026-05-22 +Sprint: single-session dogfood, assign -> spec (surface decision) -> spec-validate -> execute -> review -> docs -> ship. Normal lane (the surface choice was warn, not hook). + +## Metrics +- Tasks planned: 3, completed: 3, deferred: 0. +- Commits: 9 (3 surface/test edits, 3 fix commits, spec-progress + docs + ship-bookkeeping). Atomic, conventional. +- Files changed: 7 (+84 / -7). No new command, no new hook, no new file except the spec (Approach A: warn-only convention on two existing surfaces). +- Verification: 3/3 task-verifier PASS (TASK-001 needed 1 fix retry); integration-checker FAIL:fixable then PASS (1 fix); doc-verifier PASS (16/16). meta 256/256 (+2), hooks 92/92. +- Completeness log: clean. Doc-impact: MANUAL (/ship + /kit-health) + CHANGELOG moved; no new command so no plugin/marketplace/README-table impact. + +## Key commits +- `0958107` kit-health check; `0a27c62` ship warn; `2b9f2a1` meta-test pin; `abea75d` + `5ebdbec` + `b5ddf9f` the three DEC-005 alignment fixes (one per verification layer). + +## What worked +- **Surface-decision-first kept the build minimal and set the lane correctly.** Deciding the detection surface in `/spec` (before writing the spec) determined the lane (warn surfaces -> normal, not full) and the research genuinely sharpened the design: a `tests/test-meta.sh` hard assertion was rejected because "VERSION untagged" is a legitimate release transient + CI shallow-clone fragile, and the real signal is the phantom cut (untagged version), not `[Unreleased]` accumulation (which the kit does deliberately). The spec was right-sized as a result. +- **The three verification layers each caught a DIFFERENT drift of the SAME invariant.** DEC-005 said the two inlined copies (ship.md + kit-health.md) must be byte-identical. task-verifier caught the outer-guard drift (`[ -d .git ]` vs `git rev-parse --git-dir`, breaks in worktrees); integration-checker caught the inner-guard drift (the empty-VERSION condition); `/user:review` caught the soft accumulation-signal drift (grep-heading-exists vs awk-non-empty). Three layers, three distinct drifts, on the guard's own two copies. This is the strongest evidence yet for the layered pipeline AND a live demonstration of why identical-by-contract duplication is dangerous. +- **The guard self-fired on its own ship.** At `/user:ship` Step 4a the new warn correctly detected the live phantom cut (`v1.6.0` untagged, `[Unreleased]` accumulating). End-to-end dogfood proof, in situ, warn-only (it reported, did not block). +- **Pinning the check shape in the spec (DEC-005) gave the verifiers something concrete to check.** Without the "identical shape" contract written down, the three drifts would have passed silently. + +## What hurt +- **DEC-003's "inline until 3 uses" produced 2 copies that drifted 3 times in ONE cycle.** The kit's "no premature abstraction (extract at the third occurrence)" rule said keep the check inline at two surfaces. Those two copies then diverged three separate times, each needing a fix round. For logic that MUST be byte-identical across surfaces (identical-by-contract), "wait for the third use" is the wrong heuristic: the cost is paid in drift, not in a premature helper. Partial root cause is mine: my worker prompts were asymmetric (TASK-002 got the exact bash block, TASK-001 got a narrative "check shape"), which seeded the first drift. +- **The 3-deep branch stack deepened the very tangle this spec targets.** Building the release-hygiene guard added a third stacked branch (agents-md -> mid-flight -> release-hygiene) on top of the untagged `1.6.0`, making the release state messier, the exact thing the guard now warns about. The guard detects the mess; it does not clean it up. +- **The phantom cut is still unfixed.** `v1.6.0` remains untagged with three specs' worth of `[Unreleased]` above it across three local branches. The guard now nags about it on every ship, which is correct, but the cleanup (integrate the stack, decide the version, tag) is owed and only grows. + +## Action items +- [ ] **Carve out "identical-by-contract" from the no-premature-abstraction rule.** When two copies of a snippet MUST stay byte-identical (a shared check across surfaces), single-source them (a shared snippet/helper) or pin the exact block, even at two uses; do not wait for the third. Evidence: 3 drifts in one cycle. -> propose BACKLOG (a CLAUDE.md code-quality nuance) + relates to the ID-016/ID-020/ID-026 guard-theme. +- [ ] **Worker-prompt discipline for identical-output tasks.** When two worker tasks must produce identical logic, give BOTH workers the exact same block (not one exact + one narrative). A one-line note in `commands/execute.md`'s worker-dispatch guidance. -> propose BACKLOG (tiny), relates to ID-028. +- [ ] **Resolve the release state the guard is now nagging about.** Integrate the 3-branch stack (agents-md / mid-flight / release-hygiene), decide the version, and tag, so `v1.6.0` (or its successor) stops being a phantom cut. Needs the maintainer's integration decision. -> propose BACKLOG (the real cleanup the guard points at). +- [ ] **ID-024 (context-switch / worktree-per-spec) is reinforced** by the 3-deep stack pain; bump its priority consideration. -> already queued (ID-024), note the reinforcement. + +## Kit feedback +The layered verification design is strongly validated this cycle: three independent layers caught three distinct drifts of one invariant that any single layer would have missed (the per-task verifier could not see the cross-surface inner-guard drift; the integration-checker's scoped structural comparison did not flag the soft accumulation-signal drift; review caught that). The same cycle exposed a tension in the kit's own rules: "no premature abstraction (3x)" is wrong for identical-by-contract logic, where duplication drifts faster than it accumulates uses. The fix is not to drop the rule but to carve out the identical-by-contract case. Secondary: the kit cheerfully let a 3-deep branch stack form on an untagged version; the new guard now warns about the end state, but nothing discourages the stack from forming (ID-024 territory). diff --git a/docs/specs/SPEC-024-agents-md-operating-layer.md b/docs/specs/SPEC-024-agents-md-operating-layer.md new file mode 100644 index 0000000..ec3c12a --- /dev/null +++ b/docs/specs/SPEC-024-agents-md-operating-layer.md @@ -0,0 +1,142 @@ +# Spec: AGENTS.md operating layer + brownfield backfill lane +Generated: 2026-05-21 +Status: SHIPPED + +> Implements ADR-0013 (accepted 2026-05-21) and its After-state addendum. Backlog: ID-015 (full lane). +> Validated 2026-05-21 via /user:spec-validate (5 lenses); revisions recorded in the Decision Log (DEC-004 corrected, DEC-005..DEC-008 added). +> Research note: the standard brownfield 4-agent research pass was skipped because the relevant +> surfaces were already mapped in the session that produced ADR-0013 (WORKFLOW.md, CLAUDE.md, +> commands/assign.md, commands/spec.md, install.sh, the WORKFLOW doc-impact map). No CONTEXT.md was +> regenerated; this spec + ADR-0013 are the implementation context. Regenerate CONTEXT.md at +> /user:execute time only if a worker needs more than the files named per task. + +## Problem +The kit out-enforces `hoangnb24/harness-experimental` (real hooks, verifier, push-blocker vs their advisory markdown) but under-ships the *legible in-repo operating layer a `/goal` can point at*: + +- The entrypoint is `CLAUDE.md` (+ `WORKFLOW.md`), which is Claude-Code-specific. There is no tool-agnostic front door. +- "Pause if / ask a human" is enforced by safety hooks but never *stated* as a directive a goal can mirror. +- There is no brownfield "review the codebase and backfill docs" entry; the cycle is greenfield-feature oriented (`/think -> /spec`). + +Consequence: operators hand-write the rich six-section `/goal` structure every time, with no in-repo referent, so they never reach the altitude shown in the trigger screenshot. The harness-experimental operator does not write that structure into the prompt; they reference structure the repo already carries (its `AGENTS.md` read-order, done-definition, and "ask before" list). We lack that carrier. + +## Solution + +### Approaches considered +- **A: `AGENTS.md` canonical operating layer; `CLAUDE.md` becomes a thin CC-specific pointer (chosen).** Tradeoff: some operate-contract prose moves out of CLAUDE.md; one-time churn. +- **B: Keep `CLAUDE.md` canonical, add `AGENTS.md` as a pointer to it.** Rejected: a Codex/Gemini agent reading `AGENTS.md` gets redirected to a CC-specific file; defeats portability. +- **C: Ship the full harness-experimental scaffold (empty `product/`, `stories/`, `TEST_MATRIX.md`).** Rejected: violates PHILOSOPHY "every file must justify its existence" + "no phantom features". Their product *is* the scaffold; ours is not. + +### Chosen approach + why +Adopt `AGENTS.md` as the tool-agnostic front door carrying the portable operate-contract (ordered read list, task loop, done-definition, explicit "Pause if" list). `CLAUDE.md` shrinks to the CC-only layer (hooks, slash commands, plugin) and points at `AGENTS.md` for the operate-contract. Add a `backfill` brownfield lane to `WORKFLOW.md`. Make the goal-crafter (`commands/assign.md`) emit the six-section operating directive, each section projecting an `AGENTS.md` artifact. Add an observable `## After state` section to the spec template and project it into the goal's `Done-when`. (ADR-0013 decisions 1, 2, 4, 5 + addendum. Decisions 3 and 6 are honored as Out-of-Scope guards below.) + +### Extensibility & boundaries +- **Load-bearing dimension: number of agent runtimes.** `AGENTS.md` is plain markdown any runtime reads. Enforcement stays Claude-Code-only (the hooks). Adding a runtime means it *reads* `AGENTS.md`; it does NOT inherit the guardrails until the v3.x agent-hook work lands. The spec must not over-claim portability of enforcement. +- **Units (each independently testable):** `AGENTS.md` (entrypoint), `CLAUDE.md` (CC layer), `WORKFLOW.md` (lanes), `commands/assign.md` (projection), `commands/spec.md` (after-state template). Each describable in <=3 sentences. + +### Architecture +``` +AGENTS.md (tool-agnostic front door) + | sections: ordered read list, task loop, done-definition, "Pause if" + | + +-- CLAUDE.md (CC-specific: hooks/commands/plugin) --points to--> AGENTS.md (no restating) + +-- WORKFLOW.md (lanes incl. new `backfill`) <-- AGENTS.md points here for lane selection + +-- commands/assign.md (goal-crafter) + | projects --> six-section /goal: + | Context-to-read <- AGENTS.md read list + | Constraints <- AGENTS.md/CLAUDE.md rules + | Operating rules <- AGENTS.md task loop + | Validation loop <- spec ## Verification + | Done-when <- AGENTS.md done-definition + spec ## After state + | Pause-if <- AGENTS.md "Pause if" list + +-- commands/spec.md template (new ## After state, feeds ## Acceptance Criteria) +``` + +## Technical Design + +### Interfaces (I/O contract) +- **`AGENTS.md`** consumes: nothing (it is the root). Produces: **four portable operate-contract zones** (ordered read list, task loop, done-definition, "Pause if" list). The goal-crafter *composes* the six-section `/goal` from these four zones plus the active spec's `## Verification` and `## After state` (and the CLAUDE.md rules), per the Architecture diagram above; the mapping is a composition, not 1:1. Invariant: the four zone names are stable; renaming one without updating `assign.md` breaks the projection. +- **`commands/assign.md`** consumes: the resolved `AGENTS.md` sections + the `_meta/BACKLOG.md` row for `ID-NNN`. Produces: a six-section operating directive in `.claude/goals/.md`. Invariant: BACKLOG-ID-first entry (no freeform-intent path in this spec). +- **`commands/spec.md`** template: gains a `## After state` block between `## Solution` and `## Acceptance Criteria`. Invariant: every After-state bullet is observable (checkable by a human or a command), never narrated prose. + +### Data model changes +- New file `AGENTS.md` at kit root and `examples/hello-spec/AGENTS.md`. +- `commands/spec.md` template gains a `## After state` section. + +### API changes (the goal-crafter contract) +- `commands/assign.md` output shape changes from a tight contract goal (outcome/verify/scope/blocker) to the six-section operating directive, each section pointing at `AGENTS.md`. `Done-when` includes the spec's observable After-state. + +### UI changes +None. The kit ships no UI. + +### Infrastructure changes +- `WORKFLOW.md`: add the `backfill` lane to the lane table + the cycle. Add an `AGENTS.md` companion-doc row to the doc-impact map. The existing bolded self-maintaining rule is scoped to a new top-level *dir*; `AGENTS.md` is a top-level *file*, so TASK-007 also adds a "new top-level file" trigger row so the map self-maintains for files too. +- `install.sh`: add the `AGENTS.md` copy tip alongside the existing `CLAUDE.md` tip. (Separately, this cycle's install dogfood exercised the settings merge/clean against a `settings.json` that already carried third-party hooks, the previously-untested path; TASK-009 owns it as a regression test and applies any jq fix the test surfaces. See DEC-004.) +- `tests/test-meta.sh`: assert `AGENTS.md` carries the four portable zones, and that `assign.md` carries the six-section projection. + +## Task Breakdown +Each task is atomic: implementable in one session, fits in ~50% of a context window. + +### Phase 1: Operating layer +- [x] TASK-001 (DONE 7c5015c, verified): Write `AGENTS.md` at kit root: ordered read list, task loop, done-definition (which MUST include "review recorded + report written, and the final response says what changed and what was not attempted"), and an explicit "Pause if / ask a human" list (architecture direction, source-of-truth hierarchy, validation removal, risk-classification change, privacy/security). State plainly that enforcement is Claude-Code-only; under other runtimes `AGENTS.md` is advisory. - AC: file exists; `grep -q 'Pause if' AGENTS.md`; the four portable zones (read list, task loop, done-definition, Pause-if) are present and named so `assign.md` can project them into the six-section goal. +- [x] TASK-002 (DONE c343d92, verified): Make the CC-layer docs point at `AGENTS.md` for the operate-contract (replace, don't duplicate). Two files: (a) kit-root `CLAUDE.md` (NOT `examples/hello-spec/CLAUDE.md`, which TASK-003 handles) gets/keeps a one-line pointer to `AGENTS.md`; (b) `WORKFLOW.md` is the file that actually carries the operate-contract prose today (`## Required reading` ordered list + `## Completion contract` done-definition), so reconcile it to point at `AGENTS.md` as the read-order/done source rather than restating it. Pick one direction (`AGENTS.md` canonical, `WORKFLOW.md` points) and apply it to both. - AC: neither `CLAUDE.md` nor `WORKFLOW.md` restates the ordered read-order + done-definition that now live in `AGENTS.md`; both reference `AGENTS.md`; `WORKFLOW.md`'s required-reading list names `AGENTS.md` first. +- [x] TASK-003 (DONE 5e4bfc1, verified): Add `examples/hello-spec/AGENTS.md` as the downstream template (realistic placeholder content, same shape). - AC: file exists; the hello-spec README/template note references it. + +### Phase 2: Lanes + projection +- [x] TASK-004 (DONE f9c04cf, verified): Add a `backfill` brownfield lane to `WORKFLOW.md` (lane table + cycle): review an existing codebase and write the operating-layer docs without changing application behavior; spec-optional, doc-output, no app-code edits. - AC: `grep -q backfill WORKFLOW.md`; lane row + a one-line description present. +- [x] TASK-005 (DONE b7bb3ee, verified): Make `commands/assign.md` emit the six-section operating directive, each section projecting an `AGENTS.md` artifact, with `Done-when` including the spec's observable After-state. - AC: `assign.md` documents the six-section projection and the After-state in `Done-when`; the contract-goal-only shape is replaced, not left alongside. +- [x] TASK-006 (DONE 86e3aa0, verified): Add `## After state` to the `commands/spec.md` template (between `## Solution` and `## Acceptance Criteria`), with the observable-not-narrated rule inline. - AC: `grep -q '## After state' commands/spec.md`; the rule text is present. + +### Phase 3: Sync (the doc-impact map) +- [x] TASK-007 (DONE 9fc3e25, verified): Update the `WORKFLOW.md` doc-impact map: add a "new top-level file" trigger row (the current bolded self-maintaining rule covers only a new top-level *dir*, so a top-level *file* like `AGENTS.md` slips through), add the `AGENTS.md` companion-doc row, and a `backfill`-lane reference. - AC: the map names `AGENTS.md` and carries a top-level-file trigger row. +- [x] TASK-008 (DONE f51537e, verified): Update `README.md` "Project structure" and `docs/architecture.md` cross-refs to include `AGENTS.md` and the `backfill` lane. - AC: both name `AGENTS.md`. +- [x] TASK-009 (DONE 6ec7880, verified; merge test passes on current install.sh -> no jq fix needed, DEC-004 resolved as no-bug): Update `tests/test-meta.sh` to assert `AGENTS.md` exists and carries the four portable zones, and that `assign.md` carries the six-section projection. Add a regression test that runs `install.sh` into a throwaway HOME whose `settings.json` already contains a third-party hook (the dogfood case), and asserts the user's hook survives the merge and the result is valid JSON. If that test fails on the current `install.sh`, apply the minimal jq fix to the merge/clean within this task. - AC: `bash tests/test-meta.sh` exercises the four-zone + projection checks AND the merge-with-existing-hooks case, and passes. +- [x] TASK-010 (DONE 43fa85d; version bump REVERTED post-ship per DEC-009): `CHANGELOG.md` entry and the `install.sh` `AGENTS.md` tip. - AC: CHANGELOG records the feature + the install-merge regression test as coverage (not a fix; DEC-004 resolved as no-bug, see TASK-009). The version bump originally done here was reverted to 1.6.0; the kit accumulates under CHANGELOG `[Unreleased]` and does not cut a version per spec (DEC-009). + +## Acceptance Criteria (global) +- [x] All tasks pass their individual acceptance criteria. +- [x] `AGENTS.md` is the front door (root + examples) carrying the four portable zones; `CLAUDE.md` and `WORKFLOW.md` point at it (no restating); `backfill` lane present; `assign.md` projects the six sections; `spec.md` template has `## After state`. +- [x] No regressions: `bash tests/test-meta.sh && bash tests/test-hooks.sh` both exit 0. + +## Verification +`bash tests/test-meta.sh && bash tests/test-hooks.sh` exit 0 AND `grep -q 'Pause if' AGENTS.md` AND `grep -q backfill WORKFLOW.md` AND `grep -q '## After state' commands/spec.md` + +## After state +Observable; each bullet was false before this cycle and is a real check now (all verified at review). +- [x] `AGENTS.md` exists at the kit root and `grep -q 'Pause if' AGENTS.md` passes. (Was: no `AGENTS.md` anywhere in the kit.) +- [x] `CLAUDE.md` points at `AGENTS.md` and no longer restates the read-order / done-definition / pause list. (`WORKFLOW.md` likewise points; the no-restate invariant is now pinned by test-meta.) +- [x] `WORKFLOW.md` lists a `backfill` lane. (Was: only `tiny` / `normal` / `full` / `bug`.) +- [x] `commands/assign.md` produces a six-section directive (Context-to-read / Constraints / Operating rules / Validation loop / Done-when / Pause-if), not a one-line contract goal. +- [x] `commands/spec.md` template contains a `## After state` section, and a newly generated spec carries observable after-state bullets. +- [x] `examples/hello-spec/AGENTS.md` exists so a downstream repo inherits the front door. + +## Edge Cases +1. A non-CC runtime (Codex/Gemini) reads `AGENTS.md` but gets no hook enforcement: `AGENTS.md` must state enforcement is CC-only so no operator assumes the guardrails are portable. +2. A downstream repo already has its own `AGENTS.md`: `install.sh` does not copy `AGENTS.md` at all. Like `CLAUDE.md`, it only prints a copy *tip* (`install.sh:299`), so there is no clobber path to guard. The real requirement is that the printed tip read as copy-if-absent, not a blind overwrite (it must not imply the installer will replace an existing `AGENTS.md`). (Commands are symlinked and rules are copy-with-skip; `AGENTS.md` follows the tip model, not either of those.) +3. An After-state bullet that cannot be made checkable: `/spec-validate` and `/review` must flag it and require cutting or rewriting it (the observable-not-narrated rule is enforced at the gate, not just stated). + +## Failure modes +| Failure class | Detection signal | Mitigation / recovery | +|---|---|---| +| `AGENTS.md` ↔ `CLAUDE.md` / `WORKFLOW.md` drift (a CC-layer doc restates the contract) | doc-verifier or `/review` finds duplicated read-order/done/pause in `CLAUDE.md` or `WORKFLOW.md` | `AGENTS.md` is the single source; both `CLAUDE.md` and `WORKFLOW.md` point only (replace, don't duplicate) | +| Over-claiming portable enforcement | a reviewer reads "works with Codex" as "enforced under Codex" | `AGENTS.md` states advisory-only under non-CC runtimes; PHILOSOPHY honesty rule | +| After-state rots into aspirational fluff | bullets are not checkable on re-read | observable-not-narrated rule + the spec-validate/review gate cut non-checkable bullets | +| `backfill` lane silently edits app code | a backfill run changes behavior, not just docs | lane definition forbids app-behavior change; pause-if covers it | + +## Out of Scope +- **Freeform "griller" entry** (a casual intent with no BACKLOG ID): BACKLOG-ID-first stays canonical (the 2026-05-21 scope call). Why: preserves the detector/mutator + traceability discipline. +- **Portable enforcement / agent-hooks for Codex/Gemini** (ADR-0013 decision 6): deferred to the v3.x multi-runtime work. We add portable *guidance*, not portable *guardrails*. +- **Empty `product/` / `stories/` / `TEST_MATRIX.md` scaffolds** (ADR-0013 decision 3): created only when real content exists. + +## Decision Log +- DEC-001: `AGENTS.md` canonical, `CLAUDE.md` a thin pointer. Rationale: tool-agnostic front door + no duplication. Alternatives B (CLAUDE.md canonical) and C (full empty scaffold) rejected. Who: ADR-0013 (human-accepted). +- DEC-002: `backfill` is a new lane, not a reuse of `normal`. Rationale: brownfield doc-backfill has different inputs (existing code) and a no-app-change constraint. Who: ADR-0013 decision 4. +- DEC-003: After-state is a spec section + a `Done-when` projection target, governed by observable-not-narrated. Rationale: a checkable picture both human and agent verify; fluff is rejected by PHILOSOPHY. Who: ADR-0013 addendum (human-requested). +- DEC-004: The install dogfood for this cycle exercised the settings merge/clean against a `settings.json` that already carried third-party hooks, the previously-untested path. TASK-009 owns this end to end: it adds the regression test and applies any minimal jq fix the test surfaces. (Earlier framing claimed the bugs "were already fixed"; `install.sh` is unchanged on this branch, so the work is owned by TASK-009, not pre-done.) Rationale: a merge that drops or crashes on a real user's existing hooks blocks install. Who: auto (raised in dogfood), corrected in spec-validate. +- DEC-005: `AGENTS.md` carries **four** portable zones (read list, task loop, done-definition, Pause-if); the six-section `/goal` is a *composition* of those four plus the spec's `## Verification` and `## After state`, not a 1:1 mapping. Rationale: the Architecture diagram already shows two goal sections sourced from the spec, so the earlier "six contract zones in `AGENTS.md` / 1:1" framing (Interfaces, TASK-001 AC, TASK-009) was internally contradictory and unsatisfiable. Who: spec-validate Reviewer 5/4. +- DEC-006: The operate-contract prose (ordered read-order + done-definition) lives in `WORKFLOW.md` today, not kit-root `CLAUDE.md`. TASK-002 therefore reconciles `WORKFLOW.md` (point at `AGENTS.md`), not just `CLAUDE.md`, and a failure-mode row now covers `AGENTS.md`↔`WORKFLOW.md` drift. Rationale: without this the dedup is a no-op on the named file and the real three-way overlap ships. Who: spec-validate Reviewer 3/5. +- DEC-007: Edge Case 2 reframed to the tip model: `install.sh` never copies `AGENTS.md` (tip-only, like `CLAUDE.md`), so the "must not clobber" guard described a mechanism that does not exist. Rationale: accuracy; the real requirement is the tip wording. Who: spec-validate Reviewer 2. +- DEC-009: No version bump for this spec. TASK-010 had bumped 1.6.0 -> 1.7.0, but the kit's convention is to accumulate changes under CHANGELOG `[Unreleased]` and cut a version separately (1.7.0 had also clobbered the prior `## [1.6.0]` heading). Reverted all three version surfaces to 1.6.0 and moved the SPEC-024 notes into `[Unreleased]`. Rationale: per-spec version bumps were not the established pattern; the bump also corrupted the changelog history. Who: maintainer call (2026-05-21). +- DEC-008: TASK-007 adds a "new top-level file" trigger row to the doc-impact map. Rationale: the bolded self-maintaining rule is scoped to a new top-level *dir*; a top-level *file* like `AGENTS.md` was not actually covered, so line-71's "already requires" claim was false. Who: spec-validate Reviewer 4. + +## Open questions +(none; a /goal loop appends here if it hits a decision this spec does not cover, then stops) diff --git a/docs/specs/SPEC-026-freeform-front-door.md b/docs/specs/SPEC-026-freeform-front-door.md new file mode 100644 index 0000000..2950764 --- /dev/null +++ b/docs/specs/SPEC-026-freeform-front-door.md @@ -0,0 +1,189 @@ +# Spec: freeform front door (intent -> ID -> lane, no manual bookkeeping) +Generated: 2026-05-22 +Status: VALIDATED + +> Source: the PLAYBOOK.md scenarios (S2 "apply SDD to X", S5 vague brief) and SPEC-024's +> deferred "freeform griller entry". Backlog: ID-022. +> Validated 2026-05-22 via /user:spec-validate (5 lenses); NEEDS REVISION -> revised +> (DEC-003 delegate-to-think, DEC-004 sanitize, DEC-005 atomic-allocate; dedup + concurrency edges added). + +## Problem +The kit's orchestration is BACKLOG-ID-first: `/user:assign` takes an `ID-NNN`. A freeform +intent with no ID ("apply SDD to this feature", a vague brief) has no front door. Today Claude +bridges it by hand every time (`docs/PLAYBOOK.md` Section 8): crystallize -> write a BACKLOG row +-> `/user:assign ID` -> lane. That bridge is correct but is unmanaged: it lives only in Claude's +interpretation, so it is inconsistent run to run, and it is the friction SPEC-024 named when it +deferred the "freeform griller entry" (Out of Scope, that cycle's scope call). + +Consequence: every ad-hoc idea pays a bookkeeping detour before the lane can run, and the most +common way an operator actually starts work (freeform intent in chat) is the one path the kit +does not support natively. + +## Solution + +### Approaches considered +- **A: Extend `/user:assign` to accept freeform intent as well as `ID-NNN` (chosen).** The one + mutator gains a freeform path that crystallizes, auto-allocates the next ID, writes the BACKLOG + row, then proceeds exactly as the ID-first path. Tradeoff: `/assign`'s contract widens; the + crystallize step makes it non-trivially longer than today's lookup. +- **B: A new `/user:griller` (or `/user:intake`) command** dedicated to freeform capture, separate + from `/assign`. Rejected: a second intake mutator splits the detector/mutator model (the kit has + exactly one mutator by design, SPEC-006) and duplicates the routing logic. +- **C: A skill/keyword recognizer** (Claude detects "apply SDD / build X" and runs the bridge with + no command change). Rejected as the primary mechanism: it leaves the front door purely in model + interpretation (the exact fragility this spec exists to remove), and "guardrails over guidance" + prefers an invoked command over a prose convention. (A light recognizer convention can still + point at `/assign `; see Out of Scope.) + +### Chosen approach + why +Widen `/user:assign` so its argument is either an existing `ID-NNN` (today's path, unchanged) or +freeform intent text. On the freeform path it: (1) **delegates** crystallization to `/user:think` +(the existing idea-griller) rather than absorbing the interview, so `/assign` stays the quick +mutator SPEC-006 defined (it does not run a multi-turn interview itself); `/think` returns a +crystallized objective + a lane, (2) `/assign` then does only its mutation tail: sanitize the +intent, allocate the next `ID-NNN`, append a BACKLOG Active-queue row (`Status: queued`), (3) then +runs the identical ID-first path (goal draft, lane pick, activator detect, hand-off). One mutator, +one routing path, ID traceability preserved (the ID is created on the fly, never skipped), and the +interactive part lives in `/think` where it belongs (DEC-003). + +Boundary note: `/assign` orchestrates the freeform path but the **interview is delegated**, not +embedded. `/assign`'s own work stays "allocate + route", consistent with the detector/mutator +split (SPEC-006). + +### Extensibility & boundaries +- **Load-bearing dimension: intake shape.** Today: one shape (ID). After: two shapes (ID, + freeform) behind one entry. A future third shape (e.g. an imported issue) is another branch in + the same resolver, not a new command. +- **Units (independently testable):** the intent resolver (ID vs freeform), the ID allocator + + BACKLOG writer, the crystallize gate, the unchanged ID-first tail. Each describable in <=3 + sentences. + +### Architecture +```text + /user:assign + │ + ▼ + resolve arg ──┬── looks like ID-NNN ──▶ (today's path, unchanged) ──┐ + │ │ + └── freeform text ──▶ crystallize via /user:think │ + └▶ approve, sanitize, allocate, write │ + BACKLOG row (queued) │ + └▶ [human approves objective]──┤ + ▼ + goal draft -> lane pick -> activator -> hand off +``` + +## Technical Design + +### Interfaces (I/O contract) +- **`commands/assign.md`** consumes: `$ARGUMENTS` = either `ID-NNN` OR freeform intent text. + Produces (freeform path): a new `_meta/BACKLOG.md` row with an allocated ID + `Status: queued`, + then the existing `.claude/goals/.md` draft. +- Invariant (delegation): the freeform path delegates the interview to `/user:think`; `/assign` + itself does not run a multi-turn interview (DEC-003), so the mutator stays light. +- Invariant (approve-before-allocate): the freeform path MUST pause for human approval of the + crystallized objective before it allocates an ID, so a vague brief never auto-creates a row. +- Invariant (row-before-draft): the BACKLOG row is written before the goal draft (ID traceability first). +- Invariant (sanitization, DEC-004): freeform intent is sanitized before it touches a file. Table + cells escape `|` (and newlines) so a freeform string cannot break the `_meta/BACKLOG.md` pipe + table; the derived slug is reduced to `[a-z0-9-]+` (no `/`, no `..`) so it cannot traverse out of + `.claude/goals/`. +- Invariant (allocate-atomically, DEC-005): the ID is allocated by re-reading the current max in + the same step that writes the row, and a post-write collision check (two equal IDs) fails loud + rather than silently colliding (mirrors the existing SPEC/ADR dup-number guard). +- Invariant: the ID-first path (an `ID-NNN` argument) is byte-for-byte unchanged. + +### Data model changes +None to the schemas. The BACKLOG Active-queue row shape (SPEC-005 schema) is reused; the freeform +path just writes one (sanitized, per the sanitization invariant). `Source` column records +"freeform intake (date)". The derived slug follows the existing kebab convention but is hardened to +`[a-z0-9-]+`. + +### API changes (the `/user:assign` contract) +`$ARGUMENTS` widens from "an `ID-NNN`" to "an `ID-NNN` or freeform intent". Detection rule: +matches `^ID-[0-9]+$` (after trim) -> ID path; else -> freeform path. + +### UI changes +None. The kit ships no UI. + +### Infrastructure changes +- `commands/assign.md`: the resolver + freeform path (delegating crystallize to `/user:think`) + + the approval gate + sanitization + atomic allocation. +- `tests/test-meta.sh`: assert `assign.md` documents both paths, the `/user:think` delegation, and + the four invariants (row-before-draft, approve-before-allocate, sanitization, atomic-allocate). +- `docs/PLAYBOOK.md`: replace Section 8's "manual bridge" note with the native path once shipped. +- `WORKFLOW.md` `## The spine`: note `/assign` accepts freeform, not only an ID. + +## Task Breakdown +### Phase 1: Resolver + freeform path +- [x] TASK-001 (DONE d920df3, verified): Add the arg resolver to `commands/assign.md` (`ID-NNN` vs freeform) with the ID + path unchanged. - AC: `assign.md` documents the two-shape argument + the `^ID-[0-9]+$` rule. +- [x] TASK-002 (DONE d920df3, verified): Specify the freeform path: **delegate crystallize to `/user:think`** -> approval + gate -> sanitize -> allocate ID + write BACKLOG row -> existing ID-first tail. The interview is + delegated, not embedded (DEC-003). - AC: `assign.md` carries the ordered freeform steps, names the + `/user:think` delegation, and the row-before-draft + approve-before-allocate invariants. + +### Phase 2: Hardening +- [x] TASK-003 (DONE d920df3, verified): Sanitization + concurrency guard in `commands/assign.md`'s freeform path: escape + `|`/newlines in the BACKLOG row cells (table integrity); reduce the slug to `[a-z0-9-]+` (no `/`, + no `..`); allocate the ID by re-reading max in the write step + a post-write equal-ID collision + check that fails loud. - AC: `assign.md` documents the sanitization + atomic-allocate invariants + (DEC-004, DEC-005). + +### Phase 3: Guards + docs +- [x] TASK-004 (DONE 1e59de8, verified): `tests/test-meta.sh` assertions: `assign.md` documents both paths, the delegation, + and all four invariants (row-before-draft, approve-before-allocate, sanitization, atomic-allocate). + - AC: `bash tests/test-meta.sh` exercises them and passes. +- [x] TASK-005 (DONE 893753e, verified): Update `docs/PLAYBOOK.md` (S2/S5/S8 -> native path) and the `WORKFLOW.md` spine + note. - AC: PLAYBOOK no longer calls the bridge "manual today"; the WORKFLOW spine says `/assign` + accepts freeform. + +## After state +Observable; each bullet is false now and a real check once shipped. +- [ ] `/user:assign "apply SDD to X"` (freeform) crystallizes, allocates an ID, writes a BACKLOG + row, and routes into the lane, with one approval gate. (Today: `/assign` only accepts `ID-NNN`.) +- [ ] `/user:assign ID-007` behaves exactly as before (no regression on the ID path). +- [ ] `docs/PLAYBOOK.md` Section 8 describes a native front door, not a manual bridge. +- [ ] The freeform path delegates the interview to `/user:think` (the mutator does not embed it). +- [ ] A freeform intent containing `|` does not corrupt the `_meta/BACKLOG.md` table, and a slug-hostile intent (e.g. `../x`) cannot write outside `.claude/goals/`. +- [ ] `tests/test-meta.sh` pins the two-path contract + all four invariants (row-before-draft, approve-before-allocate, sanitization, atomic-allocate). + +## Acceptance Criteria (global) +- [ ] All tasks pass their individual acceptance criteria. +- [ ] `/assign` accepts freeform OR `ID-NNN`; freeform delegates the interview to `/user:think`, sanitizes input, writes a tracked BACKLOG row before any draft; the ID path is unchanged. +- [ ] No regressions: `bash tests/test-meta.sh && bash tests/test-hooks.sh` both exit 0. + +## Verification +`bash tests/test-meta.sh && bash tests/test-hooks.sh` exit 0 AND `commands/assign.md` documents both the `ID-NNN` and freeform paths, the `/user:think` delegation, and all four invariants (row-before-draft, approve-before-allocate, sanitization, atomic-allocate). + +## Edge Cases +1. Ambiguous arg (looks like an ID but is not in the queue): treat as ID path, report "unknown id" (today's behavior), do NOT silently create a freeform row from a typo'd ID. +2. Freeform intent that is still too vague to name an outcome: the crystallize step loops (think/brainstorming) and does NOT allocate an ID until the objective is approved (no half-baked rows). +3. Duplicate freeform intent (a row already exists for the same idea): detection is by **slug match after crystallize** (the crystallized objective produces a slug; if that slug already has a row/draft, surface it instead of allocating a second ID). Semantic dedup (two differently-worded briefs for the same idea) is **best-effort**: on a near-match slug, ask rather than silently merge or duplicate. (Mirrors the SPEC-005 filesystem-is-truth idempotency, made explicit for freeform.) +4. Concurrent freeform allocation (two sessions allocate at once): both re-read max in the write step; the post-write equal-ID collision check fails loud (the operator re-runs), rather than two rows silently sharing an ID. +5. Pipe / newline in the freeform intent: sanitized (escaped) before the BACKLOG row is written, so the markdown table stays well-formed. + +## Failure modes +| Failure class | Detection signal | Mitigation / recovery | +|---|---|---| +| Freeform path auto-creates rows from vague briefs | BACKLOG fills with half-baked queued rows | the approve-before-allocate invariant: no row until the objective is approved | +| ID path regresses behind the new resolver | an `ID-NNN` assign behaves differently | the ID path is unchanged + a no-regression AC + meta assertion | +| Untracked freeform work | a lane runs with no BACKLOG row | the row-before-draft invariant: the row is written before the goal draft | +| Freeform `|`/newline corrupts the BACKLOG pipe table | the Active-queue table renders broken; a row has wrong columns | the sanitization invariant (DEC-004): escape `|`/newlines in row cells before writing | +| Slug path-traversal from freeform (`../`, `/`) | a draft is written outside `.claude/goals/` | the sanitization invariant (DEC-004): slug reduced to `[a-z0-9-]+` | +| Concurrent allocation picks the same ID | two Active-queue rows share an ID (caught at CI by the dup-number guard, but late) | the atomic-allocate invariant (DEC-005): re-read max in the write step + a loud post-write collision check | + +## Out of Scope +- **A prose keyword-recognizer that auto-runs `/assign` without the operator invoking it** (approach C as the primary mechanism): the front door is an invoked command, not a model convention. A light "you can run `/assign `" pointer is fine; auto-execution on a detected phrase is not. +- **Removing BACKLOG-ID-first**: the ID stays canonical; this spec only changes WHEN the ID is allocated (on the fly for freeform), not WHETHER. +- **Autonomy level of the subsequent lane**: governed by the operator's pre-authorization (PLAYBOOK Section 9), not by this front door. + +## Decision Log +- DEC-001: Extend `/assign` (approach A) rather than add a second intake command (B) or rely on a keyword recognizer (C). Rationale: one mutator (SPEC-006), guardrails over guidance, ID traceability preserved. Who: drafted from the PLAYBOOK scenarios 2026-05-22. +- DEC-002: Freeform path pauses for human approval of the crystallized objective BEFORE allocating an ID. Rationale: a vague brief turned straight into a tracked row + spec is how scope drift starts (PLAYBOOK S5). Who: same. +- DEC-003: The freeform path DELEGATES crystallize to `/user:think`; `/assign` does not embed a multi-turn interview. Rationale: SPEC-006 defines `/assign` as a light mutator-dispatcher that "does not execute"; absorbing an interactive interview would break that boundary. The interview lives in `/think`; `/assign` keeps only allocate + route. Who: spec-validate Reviewer 5 (2026-05-22). +- DEC-004: Freeform intent is sanitized before it touches a file: escape `|`/newlines in BACKLOG table cells; reduce the slug to `[a-z0-9-]+` (no `/`, no `..`). Rationale: a freeform string is operator-controlled text written into a pipe-delimited markdown table and a filesystem path; without escaping it can corrupt the queue table or traverse out of `.claude/goals/`. Who: spec-validate Reviewer 1 (2026-05-22). +- DEC-005: The ID is allocated by re-reading the current max in the same step that writes the row, with a loud post-write equal-ID collision check. Rationale: two concurrent freeform allocations would otherwise pick the same `max+1` and collide (the CI dup-number guard catches it, but late). Who: spec-validate Reviewer 2 (2026-05-22). + +## Open questions +(none; a /goal loop appends here if it hits a decision this spec does not cover, then stops) diff --git a/docs/specs/SPEC-027-mid-flight-spec-amend.md b/docs/specs/SPEC-027-mid-flight-spec-amend.md new file mode 100644 index 0000000..ab94d67 --- /dev/null +++ b/docs/specs/SPEC-027-mid-flight-spec-amend.md @@ -0,0 +1,141 @@ +# Spec: Mid-flight spec amend (BUILDING -> SPECIFYING -> BUILDING) +Generated: 2026-05-22 +Status: SHIPPED + +Source: `docs/operating-layer-vision.md` Scenario 7 + the Section 5 gap-analysis row "Mid-flight spec amend"; ID-023. Goal draft: `.claude/goals/mid-flight-spec-amend.md`. + +## Problem + +You are mid-`/user:execute` on a `VALIDATED` spec (state `BUILDING`). Partway through, the work reveals scope that must be added now: "also do Y." Today there is **no declared path** for this: + +- `commands/execute.md` says outright *"Do NOT modify the spec without asking"* and *"Spec ambiguity discovered: Stop and ask user to clarify."* It tells you to stop, not how to add scope and continue. +- The only ways forward are both bad: silently mutate the spec mid-build (the SPEC-024 retro flagged exactly this, editing a TASK's acceptance criterion mid-execute, as a *"process smell"*, recorded only after the fact), or restart the lane (`/spec` -> `/spec-validate` from scratch), which throws away the completed-task state. +- `docs/operating-layer-vision.md` §3.3 has no `BUILDING -> SPECIFYING` row for an amend; Scenario 7 is listed as a **gap** and is not rendered in `docs/PLAYBOOK.md` or `docs/ORCHESTRATION.md`. + +The state machine is therefore not legible at this point: an operator mid-build cannot answer "where can I go from here, what does it cost, how do I trigger it" for the add-scope case. + +## Solution + +### Approaches considered +- **A (chosen): convention + a recorded checkpoint.** Declare the `BUILDING -> SPECIFYING -> BUILDING` micro-loop across the model + rules + operator docs; reword `execute.md`'s "don't modify the spec" anti-pattern into "amend at a checkpoint via the declared path, never silently"; add an **optional, on-demand `## Amendments`** provenance section to the spec template; pin the convention with a `tests/test-meta.sh` assertion. No new command, no new hook. Tradeoff: discoverability rests on the docs + Claude's interpret layer, not a one-word command; mitigated because the amend is rare (~2 occurrences) and the trigger phrase ("also do Y") is natural. +- **B: a new `/user:amend` command.** One-step, discoverable, repeatable. Rejected for v1: a new command costs `MANUAL.md` + README command table + `.claude-plugin/plugin.json` + `marketplace.json` + `test-meta` frontmatter checks, and PHILOSOPHY rejects unearned commands ("no speculative features"; "earn the abstraction"). A twice-used path does not yet clear that bar. Revisit if amends become frequent. +- **C: a hook-enforced amend ritual.** A `PreToolUse` hook that blocks edits to a `VALIDATED` spec mid-build unless an `## Amendments` entry exists. Rejected: PHILOSOPHY explicitly reserves hard blocks for the safety subset and rejects hard-gating *process* completeness (the completeness clauses warn+log, never block). An amend is a process step, not a safety boundary. + +### Chosen approach + why +Approach A. It closes the gap with the kit's own grain: a **declared, recorded** path instead of a silent mutation, enforced by convention + one structural test rather than a new command or a hard gate. It composes with what exists: amend-the-spec-first means the new scope's source files become "known" to `spec-drift-guard` (which already greps the union of active specs and skips `.md`), and `/user:next` already resumes by picking the next undone (`- [ ]`) task and skipping `- [x]` done rows, so a resumed build runs only the amended tasks. (`/user:execute` re-parses and re-presents the *whole* plan, so resume after an amend leads with `/user:next`, not a fresh `/user:execute`; see DEC-006.) B and C add weight the evidence does not yet justify. + +### The amend micro-loop (the declared transition) + +``` + BUILDING (mid /user:execute, spec is VALIDATED) + │ trigger: operator says "also do Y" / "amend the spec to add Y" + ▼ + [GUARD] at a task checkpoint only: + - the in-flight task is verified + committed (or no task is in flight) + - completed - [x] tasks are FROZEN (never re-opened by an amend) + ▼ + SPECIFYING (amend, not restart) + 1. append new TASK-NNN rows (new phase or appended) to ## Task Breakdown + 2. update ## After state / ## Acceptance Criteria / ## Verification for the DELTA only + 3. record an ## Amendments entry (date | what | why | at which checkpoint | new task ids) + 4. re-validate the DELTA only (full lane: /spec-validate on the new tasks; normal lane: advisory) + ▼ Status STAYS VALIDATED (no drop to DRAFT -> that would be a lane restart) + BUILDING (resume) + - /user:next picks the next undone - [ ] task (skips - [x] done rows) -> runs only the amended tasks + (/user:execute re-lists the FULL plan; resume after an amend leads with /user:next. See DEC-006.) +``` + +Key invariants: +- **No lane restart.** `Status:` never drops back to `DRAFT`. The amend is an in-place delta on a `VALIDATED` spec; only the delta is (re-)validated. This is the literal "without restarting the lane" requirement of ID-023. +- **Completed work is frozen.** An amend may only *add* scope (tasks / AC / after-state bullets). It must not silently rewrite a checked-off task's acceptance criterion; changing an already-DONE contract is a separate, heavier decision (re-open / re-spec), not an amend. +- **Recorded at a checkpoint, not mid-worker.** The `## Amendments` entry is the provenance the SPEC-024 retro asked for, captured *at the moment of the change* and *between tasks*, not reconstructed afterward. + +### `## Amendments` section shape (optional, on-demand) + +Added only when an amend happens (like `## Failure modes` / `## Open questions`, never an empty scaffold): + +```markdown +## Amendments +- AMEND-001: 2026-05-22 | added | why: | at TASK-007 checkpoint | new tasks: TASK-009..TASK-010 | re-validated: delta-only (advisory, normal lane) +``` + +### Source-of-truth placement (avoid duplicating the procedure) +- **`WORKFLOW.md`** (the rules contract) is the **canonical** home of the amend rule (when you may amend, the checkpoint guard, the recorded entry, resume). +- `docs/operating-layer-vision.md` §3.3 gains the `BUILDING -> SPECIFYING` transition **row** (the model) and §5 marks the gap **closed** (-> SPEC-027), mirroring how ID-022 -> SPEC-026 was recorded. +- `docs/PLAYBOOK.md` renders Scenario 7 as an operator card (what you say -> what happens); `docs/ORCHESTRATION.md` renders the loop view. Both **point at** WORKFLOW.md, they do not restate the rule. +- `commands/execute.md` rewords the anti-pattern + the ambiguity branch to point at the declared path. +- `commands/spec.md` documents the optional on-demand `## Amendments` section. + +## Technical Design + +### Interfaces (I/O contract) +- **Consumes:** an active `docs/specs/SPEC-NNN-.md` with `Status: VALIDATED` and a `## Task Breakdown` containing both `- [x]` (done) and `- [ ]` (pending) rows; the existing branch-aware active-spec detection (SPEC-005) that `/user:execute` and `/user:next` already use. +- **Produces:** the same spec file, amended in place: new `- [ ]` TASK rows, an `## Amendments` log entry, delta updates to `## After state` / `## Acceptance Criteria` / `## Verification`. No new files, no schema migration. +- **Invariants:** `Status:` stays `VALIDATED` across an amend; `- [x]` rows are byte-for-byte unchanged by an amend; an amend only appends scope. + +### Data model changes +None. The spec template gains one **optional** documented section (`## Amendments`); no required field, no parser change. `spec-drift-guard.sh` is unchanged (it already skips `.md` and greps the active-spec union). + +### API / UI / Infrastructure changes +None. This is a convention + doc + one meta-test change. No hook code, no new command, no `settings.json`/`hooks.json` wiring. + +## Task Breakdown + +### Phase 1: Declare the transition (model + rules) +- [x] TASK-001 (DONE: 17e411a, verified): Add the `BUILDING -> SPECIFYING` amend row to `docs/operating-layer-vision.md` §3.3 transition table (trigger "also do Y" / "amend"; guard "at a task checkpoint, completed tasks frozen, Status stays VALIDATED"; To `SPECIFYING`), and add the return `SPECIFYING -> BUILDING (resume)` semantics note. Mark §5 gap-analysis row "Mid-flight spec amend" as closed (-> SPEC-027), matching the ID-022 -> SPEC-026 style., AC: `rg "BUILDING -> SPECIFYING" docs/operating-layer-vision.md` matches in §3.3; §5 row references SPEC-027; no other §5 row altered. +- [x] TASK-002 (DONE: 704092b, verified): Add the canonical mid-flight amend rule to `WORKFLOW.md` (the micro-loop, the checkpoint guard, the frozen-completed-tasks + add-only invariants, Status-stays-VALIDATED, resume via `/next` picking the next undone row, the `## Amendments` record)., AC: `WORKFLOW.md` contains a "mid-flight" / "amend" subsection naming all four invariants; `bash tests/test-meta.sh` still green. + +### Phase 2: Wire the operator-facing surfaces (projections) +- [x] TASK-003 (DONE: 7c6eaae, verified): Reword `commands/execute.md` so the anti-pattern "Do NOT modify the spec without asking" and the "Spec ambiguity discovered" / "Task is too large" branches point at the declared amend path (amend at a checkpoint, record in `## Amendments`, resume), instead of only "stop and ask". The no-silent-mutation rule is preserved (amend != silent edit)., AC: `commands/execute.md` references "amend" + "checkpoint"; it no longer instructs a flat "do NOT modify the spec" with no escape hatch; existing tests green. +- [x] TASK-004 (DONE: c11b7af, verified): Document the optional on-demand `## Amendments` section in `commands/spec.md`'s template prose (alongside the other optional sections), with the `AMEND-NNN: date | what | why | at checkpoint | new tasks | re-validated` shape., AC: `commands/spec.md` mentions `## Amendments` and the AMEND entry shape; the section is described as optional/on-demand (no empty scaffold). +- [x] TASK-005 (DONE: 2db948f, verified): Render Scenario 7 in `docs/PLAYBOOK.md` (operator card: "also do Y" mid-build -> the amend micro-loop) and add the loop view to `docs/ORCHESTRATION.md`; both point at WORKFLOW.md as canonical, not restating the rule., AC: `rg -i "also do|mid-flight|amend" docs/PLAYBOOK.md docs/ORCHESTRATION.md` matches in both; both link/point to WORKFLOW.md. + +### Phase 3: Pin the convention +- [x] TASK-006 (DONE: e2a9b5c, verified): Add a `tests/test-meta.sh` section "Mid-flight amend convention" asserting (a) `commands/execute.md` references the amend path (`amend` + `checkpoint`), (b) `WORKFLOW.md` documents the mid-flight amend rule, (c) `commands/spec.md` documents the `## Amendments` section, (d) `docs/operating-layer-vision.md` has the `BUILDING -> SPECIFYING` transition row., AC: the four new assertions PASS; `bash tests/test-meta.sh` exits 0 with an increased total. + +## After state +The definition-of-done picture. Each is false now, true after, and checkable. +- [ ] An operator mid-build can find a declared "add scope without restarting" path. (Today: `commands/execute.md` says only "do NOT modify the spec" / "stop and ask".) Checkable: `rg -i "amend" commands/execute.md WORKFLOW.md` returns the declared path. +- [ ] `docs/operating-layer-vision.md` §3.3 has a `BUILDING -> SPECIFYING` row and §5 marks the mid-flight gap closed. (Today: no row; §5 says "**ID-023** (new)".) Checkable: `rg "BUILDING -> SPECIFYING" docs/operating-layer-vision.md`. +- [ ] The spec template documents an optional `## Amendments` provenance section. (Today: no amend convention exists.) Checkable: `rg "## Amendments" commands/spec.md`. +- [ ] Scenario 7 is rendered for operators in both PLAYBOOK and ORCHESTRATION. (Today: absent from both.) Checkable: `rg -i "also do|mid-flight|amend" docs/PLAYBOOK.md docs/ORCHESTRATION.md` matches in both. +- [ ] The convention is regression-pinned. (Today: nothing tests it.) Checkable: `bash tests/test-meta.sh` includes and PASSes the "Mid-flight amend convention" assertions. + +## Acceptance Criteria (global) +- [ ] All tasks pass their individual acceptance criteria +- [ ] `bash tests/test-meta.sh && bash tests/test-hooks.sh` green (no regressions) +- [ ] The amend rule is canonical in exactly one place (WORKFLOW.md); other docs point at it, not restate it (no source-of-truth duplication) + +## Verification +```bash +bash tests/test-meta.sh && bash tests/test-hooks.sh +``` +Plus the After-state greps above (each is a real command). + +## Edge Cases +1. **Amend requested mid-worker (a task is in flight).** The guard defers: finish + verify + commit the in-flight task to reach a checkpoint, then amend. The amend never interrupts a running worker. +2. **Amend that rewrites an already-DONE task's AC** (not just adds). Out of scope for an amend; that is a re-open/re-spec decision. The convention says amends are add-only; a contract change to completed work pauses for a human (Pause-if: risk-classification / architecture). +3. **Amend on a `DRAFT` (not yet validated) spec.** No amend needed: just edit the DRAFT directly before `/spec-validate`. The amend path is specifically for `VALIDATED`/building specs. +4. **Amend on a single-task spec.** Allowed; the new tasks make it multi-task, so the Step-4 integration-checker now applies on resume. The operator should expect the integration check. +5. **Concurrent specs (multi-spec worktrees).** The amend targets the branch-aware active spec (SPEC-005 detection), so an amend cannot land in the wrong spec. +6. **Amend done without recording the `## Amendments` entry.** Not detected in v1: Approach A is convention, not enforcement (B/C were rejected deliberately). This is a known limitation, mirroring `spec-drift-guard`'s documented grep limitation (the kit's honesty rule: never over-claim enforcement). The completeness clauses already reviewed at `/user:ship` + `/user:retro` are where an un-recorded amend would surface informally; wiring it into the completeness log is a future option, not v1. See DEC-007. + +## Out of Scope +- A `/user:amend` slash command (Approach B), revisit only if amends become frequent. +- Any hook enforcement of the amend ritual (Approach C), PHILOSOPHY rejects hard-gating process. +- Re-opening / re-spec of a `SHIPPED` spec, that is ID-025 (SHIPPED -> TRIAGING), a separate item. +- Context-switch across specs / the ABANDONED terminal, that is ID-024, a separate item. +- Rewriting completed (`- [x]`) tasks' contracts, a heavier decision than an add-only amend. + +## Decision Log +- DEC-001: Convention + recorded checkpoint (Approach A), not a new command (B) or a hook (C). Rationale: matches PHILOSOPHY (earn the abstraction; warn-not-block; no speculative features); the amend has ~2 real uses. Alternatives B/C rejected as unearned weight / a hard gate on process. (Confirmed with the maintainer at the design checkpoint.) +- DEC-002: `Status:` stays `VALIDATED` across an amend; only the delta is (re-)validated. Rationale: dropping to `DRAFT` *is* a lane restart, the exact thing ID-023 removes. Alternative (reset to DRAFT) rejected. +- DEC-003: Amends are **add-only**; completed `- [x]` tasks are frozen. Rationale: rewriting a done contract mid-build is the "process smell" the SPEC-024 retro flagged; keep that on the heavier re-open path. Alternative (allow arbitrary mid-build edits) rejected. +- DEC-004: `## Amendments` is an **optional, on-demand** section, not added to every spec. Rationale: PHILOSOPHY "no empty scaffolds"; mirrors `## Failure modes` / `## Open questions`. Alternative (always-present section) rejected. +- DEC-005: The amend rule is canonical in `WORKFLOW.md`; the vision doc carries the model row, PLAYBOOK/ORCHESTRATION/execute point at it. Rationale: source-of-truth hierarchy; avoid the four-copies drift the kit repeatedly fights. (Build:) the meta-test asserts the rule is present, not duplicated. +- DEC-006 (validation): Resume after an amend leads with `/user:next`, not a fresh `/user:execute`. Rationale: spec-validate verified that `/user:next` skips `- [x]` done rows and picks the next undone task, while `/user:execute` re-parses and re-presents the *whole* plan; only `/user:next` makes the "completed work frozen, only new tasks run" promise hold. Found by Reviewer 2/3; the original draft over-claimed `/user:execute`. +- DEC-007 (validation): An amend done without an `## Amendments` entry is undetected in v1 (documented as a known limitation, Edge case 6), not enforced. Rationale: honesty rule (never over-claim portable enforcement); B/C were rejected, so v1 has no guard. Wiring it into the completeness log is a future option. Found by Reviewer 2. +- DEC-008 (review): The amend requires explicit operator approval of the added scope before it lands, folded into invariant #3 ("operator-approved" in the checkpoint guard) in WORKFLOW.md + commands/execute.md. Rationale: `/user:execute` runs autonomously; adding scope is an AGENTS.md zone-4 Pause-if (scope/architecture) decision, never the loop's. The PLAYBOOK card already promised the gate; the canonical rule + the orchestrator had omitted it (a cross-surface inconsistency + an unguarded self-amend path). Found by `/user:review` (HIGH); folded into #3 to keep "four invariants" intact (no fifth). + +## Open questions +(none; a /goal loop appends here if it hits a decision this spec does not cover, then stops) diff --git a/docs/specs/SPEC-028-release-hygiene-guard.md b/docs/specs/SPEC-028-release-hygiene-guard.md new file mode 100644 index 0000000..989afd2 --- /dev/null +++ b/docs/specs/SPEC-028-release-hygiene-guard.md @@ -0,0 +1,108 @@ +# Spec: Release-hygiene guard (phantom-cut warn) +Generated: 2026-05-22 +Status: SHIPPED + +Source: SPEC-027 retro ("What hurt": the release-hygiene tangle) + SPEC-018 retro (the first occurrence); ID-026. Goal draft: `.claude/goals/release-hygiene-guard.md`. A two-cycle recurrence, which clears the PHILOSOPHY section-5 bar for promoting a check. + +## Problem +The kit can drift into a messy-release state that no surface flags. Concretely (the live state at this spec's writing): `VERSION` and `.claude-plugin/plugin.json` say `1.6.0`, but the latest git tag is `v1.5.1`, so `1.6.0` is a **phantom cut**: the files name a version that was never released. Meanwhile `CHANGELOG.md` `[Unreleased]` keeps accumulating new specs on top of that untagged cut, so it is unclear what `1.6.0` even contains or whether it shipped. This exact state was flagged by the SPEC-018 retro (placement-ui-design: "1.6.0 cut in CHANGELOG + VERSION + plugin.json but never tagged") and recurred in the SPEC-027 retro. Nothing in the kit notices it. + +The kit deliberately does "no version bump" ships that let `[Unreleased]` accumulate (e.g. PR #7), so accumulation alone is **not** the bug. The bug is the **phantom cut**: a version named in the files with no matching git tag, with work piling above it. + +## Solution + +### Approaches considered +- **A (chosen): warn-only surfaces (ship + kit-health), no hook, no hard test.** Add a release-hygiene warn at `/user:ship`'s version step (where the bump/tag decision is made) plus a `kit-health` check line (run before tagging). Both detect the phantom cut (the files' version not present in `git tag`) and warn; neither blocks. A `tests/test-meta.sh` assertion pins that both surfaces carry the check (not that the repo is currently clean). Tradeoff: a warn can be ignored, but PHILOSOPHY reserves hard blocks for the safety subset and the completeness clauses are already warn+log; this matches that grain. +- **B: a `tests/test-meta.sh` hard assertion that `VERSION` is tagged.** Rejected: "VERSION named but not yet tagged" is a **legitimate transient** during a release (bump files, commit, then tag), and CI frequently does not fetch tags (shallow clone), so a hard assertion would go red in the normal flow. The wrong tool for a transient-legitimate state. +- **C: a hook.** Rejected: PHILOSOPHY disfavors hard-gating process; the trigger is fuzzy (on a version-file edit? on a tag?); and the untagged-cut state is a legitimate transient. A hook is overkill for a maintainer-facing heads-up. + +### Chosen approach + why +Approach A. The maintainer at `/user:ship` is exactly who needs the warning at exactly the moment it matters (the version/tag decision); `kit-health` is the documented "run before tagging" self-assessment, so it catches the state even between ships. Both are warn-only, matching the existing completeness-clause grain (warn + report, never block). The meta-test pins the surfaces structurally without asserting a transient-sensitive runtime condition. + +### Detection logic (the shared signal) +- **Phantom cut (primary, mechanical):** read the version from `VERSION` (and `.claude-plugin/plugin.json`; a meta-test already asserts they match), stripping whitespace (`tr -d '[:space:]' < VERSION`, matching the existing meta-test's read so a trailing newline cannot break the tag pattern). If `git tag -l "v"` is empty, the files name an untagged version: warn. (During a real release this fires between the version-bump commit and the tag; the warn is then a correct "remember to tag" nudge.) +- **Identical shape across both surfaces (anti-drift):** ship and kit-health implement this exact check shape (the whitespace-strip, the `git tag -l "v"` test, and the graceful degrade below). The meta-test pins each surface's *presence*; this spec pins their *shape*, so the two inlined copies (DEC-003) cannot diverge in edge behavior. +- **Accumulation context (soft):** when the phantom cut fires AND `CHANGELOG.md` `[Unreleased]` is non-empty, the warn adds "work is accumulating above an untagged cut" so the maintainer sees the compounding, not just the missing tag. +- **Graceful degrade:** if there is no `.git` (e.g. the installed copy under `~/.claude/dwarves-kit`), no `VERSION`, or `git` is unavailable, the check is a no-op (it is repo-scoped; it must not error in a non-repo context). + +### Extensibility & boundaries +- Load-bearing dimension: the number of warn surfaces. Today two (ship, kit-health). The detection is small enough (a `git tag -l` comparison + a CHANGELOG grep) that it is inlined at each surface rather than abstracted (the kit's rule: no shared helper until the same code appears three times). A third surface is the trigger to extract a shared `checks/`-style helper, recorded here so the decision is not re-litigated. +- Boundaries: each surface is independently testable (ship.md carries the warn instruction; kit-health.md carries the check bash); the meta-test pins each. + +### Architecture +```text + VERSION / plugin.json = X + │ + ▼ + git tag -l "vX" ──── empty? ────▶ PHANTOM CUT + │ present │ + CHANGELOG [Unreleased] non-empty? + ▼ ▼ + (clean, silent) WARN (maintainer-facing, never block): + "vX is not tagged; [Unreleased] is piling on top" + ▲ ▲ + │ │ + surface 1: /user:ship Step 4 surface 2: kit-health line (repo-scoped, before tagging) +``` + +## Technical Design + +### Interfaces (I/O contract) +- **Consumes:** `VERSION`, `.claude-plugin/plugin.json` (`.version`), `git tag -l`, `CHANGELOG.md` (`[Unreleased]` section). Read-only. +- **Produces:** a maintainer-facing warning string at `/user:ship` and a `kit-health` report line. No file writes, no block, no exit-code failure from the check itself. +- **Invariants:** the check never blocks a ship and never errors outside a git repo / without a VERSION file (graceful no-op). The meta-test asserts the surfaces carry the check, never that the working tree is currently tag-clean (that would be transient-sensitive). + +### Data model / API / UI changes +None. Two command-prompt edits (`commands/ship.md`, `commands/kit-health.md`) + meta-test assertions. No new file, no new command, no hook, no `settings.json` wiring. + +## Task Breakdown + +### Phase 1: The ship-time warn (primary surface) +- [x] TASK-001 (DONE: 0a27c62 + fix abea75d, verified): Add a release-hygiene warn to `commands/ship.md` at the version step (Step 4, or a Step 4a adjacent to it). It instructs: read the version from `VERSION`, check `git tag -l "v"`; if absent, WARN (phantom cut) and, if `CHANGELOG.md` `[Unreleased]` is non-empty, add the accumulation note. Warn-only, report to the maintainer, never block; degrade silently outside a git repo / without VERSION. Mirror the Step 1b "warn, not block" voice. AC: `commands/ship.md` contains the release-hygiene warn naming the `git tag` phantom-cut check and the "warn, not block" stance; `bash tests/test-meta.sh` green. + +### Phase 2: The kit-health line (secondary surface) +- [x] TASK-002 (DONE: 0958107, verified): Add a release-hygiene check to `commands/kit-health.md` Step 1 (a new numbered bash check). Repo-scoped: compare `VERSION` to `git tag -l "v$(cat VERSION)"`; print a clean/PHANTOM-CUT line; degrade gracefully (no `.git`, no `VERSION`, no `git` => skip with a note, never error). AC: `commands/kit-health.md` Step 1 has the check; it is repo-scoped and degrades gracefully (no hard error in a non-repo context); `bash tests/test-meta.sh` green. + +### Phase 3: Pin the surfaces +- [x] TASK-003 (DONE: 2b9f2a1, verified): Add a `tests/test-meta.sh` section "Release-hygiene guard" with assertions that (a) `commands/ship.md` carries the release-hygiene warn (greps for the phantom-cut / `git tag` check + the warn-not-block stance) and (b) `commands/kit-health.md` carries the phantom-cut check. The assertions pin the LOGIC's presence, NOT that the repo is currently tag-clean. AC: the new assertions PASS; `bash tests/test-meta.sh` exits 0 with an increased total. + +## After state +- [ ] `/user:ship` warns when `VERSION` names an untagged version. (Today: ship bumps/changelogs with no tag-state check; the 1.6.0 phantom cut went unnoticed across two cycles.) Checkable: `rg -i "tag|phantom|release.hygiene" commands/ship.md` returns the warn. +- [ ] `kit-health` reports the phantom-cut state, repo-scoped + graceful. (Today: kit-health checks file count / hooks / descriptions, never the version/tag relationship.) Checkable: `rg -i "tag|phantom|VERSION" commands/kit-health.md` returns the check. +- [ ] The two surfaces are regression-pinned. (Today: nothing tests for them.) Checkable: `bash tests/test-meta.sh` includes and PASSes the "Release-hygiene guard" assertions. +- [ ] The check never blocks and never errors outside a repo. Checkable: the spec's Edge cases 2 + 3 hold by inspection of the added logic. + +## Acceptance Criteria (global) +- [ ] All tasks pass their individual acceptance criteria. +- [ ] `bash tests/test-meta.sh && bash tests/test-hooks.sh` green (no regressions). +- [ ] The guard is warn-only on both surfaces (no block, no test that fails on a legitimate untagged-transient). + +## Verification +```bash +bash tests/test-meta.sh && bash tests/test-hooks.sh +``` +Plus the After-state greps above (each a real command). + +## Edge Cases +1. **Legitimate untagged transient (mid-release).** VERSION bumped + committed, tag not yet created: the warn fires, which is correct (it nudges "tag it"). It must NOT be a hard failure (that is why B was rejected). +2. **Not a git repo / installed copy.** kit-health's bash also runs against `~/.claude/dwarves-kit` (no `.git`): the check degrades to a skip-with-note, never errors. +3. **No `VERSION` file / `git` absent.** Skip silently; the check is best-effort. +4. **Clean state (VERSION tagged, `[Unreleased]` empty or normal):** no warn; the surface stays quiet. +5. **VERSION and plugin.json disagree.** Out of scope here: a separate meta-test already asserts `plugin.json` == `VERSION`; this guard reads `VERSION` as the source. + +## Out of Scope +- Fixing the current `1.6.0` phantom cut or tagging anything (that is a release-cleanup action, not this guard). +- Bumping/auto-tagging versions or changing the ship version logic (this only adds a warn). +- A hard `tests/test-meta.sh` "must be tagged" assertion (Approach B, rejected: transient-legitimate). +- A hook (Approach C, rejected). +- A shared detection helper (deferred until a third surface needs it; see Extensibility). + +## Decision Log +- DEC-001: Warn-only surfaces (ship + kit-health), Approach A; not a hard meta-test (B) or a hook (C). Rationale: matches the completeness-clause warn grain; B breaks on the legitimate untagged-transient + shallow-CI; C hard-gates process (PHILOSOPHY). Alternatives B/C rejected. (Confirmed with the maintainer at the surface checkpoint.) +- DEC-002: The detectable signal is the phantom cut (VERSION version not in `git tag`), not `[Unreleased]` accumulation. Rationale: the kit deliberately accumulates `[Unreleased]` across "no version bump" ships, so accumulation alone is not a bug; the untagged cut is. Accumulation is a soft context line layered on the phantom-cut warn. +- DEC-003: Detection is inlined at each surface, not a shared helper. Rationale: PHILOSOPHY "no premature abstraction" (two uses < three); a third surface triggers extraction. The meta-test pins both inlined copies against drift. +- DEC-004: The meta-test pins the surfaces' presence, never a tag-clean working tree. Rationale: a runtime tag assertion is transient-sensitive (Edge case 1) and CI-fragile (shallow clone). +- DEC-006 (review): The accumulation-context check (the `[Unreleased]` non-empty test) is part of the shared shape, not just the three structural lines. Found by `/user:review` (MEDIUM): kit-health used `grep -q '^## \[Unreleased\]'` (heading-exists, over-fires), while ship.md used the spec's awk non-empty test; the two soft-signal copies had drifted (the integration-checker compared only the three structural lines DEC-005 enumerated, so it did not catch this). Aligned kit-health to ship.md's awk. Lesson folded forward: "identical shape" includes the accumulation check. +- DEC-005 (validation): The check shape is pinned once in "Detection logic" (whitespace-strip the version, `git tag -l "v"`, graceful no-git/no-VERSION degrade) and both surfaces implement it identically. Rationale: spec-validate Reviewer 5 noted the two inlined copies (DEC-003) could drift in edge handling while the meta-test only pins presence; pinning the shape in the spec closes that gap without a premature shared helper. Also folds R1/R2 robustness (quote + whitespace-strip the VERSION read). + +## Open questions +(none; a /goal loop appends here if it hits a decision this spec does not cover, then stops) diff --git a/examples/hello-spec/AGENTS.md b/examples/hello-spec/AGENTS.md new file mode 100644 index 0000000..1ede012 --- /dev/null +++ b/examples/hello-spec/AGENTS.md @@ -0,0 +1,87 @@ +# AGENTS.md: the operating layer (spm) + +> Tool-agnostic front door for the `spm` project. Any agent runtime (Claude Code, +> Codex, Gemini, a human) reads this first to learn how work is done in this repo. +> It carries the portable operate-contract: what to read, how to run a unit of +> work, when work is done, and when to stop and ask. This is the downstream +> template that ships with dwarves-kit; replace the `spm` specifics with your own. + +## Enforcement boundary (read this first) + +**This file is a contract, not a guardrail.** `spm` uses dwarves-kit, and the +kit's enforcement is Claude-Code-only: the hooks (safety-gate, push-to-main +blocker, anti-rationalization Stop hook, the verification pipeline) are what +actually block bad outcomes, and they run only under Claude Code. Under any other +runtime (Codex, Gemini, a bare LLM) this file is **advisory only**: it tells an +agent what to do, but nothing enforces it. Do not assume the guardrails are +portable. See `CLAUDE.md` for the Claude-Code layer (stack, rules, where the spec +lives). + +The rest of this file is four portable zones. A goal-crafter projects them into a +six-section operating directive (see "How a goal is composed" at the end). + +--- + +## 1. Read in this order + +Orient before you touch anything. Read top to bottom; stop when you have enough. + +1. **AGENTS.md** (this file) - how work is done here; the operate-contract. +2. **CLAUDE.md** - the project anchor: what `spm` is, the stack (Python 3.12 / uv / pytest / ruff), code-quality rules, where specs live. +3. **docs/specs/SPEC-NNN-.md** - the active spec; the shared contract for the cycle. Read its `## Verification` and `## After state` before implementing. (Today that is `docs/specs/SPEC-001-version-flag.md`.) +4. **WORKFLOW.md** - reference, not required per task. Read it for the lanes and the gate at each phase boundary. + +## 2. Task loop + +How to do one unit of work. The smallest verifiable increment, verified, committed. + +1. **Size the lane.** Pick `tiny` / `normal` / `full` / `bug` / `backfill` per `WORKFLOW.md`. When in doubt between two lanes, take the heavier one. +2. **Read the spec and its acceptance criteria.** For a spec-driven task: the active spec's task row, its AC, its `## Verification`, and its `## After state`. No spec (tiny lane): the one obvious edit. +3. **Implement the smallest verifiable increment.** One logical change. No speculative features (`spm` does install/freeze/list; new subcommands need a spec), no premature abstraction (no `BaseCommand` until there are 6 commands); clarity over cleverness. +4. **Verify.** Run the spec's `## Verification` command, or the lane's check: `uv run pytest && uv run ruff check .`. Do not claim a result you did not run. +5. **Commit.** Conventional commit, one logical change. No spec/ticket IDs in the subject line. + +If you cannot make progress, see zone 4 (Pause if) and stop with a named blocker note. Do not churn. + +## 3. Done means + +A task or goal is done only when **its acceptance criteria are met AND the +verifier actually ran the check**, not when you claim they pass. Self-reported +"done" is not proof. + +Concretely, done means: **acceptance criteria met, the check actually ran (not +just asserted), review recorded + report written, and the final response says +what changed and what was not attempted.** If you could not run the check, report +that plainly; under Claude Code the anti-rationalization hook is the backstop for +premature completion, but the honesty obligation is yours under any runtime. + +## 4. Pause if (ask a human) + +Stop and ask a human before acting on any of these. These are decisions with +direction or irreversible cost that a goal loop must not make on its own. + +- **Architecture direction** - a new subcommand, a new module, a change to the argparse dispatch shape, or any new public interface for the CLI. +- **Source-of-truth hierarchy** - which file or section is canonical when two disagree (for example, the operate-contract split across `AGENTS.md`, `CLAUDE.md`, and `WORKFLOW.md`). +- **Validation removal** - weakening, deleting, or skipping a test, a ruff rule, or any check; suppressing a `pip` subprocess error instead of propagating its exit code. +- **Risk-classification change** - moving work to a lighter lane, or narrowing a `full`-lane trigger (anything touching how `pip` is invoked, the wheel build, the published package, or `pyproject.toml` dependencies). +- **Privacy / security** - secrets, credentials, access scope, anything that touches what data leaves the repo or who can reach it. + +When you pause: write the named blocker, state the decision you are not making and why, and stop. + +--- + +## How a goal is composed + +A goal-crafter projects these four zones into a six-section operating directive. +The mapping is a **composition, not 1:1**: two of the six sections come from the +active spec, not from this file. Keep the four zone names stable; renaming one +breaks the projection. + +| Goal section | Source | +|---|---| +| Context-to-read | AGENTS.md zone 1 (Read in this order) | +| Constraints | CLAUDE.md / AGENTS.md rules | +| Operating rules | AGENTS.md zone 2 (Task loop) | +| Validation loop | the active spec's `## Verification` | +| Done-when | AGENTS.md zone 3 (Done means) + the active spec's `## After state` | +| Pause-if | AGENTS.md zone 4 (Pause if) | diff --git a/examples/hello-spec/README.md b/examples/hello-spec/README.md index 10b063a..143cf05 100644 --- a/examples/hello-spec/README.md +++ b/examples/hello-spec/README.md @@ -4,11 +4,14 @@ A small, self-contained example showing what dwarves-kit produces when you run i ## What this shows -A hypothetical Python CLI called `spm` ("simple package manager") needs a `--version` flag. This directory contains the three files dwarves-kit would generate or rely on to ship that change end-to-end. Each is a normal output, not a mock. +A hypothetical Python CLI called `spm` ("simple package manager") needs a `--version` flag. This directory contains the files dwarves-kit would generate or rely on to ship that change end-to-end. Each is a normal output, not a mock. -The point: see what `docs/specs/SPEC-001-version-flag.md` actually looks like; see how `CLAUDE.md` anchors a contractor's session; see how the kit's commands chain. +The point: see what `docs/specs/SPEC-001-version-flag.md` actually looks like; see how `AGENTS.md` is the tool-agnostic front door a downstream repo inherits; see how `CLAUDE.md` anchors a contractor's session; see how the kit's commands chain. -## The three files +## The files + +### `AGENTS.md` +The downstream front door. Tool-agnostic operating layer (read-order, task loop, done-definition, "Pause if" list) that any agent runtime reads first. This is the realistic-placeholder template a project inherits from dwarves-kit; `CLAUDE.md` sits under it as the Claude-Code-specific layer. Enforcement (hooks, verifier) is Claude-Code-only; under other runtimes `AGENTS.md` is advisory. ### `CLAUDE.md` Project anchor. Read by Claude Code at session start, by `/user:spec` for stack detection, by every worker subagent in `/user:execute`. Tells Claude what the project is, what the stack is, where the spec lives, and what code-quality rules apply. @@ -49,9 +52,10 @@ PASS -> next task; FAIL:fixable -> fix-agent (max 2 retries); FAIL:escalate -> h ## Reading order -1. Read `CLAUDE.md` first; it's the project anchor a contractor sees first. -2. Then `docs/specs/SPEC-001-version-flag.md`: the actual feature plan. -3. Notice: no `--version` code yet. The spec is the input to `/user:execute`. The example stops at "spec ready to build", not "feature implemented", to keep the example readable. +1. Read `AGENTS.md` first; it's the tool-agnostic front door (read-order, task loop, done-definition, Pause-if). +2. Then `CLAUDE.md`: the Claude-Code project anchor that sits under the front door. +3. Then `docs/specs/SPEC-001-version-flag.md`: the actual feature plan. +4. Notice: no `--version` code yet. The spec is the input to `/user:execute`. The example stops at "spec ready to build", not "feature implemented", to keep the example readable. --- diff --git a/install.sh b/install.sh index 9f3f146..134d396 100755 --- a/install.sh +++ b/install.sh @@ -299,4 +299,7 @@ echo "" echo "Tip: Copy CLAUDE.md template to your project root:" echo " cp $KIT_DIR/CLAUDE.md ./CLAUDE.md" echo "" +echo "Tip: If your project does not already have an AGENTS.md, copy one to its root:" +echo " cp $KIT_DIR/AGENTS.md ./AGENTS.md" +echo "" echo "To uninstall: bash $KIT_DIR/install.sh --uninstall" diff --git a/tests/test-meta.sh b/tests/test-meta.sh index f0d62fd..67045be 100755 --- a/tests/test-meta.sh +++ b/tests/test-meta.sh @@ -149,6 +149,238 @@ INPLACE_BROKEN=$(for f in "$INPLACE_HOME/.claude/dwarves-kit/hooks/"*.sh; do [ - assert_eq "in-place install keeps hook scripts resolvable (broken: ${INPLACE_BROKEN:-none})" "" "$INPLACE_BROKEN" rm -rf "$INPLACE_HOME" +# ============================================================ +echo "" +echo "=== AGENTS.md operating layer (SPEC-024) ===" +# ============================================================ +# Part A: pin the cycle's structural outputs so a wording flip fails CI. +# 1. kit-root AGENTS.md exists + carries the four portable zones + the literal +# "Pause if" + the CC-only-enforcement statement. +# 2. commands/assign.md carries the six-section /goal projection (the writer +# side of the AGENTS.md->assign.md projection). +# 3. Low-cost regression guards for TASK-003 (hello-spec AGENTS.md) and +# TASK-006 (spec.md "## After state"). + +AGENTS_MD="$KIT_DIR/AGENTS.md" +TOTAL=$((TOTAL + 1)) +if [ -f "$AGENTS_MD" ]; then + echo -e " ${GREEN}PASS${NC} AGENTS.md exists at kit root (SPEC-024)" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} AGENTS.md missing at kit root" + FAIL=$((FAIL + 1)) +fi + +# The four portable zones (DEC-005). Pin the heading literals, not prose. +for ZONE in "## 1. Read in this order" "## 2. Task loop" "## 3. Done means" "## 4. Pause if (ask a human)"; do + TOTAL=$((TOTAL + 1)) + if grep -qF "$ZONE" "$AGENTS_MD" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} AGENTS.md has zone '$ZONE'" + PASS=$((PASS + 1)) + else + echo -e " ${RED}FAIL${NC} AGENTS.md missing zone '$ZONE'" + FAIL=$((FAIL + 1)) + fi +done + +# The literal "Pause if" (the fourth zone's stable phrase, also the goal section). +TOTAL=$((TOTAL + 1)) +if grep -qF 'Pause if' "$AGENTS_MD" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} AGENTS.md carries the literal 'Pause if'" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} AGENTS.md lost the literal 'Pause if'" + FAIL=$((FAIL + 1)) +fi + +# The CC-only-enforcement statement (PHILOSOPHY honesty rule: never over-claim +# portable enforcement). A drift to "enforcement is portable" would be a lie. +TOTAL=$((TOTAL + 1)) +if grep -qF 'Enforcement is Claude-Code-only' "$AGENTS_MD" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} AGENTS.md states enforcement is Claude-Code-only" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} AGENTS.md lost the CC-only-enforcement statement" + FAIL=$((FAIL + 1)) +fi + +# commands/assign.md carries the six-section projection (the writer side). A +# wording flip on any section name breaks the AGENTS.md->assign.md projection. +ASSIGN_MD="$KIT_DIR/commands/assign.md" +for SECTION in "Context-to-read" "Constraints" "Operating rules" "Validation loop" "Done-when" "Pause-if"; do + TOTAL=$((TOTAL + 1)) + if grep -qF "$SECTION" "$ASSIGN_MD" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} assign.md has projection section '$SECTION'" + PASS=$((PASS + 1)) + else + echo -e " ${RED}FAIL${NC} assign.md missing projection section '$SECTION'" + FAIL=$((FAIL + 1)) + fi +done + +# Low-cost regression guards: TASK-003 (hello-spec AGENTS.md w/ "Pause if") and +# TASK-006 (spec.md template's "## After state"). Pin both so they cannot silently +# regress. +DEMO_AGENTS="$KIT_DIR/examples/hello-spec/AGENTS.md" +TOTAL=$((TOTAL + 1)) +if [ -f "$DEMO_AGENTS" ] && grep -qF 'Pause if' "$DEMO_AGENTS" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} examples/hello-spec/AGENTS.md exists + carries 'Pause if' (TASK-003)" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} examples/hello-spec/AGENTS.md missing or lost 'Pause if'" + FAIL=$((FAIL + 1)) +fi + +# Review issue 2: the downstream template (the file real projects copy) must pin +# all four zone headings too, not just "Pause if" - same teeth as the kit root. +for ZONE in "## 1. Read in this order" "## 2. Task loop" "## 3. Done means" "## 4. Pause if (ask a human)"; do + TOTAL=$((TOTAL + 1)) + if grep -qF "$ZONE" "$DEMO_AGENTS" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} examples/hello-spec/AGENTS.md has zone '$ZONE'" + PASS=$((PASS + 1)) + else + echo -e " ${RED}FAIL${NC} examples/hello-spec/AGENTS.md missing zone '$ZONE'" + FAIL=$((FAIL + 1)) + fi +done + +TOTAL=$((TOTAL + 1)) +if grep -qF '## After state' "$KIT_DIR/commands/spec.md" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} commands/spec.md template carries '## After state' (TASK-006)" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} commands/spec.md template lost '## After state'" + FAIL=$((FAIL + 1)) +fi + +# Review issue 6: assign.md Done-when must reference the spec's "## After state" +# (the projection source), not merely carry the "Done-when" label. +TOTAL=$((TOTAL + 1)) +if grep -qF '## After state' "$ASSIGN_MD" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} assign.md Done-when references the spec's '## After state'" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} assign.md Done-when lost the '## After state' projection source" + FAIL=$((FAIL + 1)) +fi + +# Review issue 1 (anti-drift): the spec's primary failure mode is a CC-layer doc +# RESTATING the ordered read-list that AGENTS.md owns (zone 1 is the single source). +# WORKFLOW.md and CLAUDE.md must point, not carry a numbered "1. AGENTS.md / +# 2. CLAUDE.md ..." restatement. A reappearance is drift; fail loudly. Scoped to +# these two CC-layer docs; AGENTS.md itself legitimately carries the list. +for DOC in WORKFLOW.md CLAUDE.md; do + RESTATE=$(grep -cE '^[0-9]+\.[[:space:]]+(AGENTS|CLAUDE)\.md' "$KIT_DIR/$DOC" 2>/dev/null || true) + assert_eq "$DOC does not restate the AGENTS.md read-order list (no drift)" "0" "$RESTATE" +done + +# ------------------------------------------------------------ +# Part B: install.sh merge-with-existing-hooks regression (DEC-004). +# The existing installer test runs into a HOME with NO settings.json, so it never +# exercises the jq clean+merge path. This test pre-seeds settings.json with a +# THIRD-PARTY hook (a command that does NOT contain "dwarves-kit") and asserts the +# merge preserves it, yields valid JSON, and still pulls in a dwarves-kit hook. +MERGE_HOME=$(mktemp -d) +mkdir -p "$MERGE_HOME/.claude" +THIRD_PARTY_CMD="/opt/acme/hooks/audit-log.sh" +# Build the pre-existing settings via jq so it is always well-formed JSON. +jq -n --arg cmd "$THIRD_PARTY_CMD" '{ + hooks: { + PreToolUse: [ + { matcher: "Bash", hooks: [ { type: "command", command: $cmd } ] } + ] + } +}' > "$MERGE_HOME/.claude/settings.json" + +HOME="$MERGE_HOME" bash "$KIT_DIR/install.sh" >/dev/null 2>&1 +MERGED_SETTINGS="$MERGE_HOME/.claude/settings.json" + +# (a) the third-party hook command survived the merge. +TOTAL=$((TOTAL + 1)) +if grep -qF "$THIRD_PARTY_CMD" "$MERGED_SETTINGS" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} install merge preserves the third-party hook (DEC-004)" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} install merge DROPPED the third-party hook (merge bug)" + FAIL=$((FAIL + 1)) +fi + +# (b) the resulting settings.json is valid JSON. +TOTAL=$((TOTAL + 1)) +if jq '.' "$MERGED_SETTINGS" >/dev/null 2>&1; then + echo -e " ${GREEN}PASS${NC} merged settings.json is valid JSON" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} merged settings.json is not valid JSON (merge corrupted it)" + FAIL=$((FAIL + 1)) +fi + +# (c) at least one dwarves-kit hook was merged in alongside the third-party one. +TOTAL=$((TOTAL + 1)) +KIT_HOOK_COUNT=$(jq '[.hooks | to_entries[] | .value[] | .hooks[] | select(.command | tostring | contains("dwarves-kit"))] | length' "$MERGED_SETTINGS" 2>/dev/null || echo 0) +if [ "${KIT_HOOK_COUNT:-0}" -gt 0 ]; then + echo -e " ${GREEN}PASS${NC} install merge added at least one dwarves-kit hook ($KIT_HOOK_COUNT)" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} install merge added no dwarves-kit hooks" + FAIL=$((FAIL + 1)) +fi +rm -rf "$MERGE_HOME" + +# ============================================================ +echo "" +echo "=== Freeform front door (SPEC-026) ===" +# ============================================================ +# Pin the SPEC-026 contract in commands/assign.md so a wording flip on any of +# the intake paths, the /user:think delegation, or the four invariants fails CI. +# All literals exist in assign.md today; this guards them from silent drift. +# ASSIGN_MD is set in the SPEC-024 block above. + +# Two-shape resolver: the ID-first regex AND the freeform branch must both be named. +for LITERAL in '^ID-[0-9]+$' 'freeform'; do + TOTAL=$((TOTAL + 1)) + if grep -qF "$LITERAL" "$ASSIGN_MD" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} assign.md documents the '$LITERAL' intake shape (SPEC-026)" + PASS=$((PASS + 1)) + else + echo -e " ${RED}FAIL${NC} assign.md lost the '$LITERAL' intake shape (resolver drift)" + FAIL=$((FAIL + 1)) + fi +done + +# Delegation: the crystallize interview is delegated to /user:think, not embedded (DEC-003). +TOTAL=$((TOTAL + 1)) +if grep -qF '/user:think' "$ASSIGN_MD" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} assign.md delegates crystallize to /user:think (SPEC-026 DEC-003)" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} assign.md lost the /user:think delegation (interview embedded?)" + FAIL=$((FAIL + 1)) +fi + +# The four invariants. atomic-allocate is pinned via BOTH its named marker and the +# 'collision' guard wording, since both literals are load-bearing in assign.md. +for INVARIANT in 'row-before-draft' 'approve-before-allocate' 'sanitize' 'atomic-allocate' 'collision'; do + TOTAL=$((TOTAL + 1)) + if grep -qF "$INVARIANT" "$ASSIGN_MD" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} assign.md pins the '$INVARIANT' invariant (SPEC-026)" + PASS=$((PASS + 1)) + else + echo -e " ${RED}FAIL${NC} assign.md lost the '$INVARIANT' invariant (contract drift)" + FAIL=$((FAIL + 1)) + fi +done + +# Slug hardening: the sanitized slug charset must stay pinned (path-traversal guard). +TOTAL=$((TOTAL + 1)) +if grep -qF '[a-z0-9-]' "$ASSIGN_MD" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} assign.md pins the '[a-z0-9-]' slug charset (SPEC-026 DEC-004)" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} assign.md lost the '[a-z0-9-]' slug charset (slug hardening drift)" + FAIL=$((FAIL + 1)) +fi + # ============================================================ echo "" echo "=== Agent files ===" @@ -817,6 +1049,91 @@ else FAIL=$((FAIL + 1)) fi +# ============================================================ +echo "" +echo "=== Mid-flight amend convention (SPEC-027) ===" +# ============================================================ +# Pin the BUILDING -> SPECIFYING -> BUILDING amend convention across its four +# surfaces so a wording flip on any of them fails CI. WORKFLOW.md is the canonical +# home of the rule; the other three are projections/the model row that point at it. + +# (a) execute.md reroutes the "don't modify the spec" anti-pattern to the declared +# amend path: it must reference BOTH "amend" and "checkpoint". +TOTAL=$((TOTAL + 1)) +if grep -qF 'amend' "$KIT_DIR/commands/execute.md" 2>/dev/null \ + && grep -qF 'checkpoint' "$KIT_DIR/commands/execute.md" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} execute.md references the amend path (amend + checkpoint) (SPEC-027)" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} execute.md lost the amend path (needs amend + checkpoint)" + FAIL=$((FAIL + 1)) +fi + +# (b) WORKFLOW.md is the canonical home: it must carry the "Mid-flight amend" rule. +TOTAL=$((TOTAL + 1)) +if grep -qF 'Mid-flight amend' "$KIT_DIR/WORKFLOW.md" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} WORKFLOW.md documents the Mid-flight amend rule (SPEC-027, canonical)" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} WORKFLOW.md lost the Mid-flight amend rule" + FAIL=$((FAIL + 1)) +fi + +# (c) spec.md documents the optional on-demand "## Amendments" provenance section. +TOTAL=$((TOTAL + 1)) +if grep -qF '## Amendments' "$KIT_DIR/commands/spec.md" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} spec.md documents the '## Amendments' section (SPEC-027)" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} spec.md lost the '## Amendments' section" + FAIL=$((FAIL + 1)) +fi + +# (d) operating-layer-vision.md carries the BUILDING -> SPECIFYING amend transition +# row in §3.3. Pin the whole row (From cell BUILDING, the amend trigger, To cell +# SPECIFYING) so the model stays legible; brittle-proofed via the full-row regex. +TOTAL=$((TOTAL + 1)) +if grep -qE '\| BUILDING \|.*amend the spec.*\| SPECIFYING' "$KIT_DIR/docs/operating-layer-vision.md" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} operating-layer-vision.md has the BUILDING -> SPECIFYING amend row (SPEC-027)" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} operating-layer-vision.md lost the BUILDING -> SPECIFYING amend transition row" + FAIL=$((FAIL + 1)) +fi + +# ============================================================ +echo "" +echo "=== Release-hygiene guard (SPEC-028) ===" +# ============================================================ +# Pin the PRESENCE of the phantom-cut warn on its two surfaces so a deletion or a +# wording flip fails CI. DEC-004: assert the surfaces carry the check, NEVER that +# the working tree is currently tag-clean ("VERSION named but untagged" is a +# legitimate transient during a release and CI often does not fetch tags). So we +# grep the command-prompt files; we never run the phantom-cut check against the repo. + +# (a) ship.md (Step 4a) carries the phantom-cut / git-tag check AND the warn-not-block stance. +TOTAL=$((TOTAL + 1)) +if grep -qF 'git tag -l' "$KIT_DIR/commands/ship.md" 2>/dev/null \ + && grep -qiF 'phantom' "$KIT_DIR/commands/ship.md" 2>/dev/null \ + && grep -qiF 'warn, not block' "$KIT_DIR/commands/ship.md" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} ship.md carries the release-hygiene warn (phantom-cut git-tag check + warn-not-block) (SPEC-028)" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} ship.md lost the release-hygiene warn (needs git-tag phantom-cut check + warn-not-block stance)" + FAIL=$((FAIL + 1)) +fi + +# (b) kit-health.md carries the phantom-cut check. +TOTAL=$((TOTAL + 1)) +if grep -qF 'git tag -l' "$KIT_DIR/commands/kit-health.md" 2>/dev/null \ + && grep -qiF 'phantom' "$KIT_DIR/commands/kit-health.md" 2>/dev/null; then + echo -e " ${GREEN}PASS${NC} kit-health.md carries the phantom-cut check (git-tag check + phantom) (SPEC-028)" + PASS=$((PASS + 1)) +else + echo -e " ${RED}FAIL${NC} kit-health.md lost the phantom-cut check (needs git-tag check + phantom)" + FAIL=$((FAIL + 1)) +fi + # ============================================================ echo "" echo "=== Results ==="