tech-debt(dogfood): cross-shard capability isolation breaks requires/establishes sequencing #116

Description

@Lupus

What

The progressive dogfood harness (hack/dogfood/run_journeys.py) sequences journeys across tiers via establishes/requires capability tokens: a journey that produces a capability declares establishes: <token>, and a downstream journey that consumes it declares requires: <token>. However, the runner launches each shard with its own independent data directory and sandbox set, so a capability or artifact established in one shard (for example, a locally built image tag) is invisible to any dependent journey assigned to a different shard. A cross-shard requires is therefore silently left unfulfilled: the dependent journey runs without its precondition in place.

Why

When a requires dependency is violated by shard assignment, the dependent journey fails for a harness reason, not a product reason. This produces a false negative that the skeptic agent must triage, which wastes review time and dilutes trust in the dogfood signal. The concrete case observed was run-built-local-tag (requires: local-tag-created) failing in shard 2 because the build that created the tag ran in shard 1. Until this class of failure is eliminated, the progressive dogfood loop cannot be trusted as a product oracle.

In Scope

  • Fix run_journeys.py shard assignment so that journeys sharing a capability chain (establishes → requires) are guaranteed to land in the same shard or session, OR
  • Allow a journey to self-provision its requires precondition (idempotent setup step) so it does not depend on another shard's side effects, OR
  • Expose a shared-state mechanism (e.g. a shared data directory or capability registry) so a capability established in shard N is visible to any shard that needs it
  • Exactly one of these approaches (or an equivalent) must be implemented and tested
  • Documentation for journey authors describing the guarantee and any authoring constraints that come with it (e.g. "journeys in the same capability chain must not declare conflicting sandbox names")
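
For illustration only, the co-location option could be sketched roughly as below. This is not the current run_journeys.py logic: the Journey record, the assign_shards function, and the build-local-tag journey name are hypothetical, and the sketch assumes the input list already orders producers before consumers (it preserves that order within a group but does not sort).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Journey:
    # Hypothetical record; the real harness schema may differ.
    name: str
    establishes: frozenset = frozenset()
    requires: frozenset = frozenset()


def assign_shards(journeys, shard_count):
    """Keep journeys linked by a capability token together, then balance groups."""
    parent = {j.name: j.name for j in journeys}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Any two journeys touching the same token (producer or consumer) are linked.
    by_token = defaultdict(list)
    for j in journeys:
        for token in j.establishes | j.requires:
            by_token[token].append(j.name)
    for names in by_token.values():
        for other in names[1:]:
            union(names[0], other)

    # Collect groups, preserving the input order inside each group.
    groups = defaultdict(list)
    for j in journeys:
        groups[find(j.name)].append(j.name)

    # Largest groups first, each placed whole on the least-loaded shard.
    shards = [[] for _ in range(shard_count)]
    for members in sorted(groups.values(), key=len, reverse=True):
        min(shards, key=len).extend(members)
    return shards


if __name__ == "__main__":
    plan = assign_shards(
        [
            Journey("build-local-tag", establishes=frozenset({"local-tag-created"})),
            Journey("smoke"),
            Journey("run-built-local-tag", requires=frozenset({"local-tag-created"})),
        ],
        shard_count=2,
    )
    print(plan)
```

Because a group is never split, a long capability chain ends up on a single shard regardless of shard_count, which is the parallelism trade-off mentioned in the INVEST notes.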

Out of Scope

  • Changing the capability token vocabulary or the semantic meaning of establishes/requires
  • Modifying the oracle, skeptic, or journey-compiler agents
  • Enforcing intra-journey sequencing (already working; this issue is cross-shard only)
  • Addressing any other dogfood harness gaps beyond the shard-isolation breakage

Acceptance Criteria

  • A journey that declares requires: <token> runs against a state where <token> is actually present, regardless of how many shards the runner spawns
  • The establishes/requires contract is honored deterministically — shard count and assignment order cannot cause a capability to be missing for a dependent journey
  • A regression test or harness self-check catches a future cross-shard dependency violation (e.g. a dry-run mode that validates the capability graph before launching shards)
  • Journey-author documentation (inline comment block or hack/dogfood/README.md section) states the guarantee and any co-location or naming constraints journey authors must observe
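
The harness self-check mentioned above might look something like the following sketch. It assumes the co-location strategy (a token must be established earlier in the same shard); under a shared-state strategy the same-shard condition would need to be relaxed. The find_violations name, the dict-based journey shape, and the 0-based shard indices are assumptions for illustration, not the harness's actual data model.

```python
def find_violations(journeys, shards):
    """Return human-readable violations of the establishes/requires contract.

    journeys: dict mapping journey name -> {"establishes": [...], "requires": [...]}
    shards:   list of lists of journey names, in planned execution order
    """
    # Record where (shard index, position, journey) each token is established.
    established_at = {}
    for shard_idx, names in enumerate(shards):
        for pos, name in enumerate(names):
            for token in journeys[name].get("establishes", []):
                established_at.setdefault(token, []).append((shard_idx, pos, name))

    violations = []
    for shard_idx, names in enumerate(shards):
        for pos, name in enumerate(names):
            for token in journeys[name].get("requires", []):
                producers = established_at.get(token, [])
                if not producers:
                    violations.append(f"{name}: no journey establishes '{token}'")
                elif not any(s == shard_idx and p < pos for s, p, _ in producers):
                    where = ", ".join(f"{n} (shard {s})" for s, _, n in producers)
                    violations.append(
                        f"{name} (shard {shard_idx}): '{token}' not established "
                        f"earlier in the same shard; producers: {where}"
                    )
    return violations
```

A dry-run mode could call this on the planned assignment and exit non-zero if the returned list is non-empty, before any shard is launched.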

INVEST Notes

The chosen fix strategy (co-location vs. self-provision vs. shared state) is left to the implementer; all three approaches satisfy the acceptance criteria. If co-location is chosen, note that it constrains maximum parallelism — journey authors should be warned that long capability chains collapse onto a single shard. If self-provision is chosen, journey templates should be updated so authors know where to add the idempotent setup step.
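
If self-provision is chosen, the idempotent setup step in a journey template could follow a check-then-provision shape like the sketch below. The ensure_capability helper and its is_present/provision callbacks are hypothetical; how a journey actually detects or creates a capability (for example, a local image tag) is left to the implementer.

```python
def ensure_capability(token, is_present, provision):
    """Idempotently make `token` available in the current shard.

    is_present(token) -> bool  checks the shard-local state (hypothetical hook)
    provision(token)           creates the capability (hypothetical hook)
    Returns True if provisioning ran, False if the token was already present.
    """
    if is_present(token):
        return False
    provision(token)
    # Re-check so a silently failing setup step surfaces as a harness error,
    # not as a misleading product failure later in the journey.
    if not is_present(token):
        raise RuntimeError(f"setup step did not establish '{token}'")
    return True
```

Calling it twice with the same token runs the provisioning hook only once, so the step is safe whether or not the producing journey happened to run in the same shard.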

Metadata

    Labels

    effort:M (Medium: multiple files or components; some design thought)
    priority:P2 (Medium: planned work; not blocking anything critical)
    type:tech-debt (Internal restructuring; no user-visible behaviour change)
