Skip to content

spin-node: honour multi-aggregator presets, enforce >=1 per subnet#172

Open
ch4r10t33r wants to merge 1 commit intomainfrom
feat/spin-node-multi-aggregator
Open

spin-node: honour multi-aggregator presets, enforce >=1 per subnet#172
ch4r10t33r wants to merge 1 commit intomainfrom
feat/spin-node-multi-aggregator

Conversation

@ch4r10t33r
Copy link
Copy Markdown
Contributor

Summary

`spin-node.sh` enforced "exactly one aggregator per subnet": it kept only the first preset per subnet, reset every `isAggregator` flag to false on every deploy, and aborted with `expected exactly 1` if it ever saw more than one `true` row in a subnet. Operators who hand-edited `validator-config.yaml` for redundancy (e.g. `zeam_0` + `grandine_0` on subnet 0, `qlean_1` + `ethlambda_0` on subnet 1) silently lost their second aggregator on the next full deploy.

Behaviour change

Scenario Before After
Subnet has 0 presets 1 random pick 1 random pick (unchanged)
Subnet has 1 preset preset honoured preset honoured (unchanged)
Subnet has N>=2 presets only first kept; rest reset to `false` all N honoured
`--aggregator NODE` given NODE replaces preset on its subnet NODE replaces presets on its subnet (unchanged); other subnets honour their presets
Verification invariant `agg_count == 1` per subnet `agg_count >= 1` per subnet

The YAML mutation is now a single pass: each row is set to `true` iff its name is in the final aggregator set, else `false`. This replaces the previous reset-then-set sequence which left the file in a transiently empty state.

Why this is safe downstream

Every client cmd (`client-cmds/zeam-cmd.sh`, `ethlambda-cmd.sh`, `grandine-cmd.sh`, etc.) and every ansible role (`ansible/roles/*/tasks/main.yml`) reads `isAggregator` per node independently. Multiple `true` rows in the YAML naturally translate into multiple containers being launched with `--is-aggregator`. There is no global "single aggregator" assumption anywhere in the deploy path.

What stays the same

  • `--restart-client` still skips the aggregator block entirely (no change to existing flags).
  • The cross-subnet client-family uniqueness heuristic still applies to random fallbacks. Presets are exempt because the operator opted in.
  • `--dry-run` still skips both YAML writes and the verify pass.

Test plan

Verified in a temp `NETWORK_DIR` with a 3-subnet, 8-node fixture:

  • Subnet with two presets (`zeam_0` + `grandine_3`) → both kept.
  • Subnet with one preset (`ethlambda_4`) → preset kept.
  • Subnet with zero presets → one random pick, distinct family.
  • `--aggregator ream_1` (subnet 1 = ream_1's subnet) → ream_1 replaces ethlambda_4 preset on subnet 1; subnet 0's two presets remain untouched.
  • Reset all rows to `isAggregator: false` first, then run → exactly one random pick per subnet, distinct families.
  • `bash -n spin-node.sh` clean.

Output of the new selection box (case 1):

```
╔══════════════════════════════════════════════════════════════╗
║ 🗳 Aggregator Selection ║
╠══════════════════════════════════════════════════════════════╣
║ subnet 0 → zeam_0, grandine_3 ║
║ subnet 1 → ethlambda_4 ║
║ subnet 2 → qlean_2 ║
╚══════════════════════════════════════════════════════════════╝
```

The aggregator selection block previously enforced "exactly one aggregator
per subnet": it kept only the first preset found per subnet, reset every
isAggregator flag to false up front, and aborted the deploy if it ever
saw more than one true row in a subnet. Operators who hand-edited the
YAML for redundancy (e.g. zeam_0 + grandine_0 on subnet 0) silently lost
their second aggregator on the next full deploy.

Change the policy to:

  1. --aggregator NODE: NODE is the sole aggregator for its subnet
     (presets on that subnet are discarded). Other subnets fall through.
  2. Otherwise, every preset (isAggregator: true row) in the YAML is
     honoured. Multiple aggregators per subnet are intentional and
     supported for redundancy.
  3. Subnets with no presets get one random pick, preferring client
     families not already aggregating elsewhere — same family-uniqueness
     heuristic as before.

The verification invariant relaxes from `==1` to `>=1`. Downstream
client cmds and ansible roles already read isAggregator per node, so
multiple true rows naturally translate into multiple `--is-aggregator`
container flags without further changes.

The YAML mutation is now a single pass (set true iff selected, else
false) instead of reset-then-set, avoiding the transient state where
no node carries the flag.

The inline "Aggregator Selection" box now lists all aggregators per
subnet, e.g. "subnet 0 → zeam_0, grandine_0".
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant