
fix: align attestation handling with leanSpec#754

Open
GrapeBaBa wants to merge 13 commits into main from fix/align-attestation-slot-tolerance-with-leanspec

Conversation


@GrapeBaBa GrapeBaBa commented Apr 17, 2026

Summary

Four fixes to align attestation and block handling with leanSpec.

1. Switch gossip-attestation future-slot bound to interval-grained

zeam's gossip-attestation time check was slot-grained — data.slot <= current_slot + 1. A whole-slot tolerance lets an adversary pre-publish next-slot aggregates ahead of any honest validator (~800 ms head start at SECONDS_PER_SLOT=4 / INTERVALS_PER_SLOT=5).

This PR adopts the interval-grained rule, expressed against the forkchoice store's interval clock:

data.slot * INTERVALS_PER_SLOT <= store.time + GOSSIP_DISPARITY_INTERVALS

with a new GOSSIP_DISPARITY_INTERVALS = 1 constant. One interval bounds the head start to NTP drift. Block-included attestations skip the time check entirely; they are trusted under the block's signature + STF validation.
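
Expressed as leanSpec-style Python (the spec the PR cites is Python), the new bound looks roughly like this; the constant values are the ones quoted above and the function name is illustrative:

```python
INTERVALS_PER_SLOT = 5          # value quoted above
GOSSIP_DISPARITY_INTERVALS = 1  # new constant introduced by this PR

def within_gossip_future_bound(data_slot: int, store_time_intervals: int) -> bool:
    """Interval-grained future bound for gossip attestations.

    store_time_intervals is the forkchoice store's clock in intervals.
    The attestation's slot, converted to intervals, may lead that clock
    by at most GOSSIP_DISPARITY_INTERVALS (an NTP-drift margin).
    """
    return data_slot * INTERVALS_PER_SLOT <= store_time_intervals + GOSSIP_DISPARITY_INTERVALS
```

At slot 10, interval 0 (store.time = 50), a slot-11 attestation (55 intervals) is still rejected; by interval 4 (store.time = 54) it clears the bound, so the pre-publication head start is capped at one interval rather than a whole slot.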

2. Lazy proto-array prune on finalization (root-cause fix)

chain.processFinalizationAdvancement called forkChoice.rebase(...) unconditionally on every finalization advance. Rebase removes pre-finalized ancestors from proto-array and remaps attestation-tracker indices, so any in-flight attestation whose source / target / head still referenced one of those ancestors failed validation with Unknown{Source,Target,Head}Block — even though the vote was valid at sign time.

3SF-mini's fast finalization cadence puts the race window on the same order of magnitude as normal gossip propagation, so attestations drop across every finalization boundary under normal load. Sync catch-up (which can advance finalization multiple steps per tick) makes it dramatically worse, multiplying BlocksByRoot re-fetch storms and delaying justification.

Fix: gate the rebase call on the finalized node's position in proto-array. While that index is below constants.PRUNE_NODE_THRESHOLD = 64 (≈256 s grace at SECONDS_PER_SLOT=4), skip the rebuild and leave the pre-finalized prefix in place so in-flight attestations still resolve. Once the threshold is crossed the full rebase fires and amortises the vec rebuild + tracker remap across many finalizations.

Because the root cause is fixed at the prune site, the stricter re-validation in onGossipAggregatedAttestation is also restored — earlier revisions of this PR had skipped it as a tolerance workaround; with the grace window in place it's safe again and catches malformed aggregates that would otherwise slip through.
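
A minimal Python sketch of the gate described above (names mirror the PR's Zig identifiers; the two callables stand in for the real forkchoice methods):

```python
PRUNE_NODE_THRESHOLD = 64  # grace window: ~256 s at SECONDS_PER_SLOT=4

def maybe_rebase(get_proto_node_index, rebase, finalized_root) -> bool:
    """Run the expensive rebase only once the finalized node has drifted
    at least PRUNE_NODE_THRESHOLD entries into the proto-array.

    Below the threshold the pre-finalized prefix stays addressable so
    in-flight attestations still resolve. Returns True iff rebase ran.
    """
    index = get_proto_node_index(finalized_root)
    if index is None:
        # Reviewers noted this case should be treated as an invariant
        # violation (log loudly / error) rather than a silent skip.
        return False
    if index >= PRUNE_NODE_THRESHOLD:
        rebase(finalized_root)
        return True
    return False
```

With a dict as a stand-in index, `maybe_rebase({"recent": 3}.get, print, "recent")` skips the rebuild, while an index of 64 or more triggers it and amortises the cost across many finalizations.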

3. Drop zeam-specific block-level future-slot rejection and time-based block queue

Two pieces were entangled and are removed together:

Block-level future-slot rejection: validateBlock (and the entry-point check in onBlock added in an earlier revision of this PR) rejected block.slot > current_slot + MAX_FUTURE_SLOT_TOLERANCE as BlockValidationError.FutureSlot. The MAX_FUTURE_SLOT_TOLERANCE constant was originally introduced for attestation validation and later reused at the block layer; that reuse never had a corresponding spec rule. Block admission is now gated by parent / signature / STF only.

Time-based block queue — zeam previously queued future-slot blocks on a local wall clock and replayed them on onInterval. The replay path was already removed in an earlier revision of this PR; what remained (cacheFutureBlock, the two error.FutureSlot handlers in node.zig) was a graveyard for blocks that would never come back. Both are gone now. Blocks that pass the parent / signature / STF gate process inline; blocks whose parent is unknown are cached in node.zig for later retry, matching the rest of the ecosystem.

The MAX_FUTURE_SLOT_TOLERANCE constant and BlockValidationError.FutureSlot variant are removed.
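
The resulting admission logic can be sketched as follows (hypothetical Python; zeam's actual gate lives in validateBlock and node.zig):

```python
def admit_block(parent_known: bool, signature_ok: bool, stf_ok: bool) -> str:
    """Block admission after this PR: no wall-clock future-slot gate.

    A block whose parent is unknown is cached for retry (orphan,
    awaiting backfill); otherwise it must pass signature and
    state-transition (STF) checks and is processed inline.
    """
    if not parent_known:
        return "cache_for_retry"
    if not (signature_ok and stf_ok):
        return "reject"
    return "process_inline"
```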

4. Fill in missing leanSpec attestation topology / consistency checks

validateAttestationData was missing two rules from leanSpec validate_attestation:

  • Topology: data.head.slot >= data.target.slot. The head checkpoint must not be older than the target; without it, malformed attestations with a stale head pointer pass through zeam.
  • Consistency: head_block.slot == data.head.slot. The checkpoint slot must match the slot of the block it points at; zeam used to look up head_block and immediately drop it with _ = head_block; // Will be used in future validations, so a mismatched head slot silently passed.

Added as AttestationValidationError.HeadOlderThanTarget and HeadCheckpointSlotMismatch. Caught by the leanSpec fork-choice fixtures test_attestation_head_older_than_target_rejected and test_aggregated_attestation_head_slot_mismatch_rejected.
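
A Python sketch of the two added checks (error names from the PR; the head_block lookup is assumed to have already succeeded upstream):

```python
class AttestationValidationError(Exception):
    pass

def check_head_checkpoint(data_head_slot: int, data_target_slot: int,
                          head_block_slot: int) -> None:
    """The two leanSpec validate_attestation rules this PR adds."""
    # Topology: the head checkpoint must not be older than the target.
    if data_head_slot < data_target_slot:
        raise AttestationValidationError("HeadOlderThanTarget")
    # Consistency: the checkpoint slot must match the block it points at.
    if head_block_slot != data_head_slot:
        raise AttestationValidationError("HeadCheckpointSlotMismatch")
```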

Copilot AI review requested due to automatic review settings April 17, 2026 10:34

Copilot AI left a comment


Pull request overview

Aligns attestation “future slot” validation tolerance with leanSpec by allowing a +1 slot clock-disparity margin for all attestations (gossip and block).

Changes:

  • Unifies attestation future-slot validation to allow current_slot + MAX_FUTURE_SLOT_TOLERANCE regardless of attestation origin.
  • Updates the “gossip vs block future slot handling” test assertions to reflect the unified tolerance.


Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +1421 to +1424
  // Per leanSpec: allow current_slot + 1 for clock disparity tolerance,
  // regardless of whether the attestation is from gossip or a block.
  const current_slot = self.forkChoice.getCurrentSlot();
- const max_allowed_slot = if (is_from_block)
-     current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE // Block attestations: allow +1
- else
-     current_slot; // Gossip attestations: no future slots allowed
+ const max_allowed_slot = current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE;

Copilot AI Apr 17, 2026


The function doc comment above validateAttestationData still describes different future-slot tolerances for gossip vs block attestations, but the implementation now applies the same current_slot + MAX_FUTURE_SLOT_TOLERANCE rule regardless of is_from_block. Update the doc comment to match the new unified behavior (and keep is_from_block description focused on logging/metrics if that’s its remaining purpose).

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines 2312 to 2315
// Per leanSpec store.py:320: assert data.slot <= current_slot + Slot(1)
// Both gossip and block attestations allow current_slot + 1
try beam_chain.validateAttestationData(next_slot_attestation.message, false);
try beam_chain.validateAttestationData(next_slot_attestation.message, true);

Copilot AI Apr 17, 2026


With the new current_slot + 1 tolerance applied to gossip attestations, the earlier test attestation validation - comprehensive still expects AttestationTooFarInFuture for an attestation at current_slot + 1 (slot=3 when current is 2). That expectation should be updated (and the failing case shifted to current_slot + 2) to keep the suite consistent with the new validation rule.

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +2312 to +2313
// Per leanSpec store.py:320: assert data.slot <= current_slot + Slot(1)
// Both gossip and block attestations allow current_slot + 1

Copilot AI Apr 17, 2026


This test’s name and header comment still say gossip and block attestations have different future-slot tolerances, but the assertions now verify unified behavior. Rename the test and update the top-of-test comments to avoid misleading future maintainers.

Suggested change
// Per leanSpec store.py:320: assert data.slot <= current_slot + Slot(1)
// Both gossip and block attestations allow current_slot + 1
// Per leanSpec store.py:320: attestation data is valid up to current_slot + Slot(1).
// This test verifies the unified behavior for both validation modes.

Copilot AI review requested due to automatic review settings April 17, 2026 14:39

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.



Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +1460 to +1462
// Per leanSpec: on_gossip_aggregated_attestation does NOT call validate_attestation.
// Attestation data was already validated when individual gossip attestations arrived.
// Re-validating here would fail after finalization prunes the source block from protoarray.

Copilot AI Apr 17, 2026


The comment says gossip aggregated attestations are no longer validated, but the onGossip() handler for .aggregation still calls validateAttestationData(signed_aggregation.data, false) before invoking this function (chain.zig around line ~776). That means aggregations can still fail with UnknownSourceBlock after finalization pruning, so the intended behavior in the PR description isn’t actually achieved. Either remove/relax that upstream validation for aggregations, or update validateAttestationData to tolerate pruned source blocks (e.g., allow missing source root when data.source.slot <= finalized_slot) so aggregation processing matches leanSpec and the PR’s stated fix.

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines 1460 to 1465
// Per leanSpec: on_gossip_aggregated_attestation does NOT call validate_attestation.
// Attestation data was already validated when individual gossip attestations arrived.
// Re-validating here would fail after finalization prunes the source block from protoarray.

try self.verifyAggregatedAttestation(signedAggregation);


Copilot AI Apr 17, 2026


There’s no test covering the second behavior described in the PR (aggregations remaining valid when the source block has been pruned after finalization). Given this change alters aggregation processing rules, please add a unit/integration test that simulates finalization advancing/pruning the protoarray and then ensures a gossip aggregated attestation referencing a pruned-but-finalized source does not fail with UnknownSourceBlock.

Copilot AI review requested due to automatic review settings April 20, 2026 15:53
@GrapeBaBa GrapeBaBa force-pushed the fix/align-attestation-slot-tolerance-with-leanspec branch from 7edeb6a to 963f6e0 on April 20, 2026 15:53

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.



Comment thread pkgs/node/src/chain.zig
try self.forkChoice.rebase(latestFinalized.root, &canonical_view);
}
// else: threshold not reached; keep pre-finalized ancestors
// in proto-array so in-flight attestations still resolve.

Copilot AI Apr 20, 2026


getProtoNodeIndex(latestFinalized.root) should be non-null here because getCanonicalViewAndAnalysis already requires latestFinalized.root to exist in protoArray.indices. Silently skipping the rebase when it returns null would hide an invariant violation and can lead to unbounded proto-array growth. Consider turning this into an explicit error/log (and/or falling back to rebase/returning an error) if the index lookup fails.

Suggested change
// in proto-array so in-flight attestations still resolve.
// in proto-array so in-flight attestations still resolve.
} else {
std.debug.panic(
"forkchoice invariant violated: finalized root missing from proto-array during rebase: {}",
.{latestFinalized.root},
);

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines 1539 to 1548
// Validate attestation data (same rules as individual gossip attestations).
// The earlier revision of this function skipped re-validation because the
// source block could vanish from proto-array when finalization advanced
// between publish and receipt. That race is now handled at the root —
// processFinalizationAdvancement gates the rebase call on
// PRUNE_NODE_THRESHOLD, keeping the source / target / head blocks
// addressable for the full grace window — so the stricter check is
// safe again.
try self.validateAttestationData(signedAggregation.data, false);


Copilot AI Apr 20, 2026


onGossip already calls validateAttestationData(signed_aggregation.data, false) immediately before calling onGossipAggregatedAttestation. Re-validating here duplicates the proto-array lookups and log spam without adding safety (the BeamNode mutex already serializes onGossip vs onInterval). Consider removing this second validation (or refactoring so only one layer validates and is responsible for returning missing_attestation_roots).

Suggested change
// Validate attestation data (same rules as individual gossip attestations).
// The earlier revision of this function skipped re-validation because the
// source block could vanish from proto-array when finalization advanced
// between publish and receipt. That race is now handled at the root —
// processFinalizationAdvancement gates the rebase call on
// PRUNE_NODE_THRESHOLD, keeping the source / target / head blocks
// addressable for the full grace window — so the stricter check is
// safe again.
try self.validateAttestationData(signedAggregation.data, false);
// Validation is done upstream in onGossip before this function is called.

Comment thread pkgs/node/src/constants.zig Outdated
Comment on lines +37 to +39
// for mainnet (one epoch of slots); 3SF-mini's shorter finalization window
// makes 64 (≈1 eth epoch of wall clock) the better trade between grace
// coverage and memory footprint.

Copilot AI Apr 20, 2026


The comment claiming “Lighthouse uses 256 for mainnet (one epoch of slots)” appears inaccurate (Ethereum mainnet epoch = 32 slots). Please correct the reference (either the slot count, or clarify what 256 represents) so readers don’t infer an incorrect epoch length from this constant’s docs.

Suggested change
// for mainnet (one epoch of slots); 3SF-mini's shorter finalization window
// makes 64 (≈1 eth epoch of wall clock) the better trade between grace
// coverage and memory footprint.
// slots for mainnet as a pruning threshold; this is not the Ethereum epoch
// length (mainnet epochs are 32 slots). 3SF-mini's shorter finalization
// window makes 64 (≈1 eth epoch of wall clock) the better trade between
// grace coverage and memory footprint.

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +1300 to +1310
// 5 Rebase forkchoice — lazy prune with node-count threshold.
//
// Eager rebase drops pre-finalized ancestors from proto-array and
// remaps attestation-tracker indices. In-flight attestations whose
// source/target/head still points at one of those ancestors then
// fail the existence checks in validateAttestationData with
// Unknown{Source,Target,Head}Block, and the node burns bandwidth
// re-fetching blocks that will never come back. The grace window
// must outlast the worst-case gossip delay plus at least one
// finalization tick; constants.PRUNE_NODE_THRESHOLD = 64 gives
// roughly 1 eth epoch of wall clock, comfortably beyond both.

Copilot AI Apr 20, 2026


PR description says the fix is to “accept source blocks whose slot <= finalized_slot as valid even when absent from the protoarray.” The implementation here instead delays forkChoice.rebase via PRUNE_NODE_THRESHOLD, but validateAttestationData still hard-fails when a referenced root is absent. Please either update the PR description to match the implemented approach, or implement the described <= finalized_slot exception so behavior matches the summary.

@GrapeBaBa GrapeBaBa force-pushed the fix/align-attestation-slot-tolerance-with-leanspec branch from 963f6e0 to 827a0ef on April 21, 2026 03:27
Copilot AI review requested due to automatic review settings April 21, 2026 03:36
@GrapeBaBa GrapeBaBa force-pushed the fix/align-attestation-slot-tolerance-with-leanspec branch from 827a0ef to c2b3e06 on April 21, 2026 03:36

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.



Comment thread pkgs/node/src/chain.zig
Comment on lines +1300 to +1318
// 5 Rebase forkchoice — lazy prune with node-count threshold.
//
// Eager rebase drops pre-finalized ancestors from proto-array and
// remaps attestation-tracker indices. In-flight attestations whose
// source/target/head still points at one of those ancestors then
// fail the existence checks in validateAttestationData with
// Unknown{Source,Target,Head}Block, and the node burns bandwidth
// re-fetching blocks that will never come back. The grace window
// must outlast the worst-case gossip delay plus at least one
// finalization tick; constants.PRUNE_NODE_THRESHOLD = 64 slots
// (≈256 s at SECONDS_PER_SLOT=4) is comfortably beyond both.
if (pruneForkchoice) {
if (self.forkChoice.getProtoNodeIndex(latestFinalized.root)) |finalized_idx| {
if (finalized_idx >= constants.PRUNE_NODE_THRESHOLD) {
try self.forkChoice.rebase(latestFinalized.root, &canonical_view);
}
// else: threshold not reached; keep pre-finalized ancestors
// in proto-array so in-flight attestations still resolve.
}

Copilot AI Apr 21, 2026


The PR description says pruned source blocks should be accepted when their slot <= finalized_slot even if absent from proto-array, but the change here instead delays forkchoice.rebase via a node-count threshold. If the intended behavior is to accept attestations referencing already-finalized (and potentially pruned) source/target/head, that logic is not implemented in validateAttestationData. Please either adjust the PR description to match the threshold-based approach, or implement the finalized-slot tolerance in validation.

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +2392 to +2420
@@ -2390,35 +2411,13 @@ test "attestation validation - gossip vs block future slot handling" {
.signature = ZERO_SIGBYTES,
};

// Gossip attestations: should FAIL for next slot (current + 1)
// Gossip attestations: should FAIL for slot current + 2
// Per spec store.py:177: assert attestation.slot <= time_slots
try std.testing.expectError(error.AttestationTooFarInFuture, beam_chain.validateAttestationData(next_slot_attestation.message, false));

// Block attestations: should PASS for next slot (current + 1)
// Block attestations: should FAIL for slot current + 2
// Per spec store.py:140: assert attestation.slot <= Slot(current_slot + Slot(1))
try beam_chain.validateAttestationData(next_slot_attestation.message, true);
const too_far_attestation: types.SignedAttestation = .{
.validator_id = 0,
.message = .{
.slot = 3, // Too far in future
.head = types.Checkpoint{
.root = mock_chain.blockRoots[1],
.slot = 1,
},
.source = types.Checkpoint{
.root = mock_chain.blockRoots[0],
.slot = 0,
},
.target = types.Checkpoint{
.root = mock_chain.blockRoots[1],
.slot = 1,
},
},
.signature = ZERO_SIGBYTES,
};
// Both should fail for slot 3 when current is slot 1
try std.testing.expectError(error.AttestationTooFarInFuture, beam_chain.validateAttestationData(too_far_attestation.message, false));
try std.testing.expectError(error.AttestationTooFarInFuture, beam_chain.validateAttestationData(too_far_attestation.message, true));
try std.testing.expectError(error.AttestationTooFarInFuture, beam_chain.validateAttestationData(next_slot_attestation.message, true));

Copilot AI Apr 21, 2026


The test name/comments still describe “gossip: <= current_slot; block: <= current_slot + 1”, but the code now allows +1 for both. Also, this test only checks that current+2 fails; it no longer asserts that current+1 succeeds for gossip (the behavior this PR changes). Update the assertions and comments so the test actually covers the new tolerance rule.

Comment thread pkgs/node/src/chain.zig
// must outlast the worst-case gossip delay plus at least one
// finalization tick; constants.PRUNE_NODE_THRESHOLD = 64 slots
// (≈256 s at SECONDS_PER_SLOT=4) is comfortably beyond both.
if (pruneForkchoice) {

@g11tech g11tech Apr 21, 2026


why would rebase drop pre finalized, we only rebase with latest finalized?

ahh ok got the context from the comment, never-mind

Comment thread pkgs/node/src/chain.zig Outdated
-     current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE // Block attestations: allow +1
- else
-     current_slot; // Gossip attestations: no future slots allowed
+ const max_allowed_slot = current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE;

@g11tech g11tech Apr 21, 2026


I don't understand how this is possible: a block is processed only if block slot <= current slot, and the attestations in it can't be of a slot later than the block slot

Member Author


validateBlock already allows block.slot <= current_slot + MAX_FUTURE_SLOT_TOLERANCE (+1), so even for block-embedded attestations the effective upper bound on att.slot is current_slot + 1 — the +1 tolerance here is needed on the block path too.

As an aggregator, this call site handles is_from_block=false — the gossip path, which is independent of any block arrival. The race is clock skew, not block processing:

  • A remote validator signs a gossip attestation at their local slot N (interval 1) with data.slot = N.
  • It propagates to us. If our clock trails the validator's (normal in any distributed system), we can still be at slot N-1 when the message lands.
  • validateAttestationData(..., false) then sees att.slot=N > current_slot=N-1. Without +1 tolerance, we'd drop a perfectly honest vote as "too far in the future".
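
The skew scenario in the bullets above reduces to a two-line check (Python sketch; the constant value is the +1 margin discussed in this thread):

```python
MAX_FUTURE_SLOT_TOLERANCE = 1  # the +1 slot margin discussed above

def slot_bound_ok(att_slot: int, current_slot: int) -> bool:
    """Slot-grained bound at this call site: an honest attestation signed
    at the sender's slot N must survive a receiver still at slot N - 1."""
    return att_slot <= current_slot + MAX_FUTURE_SLOT_TOLERANCE

# Receiver trails by one slot: a slot-10 vote lands while we are at slot 9.
assert slot_bound_ok(10, 9)        # kept thanks to the +1 margin
assert not slot_bound_ok(11, 9)    # genuinely future vote still rejected
```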

Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also removed the previous pending block processing

leanSpec store.py:320 allows current_slot + 1 for all attestations
(gossip and block) as clock disparity tolerance. zeam previously
only allowed +1 for block attestations and rejected gossip attestations
for future slots. Align with leanSpec's unified behavior.
When finalization advances, old blocks are pruned from the protoarray.
Aggregations referencing a source checkpoint that was valid when the
attestation was created can fail validation if finalization pruned the
source block between attestation creation and aggregation publishing.

Accept source blocks whose slot <= finalized_slot as valid even when
absent from the protoarray — they were canonical and are now below
the finalization horizon.

Root cause: race between finalization pruning (interval 0 of new slot)
and aggregation publishing (interval 2 of same slot), where the
aggregation's attestation data references a justified checkpoint that
got pruned during the finalization advance.
…ation

Per leanSpec store.py on_gossip_aggregated_attestation: aggregated
attestations only verify the aggregated signature proof, they do NOT
call validate_attestation on the attestation data. nlean follows the
same pattern.

The attestation data was already validated when individual gossip
attestations arrived. Re-validating in the aggregation path caused
UnknownSourceBlock errors when finalization pruned the source block
from protoarray between attestation creation and aggregation publishing.

This also reverts the "tolerate pruned source blocks" workaround in
validateAttestationData since the root cause is now fixed.
Root-cause the attestation-race previously worked around in
chain.onGossipAggregatedAttestation and validateAttestationData.

Symptom: every time finalization advanced,
processFinalizationAdvancement called forkChoice.rebase(...)
unconditionally. Rebase drops pre-finalized ancestors from
proto-array and remaps attestation-tracker indices. Any in-flight
attestation whose source / target / head referenced one of those
dropped blocks then failed validateAttestationData's existence
check with Unknown{Source,Target,Head}Block — even though the
vote was perfectly valid when the validator produced it.

3SF-mini's fast finalization cadence (one advance every few slots)
puts the race window on the same order of magnitude as normal
gossip propagation, so attestations drop across every finalization
boundary under normal load. Sync catch-up, which can advance
finalization multiple steps in a single tick, makes it much worse.

Fix — lighthouse-style lazy prune:

  * Introduce constants.PRUNE_NODE_THRESHOLD = 64. Lighthouse uses
    256 for mainnet; 3SF-mini's shorter wall-clock per finalization
    step makes 64 (~1 eth epoch worth of grace) the better trade
    between race coverage and memory footprint.
  * Gate the rebase call in processFinalizationAdvancement on the
    finalized node's index in the proto-array. While that index is
    below the threshold, skip rebase and leave the pre-finalized
    prefix addressable so in-flight attestations still resolve.
  * Expose ForkChoice.getProtoNodeIndex for the gate (mutex-safe
    wrapper over protoArray.indices).

Revert of the prior tolerance workaround:

  * onGossipAggregatedAttestation now re-validates the attestation
    data again. The "skip validate" workaround exists only because
    source blocks could vanish mid-finalization; with the threshold
    gate keeping them in place, the stricter check is safe again
    and catches malformed aggregates that used to slip through.

Test:

  * Add regression test driving the mock chain with default
    onBlock opts (pruneForkchoice = true) through 5 slots. The
    finalized node's index stays well under the threshold, so
    pre-finalized ancestors must remain resolvable via
    getProtoNode after finalization advances. Before this fix,
    the rebase call would have removed roots[1] and roots[2]
    (slots < latestFinalized.slot) from the protoArray and the
    assertion would trip.
  * Update the existing "Test 9: Attestation too far in future"
    case to use slot=current+2 — the spec-aligned +1 tolerance
    change from 09712ac made the prior slot=current+1 trip point
    pass validation; the test was not updated at the time.

The future-slot +1 tolerance fix (09712ac) is orthogonal and
preserved — spec alignment with store.py:320.
zeam was the only client among leanSpec / ream / qlean-mini /
ethlambda / nlean that queued inbound blocks on the local clock.
Every other implementation processes a block the moment its parent
state is known — their only "pending" caches are for orphan blocks
(parent unknown, awaiting backfill), never for blocks whose slot just
hasn't started yet locally.

The queue also kept block body attestations out of fork choice until
the next interval tick, delaying head updates by a full interval and
duplicating the future-slot rejection that already lives in
chain.validateBlock.

Changes:

  * Remove BeamChain.pending_blocks field, its init / deinit hooks,
    and the append site on the gossip path.
  * Remove BeamChain.processPendingBlocks (replay helper) and its
    caller in node.onInterval.
  * Drop the duplicate future-slot gate in
    forkchoice.onBlockUnlocked. Gossip-level DoS filtering continues
    to happen in chain.validateBlock (MAX_FUTURE_SLOT_TOLERANCE = 1,
    matching leanSpec store.py:320); forkchoice now mirrors
    leanSpec store.on_block and only requires a known parent plus
    non-pre-finalized slot.
  * Remove ForkChoiceError.FutureSlot — no caller returns it.
  * Update the fork-choice block-tree unit test: the old test drove
    "reject future slot" against forkchoice.onBlock; that
    responsibility has moved up to chain.validateBlock, so the test
    now just ticks the clock and processes the block.

validateBlock still rejects block.slot > current_slot + 1 with
BlockValidationError.FutureSlot; node.zig's handler for
error.FutureSlot still caches those blocks for later retry, so the
"block arrived genuinely too early" case is covered exactly as
before — just without the redundant inner-layer queue.
Copilot AI review requested due to automatic review settings April 22, 2026 09:09
@GrapeBaBa GrapeBaBa force-pushed the fix/align-attestation-slot-tolerance-with-leanspec branch from 896043e to 82115ac on April 22, 2026 09:09

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.



Comment thread pkgs/node/src/node.zig
Comment on lines 1142 to 1144
// Sweep timed-out RPC requests to prevent sync stalls from non-responsive peers.
self.sweepTimedOutRequests();


Copilot AI Apr 22, 2026


After removing the forkchoice FutureSlot gate (and the pending-block replay), BeamChain.onBlock no longer returns error.FutureSlot (it only returns BlockValidationError.FutureSlot from validateBlock, which is used in the gossip path). However, processCachedDescendants in this file still checks err == error.FutureSlot when calling self.chain.onBlock(...), which will now be unreachable and likely fail to compile because FutureSlot is no longer in the inferred error set. Please remove that branch or perform an explicit chain.validateBlock check before calling onBlock for cached blocks and handle BlockValidationError.FutureSlot there.

Comment thread pkgs/node/src/chain.zig
Comment on lines +1242 to +1245
if (pruneForkchoice) {
if (self.forkChoice.getProtoNodeIndex(latestFinalized.root)) |finalized_idx| {
if (finalized_idx >= constants.PRUNE_NODE_THRESHOLD) {
try self.forkChoice.rebase(latestFinalized.root, &canonical_view);

Copilot AI Apr 22, 2026


The new prune gate skips rebase when getProtoNodeIndex(latestFinalized.root) returns null. Previously this would have surfaced as ForkChoiceError.InvalidTargetAnchor; now it silently does nothing, which can mask a forkchoice inconsistency and also prevent pruning from ever running (proto-array growth) if that situation occurs. Consider treating a null index as an error (or at least logging loudly) rather than silently skipping rebase.

@GrapeBaBa GrapeBaBa requested a review from g11tech April 22, 2026 09:23
Copilot AI review requested due to automatic review settings April 23, 2026 09:40
Contributor

Copilot AI left a comment


Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

leanSpec `store.py:validate_attestation` enforces two topology / consistency
rules that zeam had been skipping:

- `data.head.slot >= data.target.slot` — head checkpoint cannot be older
  than the target (topology check, store.py:304).
- `head_block.slot == data.head.slot` — head checkpoint slot must match
  the actual block slot the checkpoint points at (consistency check,
  store.py:314).

Without these, malformed gossip / aggregated attestations with a stale or
mismatched head pointer slipped through zeam's validation while leanSpec
and the rest of the ecosystem reject them. Add the two checks with
matching `HeadOlderThanTarget` / `HeadCheckpointSlotMismatch` error
variants and remove the stale `_ = head_block; // Will be used in future`
placeholder now that the node is actually consulted.
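The two checks added by this commit can be rendered as a short Python sketch. The `Checkpoint` shape and function name are assumptions for illustration; the rule locations (store.py:304, store.py:314) and the error names come from the commit message above.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    slot: int
    root: bytes = b""


def validate_head_topology(target: Checkpoint, head: Checkpoint, head_block_slot: int) -> None:
    # store.py:304: the head checkpoint cannot be older than the target.
    if head.slot < target.slot:
        raise ValueError("HeadOlderThanTarget")
    # store.py:314: the head checkpoint slot must match the slot of the
    # block the checkpoint actually points at.
    if head_block_slot != head.slot:
        raise ValueError("HeadCheckpointSlotMismatch")
```

An attestation with a well-formed head (`head.slot >= target.slot` and a matching block slot) passes; a stale or mismatched head pointer is rejected with the corresponding error variant.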
@GrapeBaBa GrapeBaBa changed the title fix: align attestation future slot tolerance with leanSpec fix: align attestation handling with leanSpec Apr 23, 2026
The block explained why re-validation was restored — context that is
meaningful in a PR description but rots as a source comment: future
readers have no "earlier revision" to compare against, and the rebase
gating it references is documented at the rebase site itself.
Copilot AI review requested due to automatic review settings April 23, 2026 13:43
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

pkgs/node/src/node.zig:1197

  • After removing processPendingBlocks, cached “future-slot” blocks are only retried via processReadyCachedBlocks(slot). That helper currently considers blocks “ready” only when block.slot <= current_slot (see processReadyCachedBlocks in this file), which delays retry by ~1 slot relative to MAX_FUTURE_SLOT_TOLERANCE (since a block becomes valid once block.slot <= current_slot + 1). Consider adjusting the readiness condition (or the slot passed in) so cached future blocks are retried as soon as they fall within the +1 tolerance window.
                // Sweep timed-out RPC requests to prevent sync stalls from non-responsive peers.
                self.sweepTimedOutRequests();

                self.processReadyCachedBlocks(slot);
            }
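The off-by-one flagged in the suppressed comment is easiest to see side by side. A tiny sketch, with illustrative names (the real code lives in `processReadyCachedBlocks`):

```python
MAX_FUTURE_SLOT_TOLERANCE = 1  # a block is valid once block.slot <= current_slot + 1


def ready_current(block_slot: int, current_slot: int) -> bool:
    # current readiness condition: retries the cached block ~1 slot late
    return block_slot <= current_slot


def ready_suggested(block_slot: int, current_slot: int) -> bool:
    # suggested: retry as soon as the block falls inside the +1 tolerance window
    return block_slot <= current_slot + MAX_FUTURE_SLOT_TOLERANCE
```

A cached block for slot 6 is already acceptable at slot 5 under the +1 tolerance, but the current condition holds it back until slot 6.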


Comment thread pkgs/node/src/chain.zig
Comment on lines +2669 to +2673
const signed_block = mock_chain.blocks[i];
const current_slot = signed_block.block.slot;
try beam_chain.forkChoice.onInterval(current_slot * constants.INTERVALS_PER_SLOT, false);
const missing_roots = try beam_chain.onBlock(signed_block, .{});
allocator.free(missing_roots);

Copilot AI Apr 23, 2026


This regression test doesn’t currently exercise processFinalizationAdvancement/rebase behavior: it calls beam_chain.onBlock(...) but never calls beam_chain.onBlockFollowup(...) (which is where finalization advancement triggers the gated rebase). As written, it would also pass on the pre-fix code because no rebase occurs. Consider invoking onBlockFollowup(true, &signed_block) after each imported block (or driving the chain through the same entrypoints node uses) so the test actually fails without the threshold gate.

Suggested change
const signed_block = mock_chain.blocks[i];
const current_slot = signed_block.block.slot;
try beam_chain.forkChoice.onInterval(current_slot * constants.INTERVALS_PER_SLOT, false);
const missing_roots = try beam_chain.onBlock(signed_block, .{});
allocator.free(missing_roots);
var signed_block = mock_chain.blocks[i];
const current_slot = signed_block.block.slot;
try beam_chain.forkChoice.onInterval(current_slot * constants.INTERVALS_PER_SLOT, false);
const missing_roots = try beam_chain.onBlock(signed_block, .{});
allocator.free(missing_roots);
try beam_chain.onBlockFollowup(true, &signed_block);

Comment thread pkgs/node/src/forkchoice.zig Outdated
Comment on lines +1628 to +1638
// The future-slot gate (slot * INTERVALS_PER_SLOT > fcStore.time)
// used to live here but duplicated the gossip-level check in
// chain.validateBlock and disagreed with every other leanSpec
// consumer (ream, qlean-mini, ethlambda, nlean — none of which
// key block admission on the local clock). Rejecting here also
// caused chain.zig to keep a parallel pending-block queue just
// to replay these blocks on the next interval, which meant the
// block / its body attestations couldn't contribute to fork
// choice weight until the clock caught up. Now we mirror
// leanSpec store.on_block: if parent is known and the slot is
// not pre-finalized, just process it.

Copilot AI Apr 23, 2026


Removing the forkchoice-level future-slot gate means some block-processing paths can now import blocks that are far beyond MAX_FUTURE_SLOT_TOLERANCE. In particular, the RPC/sync path calls BeamChain.onBlock(...) directly (e.g. pkgs/node/src/node.zig:677) without running chain.validateBlock, so ForkChoice.onBlock will accept and fully process arbitrarily-future blocks as long as the parent is known. Consider enforcing the same future-slot bound for all imports (e.g. call validateBlock inside BeamChain.onBlock, or validate in the RPC handler) so malicious peers can’t drive expensive STF/forkchoice work with far-future blocks.

Suggested change
// The future-slot gate (slot * INTERVALS_PER_SLOT > fcStore.time)
// used to live here but duplicated the gossip-level check in
// chain.validateBlock and disagreed with every other leanSpec
// consumer (ream, qlean-mini, ethlambda, nlean — none of which
// key block admission on the local clock). Rejecting here also
// caused chain.zig to keep a parallel pending-block queue just
// to replay these blocks on the next interval, which meant the
// block / its body attestations couldn't contribute to fork
// choice weight until the clock caught up. Now we mirror
// leanSpec store.on_block: if parent is known and the slot is
// not pre-finalized, just process it.
// Do not reintroduce the old strict local-clock forkchoice gate
// here; it caused direct-import paths to maintain separate
// pending queues and delayed useful forkchoice participation for
// slightly-future blocks. However, we still need the same bounded
// future-slot protection that validateBlock applies so all import
// paths reject arbitrarily far-future blocks before doing
// expensive STF / forkchoice work.
if (slot > opts.currentSlot + constants.MAX_FUTURE_SLOT_TOLERANCE) {
return ForkChoiceError.FutureSlot;
}

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +1459 to +1463
//
// Gossip attestations must be for current or past slots only. Validators attest
// in interval 1 of the current slot, so they cannot attest for future slots.
// Block attestations can be more lenient since the block itself was validated.
// Per leanSpec: allow current_slot + 1 for clock disparity tolerance,
// regardless of whether the attestation is from gossip or a block.
const current_slot = self.forkChoice.getCurrentSlot();
const max_allowed_slot = if (is_from_block)
current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE // Block attestations: allow +1
else
current_slot; // Gossip attestations: no future slots allowed
const max_allowed_slot = current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE;

Copilot AI Apr 23, 2026


validateAttestationData now allows current_slot + MAX_FUTURE_SLOT_TOLERANCE for both gossip and block attestations, but the function header comment still states that gossip attestations have no future tolerance. Update the doc comment above validateAttestationData to match the new behavior/spec alignment.

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +1239 to +1245
// must outlast the worst-case gossip delay plus at least one
// finalization tick; constants.PRUNE_NODE_THRESHOLD = 64 slots
// (≈256 s at SECONDS_PER_SLOT=4) is comfortably beyond both.
if (pruneForkchoice) {
if (self.forkChoice.getProtoNodeIndex(latestFinalized.root)) |finalized_idx| {
if (finalized_idx >= constants.PRUNE_NODE_THRESHOLD) {
try self.forkChoice.rebase(latestFinalized.root, &canonical_view);

Copilot AI Apr 23, 2026


This comment says PRUNE_NODE_THRESHOLD = 64 slots, but the gate uses finalized_idx (proto-array node index), which is not necessarily equal to slot distance (multiple blocks/forks can inflate indices). Either adjust the wording here or gate on slot distance if the intent is a time-based grace window.

Five follow-ups from the review pass:

- Close RPC / sync future-slot gap: BeamChain.onBlock now enforces the
  MAX_FUTURE_SLOT_TOLERANCE (+1) bound itself. Previously only the gossip
  path funnelled through chain.validateBlock; RPC handlers went straight
  to onBlock and would happily run STF + fork-choice on blocks arbitrarily
  far in the future, a cheap DoS vector.

- Make the PRUNE_NODE_THRESHOLD regression test actually exercise the
  gate by calling onBlockFollowup (where processFinalizationAdvancement
  — and the rebase call it gates — fires). Previously the test only
  called onBlock and would have passed even without the fix.

- Update validateAttestationData doc and `attestation validation -
  gossip vs block future slot handling` test to describe the unified
  `current_slot + 1` rule both paths now follow; the test is renamed and
  now asserts both the accepted current+1 case and the rejected current+2
  case on both is_from_block values.

- Clarify PRUNE_NODE_THRESHOLD: the threshold is a proto-array node
  count, not a slot distance. Under heavy forking, node count advances
  faster than slot distance, so the wall-clock grace can shrink. Updated
  the constants.zig comment and the rebase call site accordingly.

- Log loudly instead of silently skipping the rebase when
  latestFinalized.root is missing from proto-array. Under normal
  operation it is always present (getCanonicalViewAndAnalysis resolves
  it via protoArray.indices); surfacing the invariant violation beats
  hiding it behind "nothing happened".
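The gated rebase described in these follow-ups (node-count threshold, loud logging on a missing finalized root) can be sketched as follows. `PRUNE_NODE_THRESHOLD = 64` comes from the PR; the function and parameter names are illustrative, not zeam's API.

```python
import logging

PRUNE_NODE_THRESHOLD = 64  # proto-array node count, not a slot distance


def maybe_rebase(proto_index_of, rebase, finalized_root,
                 log=logging.getLogger("chain")):
    """Lazily prune: rebase only once enough pre-finalized nodes accumulate."""
    idx = proto_index_of(finalized_root)
    if idx is None:
        # Invariant violation: the finalized root should always be indexed.
        # Log loudly rather than silently skipping the rebase.
        log.error("finalized root missing from proto-array; skipping rebase")
        return False
    if idx < PRUNE_NODE_THRESHOLD:
        return False  # defer: not enough nodes to be worth remapping indices
    rebase(finalized_root)
    return True
```

Deferring the rebase keeps recently finalized ancestors resolvable in proto-array for a while, so in-flight attestations referencing them don't fail with Unknown{Source,Target,Head}Block across every finalization boundary.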
…ec rule

Switch the gossip-attestation time check from slot-grained to
interval-grained:

    data.slot * INTERVALS_PER_SLOT <= store.time + GOSSIP_DISPARITY_INTERVALS

A whole-slot tolerance let an adversary pre-publish next-slot aggregates
ahead of any honest validator (~800 ms head start at SECONDS_PER_SLOT=4 /
INTERVALS_PER_SLOT=5). One interval bounds that head start to NTP drift.

Block-included attestations skip the time check; they are trusted under
the block's signature + STF validation. The block path itself drops the
slot-grained future-slot bound entirely — block admission is gated by
parent / signature / STF only.

Removed:
- constants.MAX_FUTURE_SLOT_TOLERANCE (was reused across attestation
  validation and validateBlock / onBlock; the block usages had no spec
  basis to begin with)
- chain.validateBlock future-slot rejection
- chain.onBlock entry-point future-slot rejection
- chain.validateAttestationData block-path slot bound
- BlockValidationError.FutureSlot variant
- node.zig error.FutureSlot handlers and cacheFutureBlock helper
  (the cacheFutureBlock path had no replay mechanism after the
  time-based block queue was removed earlier in this PR)

Added:
- constants.GOSSIP_DISPARITY_INTERVALS = 1
- chain.validateAttestationData gossip-only interval bound
- chain.zig boundary tests covering the gossip / block divergence
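The interval-grained bound this commit adopts reduces to one comparison. A minimal numeric sketch, with constants matching the PR and an illustrative function name:

```python
INTERVALS_PER_SLOT = 5          # per the PR (SECONDS_PER_SLOT=4, ~800 ms intervals)
GOSSIP_DISPARITY_INTERVALS = 1  # new constant introduced by this commit


def gossip_attestation_timely(data_slot: int, store_time: int) -> bool:
    # data.slot * INTERVALS_PER_SLOT <= store.time + GOSSIP_DISPARITY_INTERVALS,
    # where store.time is the forkchoice store's interval clock.
    return data_slot * INTERVALS_PER_SLOT <= store_time + GOSSIP_DISPARITY_INTERVALS
```

Worked numbers: at store.time = 24 (the last interval of slot 4), a slot-5 attestation is admitted exactly at the boundary (25 <= 25), while under the old slot-grained rule it could have been published a whole slot early.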
Copilot AI review requested due to automatic review settings April 29, 2026 03:48
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

pkgs/node/src/chain.zig:1324

  • validateBlock no longer enforces any upper bound on block.slot relative to the local clock. Because onGossip runs signature verification + STF before adding to forkchoice, a far-future (but validly signed) block can force process_slots to iterate up to that slot, causing excessive work. Add back a future-slot limit (the intended +1 tolerance) here so blocks that are too far ahead are rejected early and cheaply.
    pub fn validateBlock(self: *Self, block: types.BeamBlock, is_from_gossip: bool) !void {
        _ = is_from_gossip;

        const finalized_slot = self.forkChoice.fcStore.latest_finalized.slot;

        // 1. Pre-finalized slot check - reject blocks before finalized slot
        if (block.slot < finalized_slot) {
            self.logger.debug("block validation failed: pre-finalized slot {d} < finalized {d}", .{
                block.slot,
                finalized_slot,
            });


Comment on lines +1628 to 1633
// Block admission only requires a known parent and a slot above
// the finalized boundary; STF and signature verification are the
// gating layers.
if (slot < self.fcStore.latest_finalized.slot) {
return ForkChoiceError.PreFinalizedSlot;
}

Copilot AI Apr 29, 2026


The FutureSlot gate was removed from ForkChoice.onBlockUnlocked, so a validly-signed block with a far-future slot can now be admitted and then drive an unbounded BeamState.process_slots loop during STF (see pkgs/types/src/state.zig:256+), leading to potential CPU/latency DoS. Reintroduce an upper-bound future-slot check (with the intended +1 tolerance) before expensive work, and return a dedicated error (or reuse an existing one) so callers can cache/ignore blocks that are too far ahead.

Comment on lines +8 to +21
// Future-slot tolerance for gossip attestations, measured in intervals:
//
// data.slot * INTERVALS_PER_SLOT <= store.time + GOSSIP_DISPARITY_INTERVALS
//
// where store.time is in intervals. One interval is roughly 800 ms at
// SECONDS_PER_SLOT=4 / INTERVALS_PER_SLOT=5.
//
// A whole-slot tolerance would let an adversary pre-publish next-slot
// aggregates ahead of any honest validator (~800 ms head start at 4 s
// slots); tightening to one interval bounds that head start to NTP drift.
//
// Block-included attestations skip this check entirely; they are trusted
// under the block's own validation.
pub const GOSSIP_DISPARITY_INTERVALS = 1;

Copilot AI Apr 29, 2026


PR description (section 1) states leanSpec allows data.slot <= current_slot + 1 for all attestations as clock-disparity tolerance. The implementation here instead uses an interval-grained gossip-only bound (GOSSIP_DISPARITY_INTERVALS = 1) and explicitly skips the time check for block-included attestations. Either update the PR description to match this behavior, or adjust the constants/validation logic to match the stated leanSpec rule so reviewers/operators aren't relying on incorrect semantics.

Comment thread pkgs/node/src/chain.zig
try beam_chain.forkChoice.onInterval(boundary_time, false);

// At the boundary the gossip path admits the same vote.
try beam_chain.validateAttestationData(next_slot_attestation.message, false);

Copilot AI Apr 29, 2026


The new boundary-acceptance behavior is only unit-tested via direct calls to validateAttestationData. Add an integration-style assertion that a boundary-admitted gossip attestation can be processed through the normal path (e.g. onGossipAttestation / forkChoice.onSignedAttestation) without being rejected downstream, to prevent regressions where validation and forkchoice admission rules diverge.

Suggested change
try beam_chain.validateAttestationData(next_slot_attestation.message, false);
try beam_chain.validateAttestationData(next_slot_attestation.message, false);
// Ensure the normal downstream path also accepts the boundary-admitted gossip attestation.
try beam_chain.forkChoice.onSignedAttestation(next_slot_attestation);
