
fix: align attestation handling with leanSpec#754

Open
GrapeBaBa wants to merge 13 commits into main from fix/align-attestation-slot-tolerance-with-leanspec

Conversation


@GrapeBaBa GrapeBaBa commented Apr 17, 2026

Summary

Four fixes to align attestation and block handling with leanSpec.

1. Switch gossip-attestation future-slot bound to interval-grained

zeam's gossip-attestation time check was slot-grained — data.slot <= current_slot + 1. A whole-slot tolerance lets an adversary pre-publish next-slot aggregates ahead of any honest validator (~800 ms head start at SECONDS_PER_SLOT=4 / INTERVALS_PER_SLOT=5).

This PR adopts the interval-grained rule, expressed against the forkchoice store's interval clock:

data.slot * INTERVALS_PER_SLOT <= store.time + GOSSIP_DISPARITY_INTERVALS

with a new GOSSIP_DISPARITY_INTERVALS = 1 constant. One interval bounds the head start to NTP drift. Block-included attestations skip the time check entirely; they are trusted under the block's signature + STF validation.
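
Expressed as leanSpec-style Python (the spec the PR cites is Python), the new bound looks roughly like this; the constant values are the ones quoted above and the function name is illustrative:

```python
INTERVALS_PER_SLOT = 5          # value quoted above
GOSSIP_DISPARITY_INTERVALS = 1  # new constant introduced by this PR

def within_gossip_future_bound(data_slot: int, store_time_intervals: int) -> bool:
    """Interval-grained future bound for gossip attestations.

    store_time_intervals is the forkchoice store's clock in intervals.
    The attestation's slot, converted to intervals, may lead that clock
    by at most GOSSIP_DISPARITY_INTERVALS (an NTP-drift margin).
    """
    return data_slot * INTERVALS_PER_SLOT <= store_time_intervals + GOSSIP_DISPARITY_INTERVALS
```

At slot 10, interval 0 (store.time = 50), a slot-11 attestation (55 intervals) is still rejected; by interval 4 (store.time = 54) it clears the bound, so the pre-publication head start is capped at one interval rather than a whole slot.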

2. Lazy proto-array prune on finalization (root-cause fix)

chain.processFinalizationAdvancement called forkChoice.rebase(...) unconditionally on every finalization advance. Rebase removes pre-finalized ancestors from proto-array and remaps attestation-tracker indices, so any in-flight attestation whose source / target / head still referenced one of those ancestors failed validation with Unknown{Source,Target,Head}Block — even though the vote was valid at sign time.

3SF-mini's fast finalization cadence puts the race window on the same order of magnitude as normal gossip propagation, so attestations drop across every finalization boundary under normal load. Sync catch-up (which can advance finalization multiple steps per tick) makes it dramatically worse, multiplying BlocksByRoot re-fetch storms and delaying justification.

Fix: gate the rebase call on the finalized node's position in proto-array. While that index is below constants.PRUNE_NODE_THRESHOLD = 64 (≈256 s grace at SECONDS_PER_SLOT=4), skip the rebuild and leave the pre-finalized prefix in place so in-flight attestations still resolve. Once the threshold is crossed the full rebase fires and amortises the vec rebuild + tracker remap across many finalizations.

Because the root cause is fixed at the prune site, the stricter re-validation in onGossipAggregatedAttestation is also restored — earlier revisions of this PR had skipped it as a tolerance workaround; with the grace window in place it's safe again and catches malformed aggregates that would otherwise slip through.
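
A minimal Python sketch of the gate described above (names mirror the PR's Zig identifiers; the two callables stand in for the real forkchoice methods):

```python
PRUNE_NODE_THRESHOLD = 64  # grace window: ~256 s at SECONDS_PER_SLOT=4

def maybe_rebase(get_proto_node_index, rebase, finalized_root) -> bool:
    """Run the expensive rebase only once the finalized node has drifted
    at least PRUNE_NODE_THRESHOLD entries into the proto-array.

    Below the threshold the pre-finalized prefix stays addressable so
    in-flight attestations still resolve. Returns True iff rebase ran.
    """
    index = get_proto_node_index(finalized_root)
    if index is None:
        # Reviewers noted this case should be treated as an invariant
        # violation (log loudly / error) rather than a silent skip.
        return False
    if index >= PRUNE_NODE_THRESHOLD:
        rebase(finalized_root)
        return True
    return False
```

With a dict as a stand-in index, `maybe_rebase({"recent": 3}.get, print, "recent")` skips the rebuild, while an index of 64 or more triggers it and amortises the cost across many finalizations.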

3. Drop zeam-specific block-level future-slot rejection and time-based block queue

Two pieces were entangled and are removed together:

Block-level future-slot rejection: validateBlock (and the entry-point check in onBlock added in an earlier revision of this PR) rejected block.slot > current_slot + MAX_FUTURE_SLOT_TOLERANCE as BlockValidationError.FutureSlot. The MAX_FUTURE_SLOT_TOLERANCE constant was originally introduced for attestation validation and later reused at the block layer; that reuse never had a corresponding spec rule. Block admission is now gated by parent / signature / STF only.

Time-based block queue — zeam previously queued future-slot blocks on a local wall clock and replayed them on onInterval. The replay path was already removed in an earlier revision of this PR; what remained (cacheFutureBlock, the two error.FutureSlot handlers in node.zig) was a graveyard for blocks that would never come back. Both are gone now. Blocks that pass the parent / signature / STF gate process inline; blocks whose parent is unknown are cached in node.zig for later retry, matching the rest of the ecosystem.

The MAX_FUTURE_SLOT_TOLERANCE constant and BlockValidationError.FutureSlot variant are removed.
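
The resulting admission logic can be sketched as follows (hypothetical Python; zeam's actual gate lives in validateBlock and node.zig):

```python
def admit_block(parent_known: bool, signature_ok: bool, stf_ok: bool) -> str:
    """Block admission after this PR: no wall-clock future-slot gate.

    A block whose parent is unknown is cached for retry (orphan,
    awaiting backfill); otherwise it must pass signature and
    state-transition (STF) checks and is processed inline.
    """
    if not parent_known:
        return "cache_for_retry"
    if not (signature_ok and stf_ok):
        return "reject"
    return "process_inline"
```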

4. Fill in missing leanSpec attestation topology / consistency checks

validateAttestationData was missing two rules from leanSpec validate_attestation:

  • Topology: data.head.slot >= data.target.slot. The head checkpoint must not be older than the target; without it, malformed attestations with a stale head pointer pass through zeam.
  • Consistency: head_block.slot == data.head.slot. The checkpoint slot must match the slot of the block it points at; zeam used to look up head_block and immediately drop it with _ = head_block; // Will be used in future validations, so a mismatched head slot silently passed.

Added as AttestationValidationError.HeadOlderThanTarget and HeadCheckpointSlotMismatch. Caught by the leanSpec fork-choice fixtures test_attestation_head_older_than_target_rejected and test_aggregated_attestation_head_slot_mismatch_rejected.
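
A Python sketch of the two added checks (error names from the PR; the head_block lookup is assumed to have already succeeded upstream):

```python
class AttestationValidationError(Exception):
    pass

def check_head_checkpoint(data_head_slot: int, data_target_slot: int,
                          head_block_slot: int) -> None:
    """The two leanSpec validate_attestation rules this PR adds."""
    # Topology: the head checkpoint must not be older than the target.
    if data_head_slot < data_target_slot:
        raise AttestationValidationError("HeadOlderThanTarget")
    # Consistency: the checkpoint slot must match the block it points at.
    if head_block_slot != data_head_slot:
        raise AttestationValidationError("HeadCheckpointSlotMismatch")
```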

Copilot AI review requested due to automatic review settings April 17, 2026 10:34

Copilot AI left a comment


Pull request overview

Aligns attestation “future slot” validation tolerance with leanSpec by allowing a +1 slot clock-disparity margin for all attestations (gossip and block).

Changes:

  • Unifies attestation future-slot validation to allow current_slot + MAX_FUTURE_SLOT_TOLERANCE regardless of attestation origin.
  • Updates the “gossip vs block future slot handling” test assertions to reflect the unified tolerance.


Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +1421 to +1424
  // Per leanSpec: allow current_slot + 1 for clock disparity tolerance,
  // regardless of whether the attestation is from gossip or a block.
  const current_slot = self.forkChoice.getCurrentSlot();
- const max_allowed_slot = if (is_from_block)
-     current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE // Block attestations: allow +1
- else
-     current_slot; // Gossip attestations: no future slots allowed
+ const max_allowed_slot = current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE;

Copilot AI Apr 17, 2026


The function doc comment above validateAttestationData still describes different future-slot tolerances for gossip vs block attestations, but the implementation now applies the same current_slot + MAX_FUTURE_SLOT_TOLERANCE rule regardless of is_from_block. Update the doc comment to match the new unified behavior (and keep is_from_block description focused on logging/metrics if that’s its remaining purpose).

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines 2312 to 2315
// Per leanSpec store.py:320: assert data.slot <= current_slot + Slot(1)
// Both gossip and block attestations allow current_slot + 1
try beam_chain.validateAttestationData(next_slot_attestation.message, false);
try beam_chain.validateAttestationData(next_slot_attestation.message, true);

Copilot AI Apr 17, 2026


With the new current_slot + 1 tolerance applied to gossip attestations, the earlier test attestation validation - comprehensive still expects AttestationTooFarInFuture for an attestation at current_slot + 1 (slot=3 when current is 2). That expectation should be updated (and the failing case shifted to current_slot + 2) to keep the suite consistent with the new validation rule.

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +2312 to +2313
// Per leanSpec store.py:320: assert data.slot <= current_slot + Slot(1)
// Both gossip and block attestations allow current_slot + 1

Copilot AI Apr 17, 2026


This test’s name and header comment still say gossip and block attestations have different future-slot tolerances, but the assertions now verify unified behavior. Rename the test and update the top-of-test comments to avoid misleading future maintainers.

Suggested change
// Per leanSpec store.py:320: assert data.slot <= current_slot + Slot(1)
// Both gossip and block attestations allow current_slot + 1
// Per leanSpec store.py:320: attestation data is valid up to current_slot + Slot(1).
// This test verifies the unified behavior for both validation modes.

Copilot AI review requested due to automatic review settings April 17, 2026 14:39

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.



Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +1460 to +1462
// Per leanSpec: on_gossip_aggregated_attestation does NOT call validate_attestation.
// Attestation data was already validated when individual gossip attestations arrived.
// Re-validating here would fail after finalization prunes the source block from protoarray.

Copilot AI Apr 17, 2026


The comment says gossip aggregated attestations are no longer validated, but the onGossip() handler for .aggregation still calls validateAttestationData(signed_aggregation.data, false) before invoking this function (chain.zig around line ~776). That means aggregations can still fail with UnknownSourceBlock after finalization pruning, so the intended behavior in the PR description isn’t actually achieved. Either remove/relax that upstream validation for aggregations, or update validateAttestationData to tolerate pruned source blocks (e.g., allow missing source root when data.source.slot <= finalized_slot) so aggregation processing matches leanSpec and the PR’s stated fix.

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines 1460 to 1465
// Per leanSpec: on_gossip_aggregated_attestation does NOT call validate_attestation.
// Attestation data was already validated when individual gossip attestations arrived.
// Re-validating here would fail after finalization prunes the source block from protoarray.

try self.verifyAggregatedAttestation(signedAggregation);


Copilot AI Apr 17, 2026


There’s no test covering the second behavior described in the PR (aggregations remaining valid when the source block has been pruned after finalization). Given this change alters aggregation processing rules, please add a unit/integration test that simulates finalization advancing/pruning the protoarray and then ensures a gossip aggregated attestation referencing a pruned-but-finalized source does not fail with UnknownSourceBlock.

Copilot AI review requested due to automatic review settings April 20, 2026 15:53
@GrapeBaBa GrapeBaBa force-pushed the fix/align-attestation-slot-tolerance-with-leanspec branch from 7edeb6a to 963f6e0 on April 20, 2026 15:53

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.



Comment thread pkgs/node/src/chain.zig
try self.forkChoice.rebase(latestFinalized.root, &canonical_view);
}
// else: threshold not reached; keep pre-finalized ancestors
// in proto-array so in-flight attestations still resolve.

Copilot AI Apr 20, 2026


getProtoNodeIndex(latestFinalized.root) should be non-null here because getCanonicalViewAndAnalysis already requires latestFinalized.root to exist in protoArray.indices. Silently skipping the rebase when it returns null would hide an invariant violation and can lead to unbounded proto-array growth. Consider turning this into an explicit error/log (and/or falling back to rebase/returning an error) if the index lookup fails.

Suggested change
// in proto-array so in-flight attestations still resolve.
// in proto-array so in-flight attestations still resolve.
} else {
std.debug.panic(
"forkchoice invariant violated: finalized root missing from proto-array during rebase: {}",
.{latestFinalized.root},
);

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines 1539 to 1548
// Validate attestation data (same rules as individual gossip attestations).
// The earlier revision of this function skipped re-validation because the
// source block could vanish from proto-array when finalization advanced
// between publish and receipt. That race is now handled at the root —
// processFinalizationAdvancement gates the rebase call on
// PRUNE_NODE_THRESHOLD, keeping the source / target / head blocks
// addressable for the full grace window — so the stricter check is
// safe again.
try self.validateAttestationData(signedAggregation.data, false);


Copilot AI Apr 20, 2026


onGossip already calls validateAttestationData(signed_aggregation.data, false) immediately before calling onGossipAggregatedAttestation. Re-validating here duplicates the proto-array lookups and log spam without adding safety (the BeamNode mutex already serializes onGossip vs onInterval). Consider removing this second validation (or refactoring so only one layer validates and is responsible for returning missing_attestation_roots).

Suggested change
// Validate attestation data (same rules as individual gossip attestations).
// The earlier revision of this function skipped re-validation because the
// source block could vanish from proto-array when finalization advanced
// between publish and receipt. That race is now handled at the root —
// processFinalizationAdvancement gates the rebase call on
// PRUNE_NODE_THRESHOLD, keeping the source / target / head blocks
// addressable for the full grace window — so the stricter check is
// safe again.
try self.validateAttestationData(signedAggregation.data, false);
// Validation is done upstream in onGossip before this function is called.

Comment thread pkgs/node/src/constants.zig Outdated
Comment on lines +37 to +39
// for mainnet (one epoch of slots); 3SF-mini's shorter finalization window
// makes 64 (≈1 eth epoch of wall clock) the better trade between grace
// coverage and memory footprint.

Copilot AI Apr 20, 2026


The comment claiming “Lighthouse uses 256 for mainnet (one epoch of slots)” appears inaccurate (Ethereum mainnet epoch = 32 slots). Please correct the reference (either the slot count, or clarify what 256 represents) so readers don’t infer an incorrect epoch length from this constant’s docs.

Suggested change
// for mainnet (one epoch of slots); 3SF-mini's shorter finalization window
// makes 64 (≈1 eth epoch of wall clock) the better trade between grace
// coverage and memory footprint.
// slots for mainnet as a pruning threshold; this is not the Ethereum epoch
// length (mainnet epochs are 32 slots). 3SF-mini's shorter finalization
// window makes 64 (≈1 eth epoch of wall clock) the better trade between
// grace coverage and memory footprint.

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +1300 to +1310
// 5 Rebase forkchoice — lazy prune with node-count threshold.
//
// Eager rebase drops pre-finalized ancestors from proto-array and
// remaps attestation-tracker indices. In-flight attestations whose
// source/target/head still points at one of those ancestors then
// fail the existence checks in validateAttestationData with
// Unknown{Source,Target,Head}Block, and the node burns bandwidth
// re-fetching blocks that will never come back. The grace window
// must outlast the worst-case gossip delay plus at least one
// finalization tick; constants.PRUNE_NODE_THRESHOLD = 64 gives
// roughly 1 eth epoch of wall clock, comfortably beyond both.

Copilot AI Apr 20, 2026


PR description says the fix is to “accept source blocks whose slot <= finalized_slot as valid even when absent from the protoarray.” The implementation here instead delays forkChoice.rebase via PRUNE_NODE_THRESHOLD, but validateAttestationData still hard-fails when a referenced root is absent. Please either update the PR description to match the implemented approach, or implement the described <= finalized_slot exception so behavior matches the summary.

@GrapeBaBa GrapeBaBa force-pushed the fix/align-attestation-slot-tolerance-with-leanspec branch from 963f6e0 to 827a0ef on April 21, 2026 03:27
Copilot AI review requested due to automatic review settings April 21, 2026 03:36
@GrapeBaBa GrapeBaBa force-pushed the fix/align-attestation-slot-tolerance-with-leanspec branch from 827a0ef to c2b3e06 on April 21, 2026 03:36

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.



Comment thread pkgs/node/src/chain.zig
Comment on lines +1300 to +1318
// 5 Rebase forkchoice — lazy prune with node-count threshold.
//
// Eager rebase drops pre-finalized ancestors from proto-array and
// remaps attestation-tracker indices. In-flight attestations whose
// source/target/head still points at one of those ancestors then
// fail the existence checks in validateAttestationData with
// Unknown{Source,Target,Head}Block, and the node burns bandwidth
// re-fetching blocks that will never come back. The grace window
// must outlast the worst-case gossip delay plus at least one
// finalization tick; constants.PRUNE_NODE_THRESHOLD = 64 slots
// (≈256 s at SECONDS_PER_SLOT=4) is comfortably beyond both.
if (pruneForkchoice) {
if (self.forkChoice.getProtoNodeIndex(latestFinalized.root)) |finalized_idx| {
if (finalized_idx >= constants.PRUNE_NODE_THRESHOLD) {
try self.forkChoice.rebase(latestFinalized.root, &canonical_view);
}
// else: threshold not reached; keep pre-finalized ancestors
// in proto-array so in-flight attestations still resolve.
}

Copilot AI Apr 21, 2026


The PR description says pruned source blocks should be accepted when their slot <= finalized_slot even if absent from proto-array, but the change here instead delays forkchoice.rebase via a node-count threshold. If the intended behavior is to accept attestations referencing already-finalized (and potentially pruned) source/target/head, that logic is not implemented in validateAttestationData. Please either adjust the PR description to match the threshold-based approach, or implement the finalized-slot tolerance in validation.

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +2392 to +2420
@@ -2390,35 +2411,13 @@ test "attestation validation - gossip vs block future slot handling" {
.signature = ZERO_SIGBYTES,
};

// Gossip attestations: should FAIL for next slot (current + 1)
// Gossip attestations: should FAIL for slot current + 2
// Per spec store.py:177: assert attestation.slot <= time_slots
try std.testing.expectError(error.AttestationTooFarInFuture, beam_chain.validateAttestationData(next_slot_attestation.message, false));

// Block attestations: should PASS for next slot (current + 1)
// Block attestations: should FAIL for slot current + 2
// Per spec store.py:140: assert attestation.slot <= Slot(current_slot + Slot(1))
try beam_chain.validateAttestationData(next_slot_attestation.message, true);
const too_far_attestation: types.SignedAttestation = .{
.validator_id = 0,
.message = .{
.slot = 3, // Too far in future
.head = types.Checkpoint{
.root = mock_chain.blockRoots[1],
.slot = 1,
},
.source = types.Checkpoint{
.root = mock_chain.blockRoots[0],
.slot = 0,
},
.target = types.Checkpoint{
.root = mock_chain.blockRoots[1],
.slot = 1,
},
},
.signature = ZERO_SIGBYTES,
};
// Both should fail for slot 3 when current is slot 1
try std.testing.expectError(error.AttestationTooFarInFuture, beam_chain.validateAttestationData(too_far_attestation.message, false));
try std.testing.expectError(error.AttestationTooFarInFuture, beam_chain.validateAttestationData(too_far_attestation.message, true));
try std.testing.expectError(error.AttestationTooFarInFuture, beam_chain.validateAttestationData(next_slot_attestation.message, true));

Copilot AI Apr 21, 2026


The test name/comments still describe “gossip: <= current_slot; block: <= current_slot + 1”, but the code now allows +1 for both. Also, this test only checks that current+2 fails; it no longer asserts that current+1 succeeds for gossip (the behavior this PR changes). Update the assertions and comments so the test actually covers the new tolerance rule.

Comment thread pkgs/node/src/chain.zig
// must outlast the worst-case gossip delay plus at least one
// finalization tick; constants.PRUNE_NODE_THRESHOLD = 64 slots
// (≈256 s at SECONDS_PER_SLOT=4) is comfortably beyond both.
if (pruneForkchoice) {

@g11tech g11tech Apr 21, 2026


why would rebase drop pre finalized, we only rebase with latest finalized?

ahh ok got the context from the comment, never-mind

Comment thread pkgs/node/src/chain.zig Outdated
-     current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE // Block attestations: allow +1
- else
-     current_slot; // Gossip attestations: no future slots allowed
+ const max_allowed_slot = current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE;

@g11tech g11tech Apr 21, 2026


I don't understand how this is possible: a block is processed only if block slot <= current slot, and the attestations in it can't be of a slot later than the block slot

Member Author


validateBlock already allows block.slot <= current_slot + MAX_FUTURE_SLOT_TOLERANCE (+1), so even for block-embedded attestations the effective upper bound on att.slot is current_slot + 1 — the +1 tolerance here is needed on the block path too.

As an aggregator, this call site handles is_from_block=false — the gossip path, which is independent of any block arrival. The race is clock skew, not block processing:

  • A remote validator signs a gossip attestation at their local slot N (interval 1) with data.slot = N.
  • It propagates to us. If our clock trails the validator's (normal in any distributed system), we can still be at slot N-1 when the message lands.
  • validateAttestationData(..., false) then sees att.slot=N > current_slot=N-1. Without +1 tolerance, we'd drop a perfectly honest vote as "too far in the future".
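
The skew scenario in the bullets above reduces to a two-line check (Python sketch; the constant value is the +1 margin discussed in this thread):

```python
MAX_FUTURE_SLOT_TOLERANCE = 1  # the +1 slot margin discussed above

def slot_bound_ok(att_slot: int, current_slot: int) -> bool:
    """Slot-grained bound at this call site: an honest attestation signed
    at the sender's slot N must survive a receiver still at slot N - 1."""
    return att_slot <= current_slot + MAX_FUTURE_SLOT_TOLERANCE

# Receiver trails by one slot: a slot-10 vote lands while we are at slot 9.
assert slot_bound_ok(10, 9)        # kept thanks to the +1 margin
assert not slot_bound_ok(11, 9)    # genuinely future vote still rejected
```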

Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also removed the previous pending block processing

leanSpec store.py:320 allows current_slot + 1 for all attestations
(gossip and block) as clock disparity tolerance. zeam previously
only allowed +1 for block attestations and rejected gossip attestations
for future slots. Align with leanSpec's unified behavior.
When finalization advances, old blocks are pruned from the protoarray.
Aggregations referencing a source checkpoint that was valid when the
attestation was created can fail validation if finalization pruned the
source block between attestation creation and aggregation publishing.

Accept source blocks whose slot <= finalized_slot as valid even when
absent from the protoarray — they were canonical and are now below
the finalization horizon.

Root cause: race between finalization pruning (interval 0 of new slot)
and aggregation publishing (interval 2 of same slot), where the
aggregation's attestation data references a justified checkpoint that
got pruned during the finalization advance.
…ation

Per leanSpec store.py on_gossip_aggregated_attestation: aggregated
attestations only verify the aggregated signature proof, they do NOT
call validate_attestation on the attestation data. nlean follows the
same pattern.

The attestation data was already validated when individual gossip
attestations arrived. Re-validating in the aggregation path caused
UnknownSourceBlock errors when finalization pruned the source block
from protoarray between attestation creation and aggregation publishing.

This also reverts the "tolerate pruned source blocks" workaround in
validateAttestationData since the root cause is now fixed.
Root-cause the attestation-race previously worked around in
chain.onGossipAggregatedAttestation and validateAttestationData.

Symptom: every time finalization advanced,
processFinalizationAdvancement called forkChoice.rebase(...)
unconditionally. Rebase drops pre-finalized ancestors from
proto-array and remaps attestation-tracker indices. Any in-flight
attestation whose source / target / head referenced one of those
dropped blocks then failed validateAttestationData's existence
check with Unknown{Source,Target,Head}Block — even though the
vote was perfectly valid when the validator produced it.

3SF-mini's fast finalization cadence (one advance every few slots)
puts the race window on the same order of magnitude as normal
gossip propagation, so attestations drop across every finalization
boundary under normal load. Sync catch-up, which can advance
finalization multiple steps in a single tick, makes it much worse.

Fix — lighthouse-style lazy prune:

  * Introduce constants.PRUNE_NODE_THRESHOLD = 64. Lighthouse uses
    256 for mainnet; 3SF-mini's shorter wall-clock per finalization
    step makes 64 (~1 eth epoch worth of grace) the better trade
    between race coverage and memory footprint.
  * Gate the rebase call in processFinalizationAdvancement on the
    finalized node's index in the proto-array. While that index is
    below the threshold, skip rebase and leave the pre-finalized
    prefix addressable so in-flight attestations still resolve.
  * Expose ForkChoice.getProtoNodeIndex for the gate (mutex-safe
    wrapper over protoArray.indices).

Revert of the prior tolerance workaround:

  * onGossipAggregatedAttestation now re-validates the attestation
    data again. The "skip validate" workaround exists only because
    source blocks could vanish mid-finalization; with the threshold
    gate keeping them in place, the stricter check is safe again
    and catches malformed aggregates that used to slip through.

Test:

  * Add regression test driving the mock chain with default
    onBlock opts (pruneForkchoice = true) through 5 slots. The
    finalized node's index stays well under the threshold, so
    pre-finalized ancestors must remain resolvable via
    getProtoNode after finalization advances. Before this fix,
    the rebase call would have removed roots[1] and roots[2]
    (slots < latestFinalized.slot) from the protoArray and the
    assertion would trip.
  * Update the existing "Test 9: Attestation too far in future"
    case to use slot=current+2 — the spec-aligned +1 tolerance
    change from 09712ac made the prior slot=current+1 trip point
    pass validation; the test was not updated at the time.

The future-slot +1 tolerance fix (09712ac) is orthogonal and
preserved — spec alignment with store.py:320.
zeam was the only client among leanSpec / ream / qlean-mini /
ethlambda / nlean that queued inbound blocks on the local clock.
Every other implementation processes a block the moment its parent
state is known — their only "pending" caches are for orphan blocks
(parent unknown, awaiting backfill), never for blocks whose slot just
hasn't started yet locally.

The queue also kept block body attestations out of fork choice until
the next interval tick, delaying head updates by a full interval and
duplicating the future-slot rejection that already lives in
chain.validateBlock.

Changes:

  * Remove BeamChain.pending_blocks field, its init / deinit hooks,
    and the append site on the gossip path.
  * Remove BeamChain.processPendingBlocks (replay helper) and its
    caller in node.onInterval.
  * Drop the duplicate future-slot gate in
    forkchoice.onBlockUnlocked. Gossip-level DoS filtering continues
    to happen in chain.validateBlock (MAX_FUTURE_SLOT_TOLERANCE = 1,
    matching leanSpec store.py:320); forkchoice now mirrors
    leanSpec store.on_block and only requires a known parent plus
    non-pre-finalized slot.
  * Remove ForkChoiceError.FutureSlot — no caller returns it.
  * Update the fork-choice block-tree unit test: the old test drove
    "reject future slot" against forkchoice.onBlock; that
    responsibility has moved up to chain.validateBlock, so the test
    now just ticks the clock and processes the block.

validateBlock still rejects block.slot > current_slot + 1 with
BlockValidationError.FutureSlot; node.zig's handler for
error.FutureSlot still caches those blocks for later retry, so the
"block arrived genuinely too early" case is covered exactly as
before — just without the redundant inner-layer queue.
Copilot AI review requested due to automatic review settings April 22, 2026 09:09
@GrapeBaBa GrapeBaBa force-pushed the fix/align-attestation-slot-tolerance-with-leanspec branch from 896043e to 82115ac on April 22, 2026 09:09

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.



Comment thread pkgs/node/src/node.zig
Comment on lines 1142 to 1144
// Sweep timed-out RPC requests to prevent sync stalls from non-responsive peers.
self.sweepTimedOutRequests();


Copilot AI Apr 22, 2026


After removing the forkchoice FutureSlot gate (and the pending-block replay), BeamChain.onBlock no longer returns error.FutureSlot (it only returns BlockValidationError.FutureSlot from validateBlock, which is used in the gossip path). However, processCachedDescendants in this file still checks err == error.FutureSlot when calling self.chain.onBlock(...), which will now be unreachable and likely fail to compile because FutureSlot is no longer in the inferred error set. Please remove that branch or perform an explicit chain.validateBlock check before calling onBlock for cached blocks and handle BlockValidationError.FutureSlot there.

Comment thread pkgs/node/src/chain.zig
Comment on lines +1242 to +1245
if (pruneForkchoice) {
if (self.forkChoice.getProtoNodeIndex(latestFinalized.root)) |finalized_idx| {
if (finalized_idx >= constants.PRUNE_NODE_THRESHOLD) {
try self.forkChoice.rebase(latestFinalized.root, &canonical_view);

Copilot AI Apr 22, 2026


The new prune gate skips rebase when getProtoNodeIndex(latestFinalized.root) returns null. Previously this would have surfaced as ForkChoiceError.InvalidTargetAnchor; now it silently does nothing, which can mask a forkchoice inconsistency and also prevent pruning from ever running (proto-array growth) if that situation occurs. Consider treating a null index as an error (or at least logging loudly) rather than silently skipping rebase.

@GrapeBaBa GrapeBaBa requested a review from g11tech April 22, 2026 09:23
Copilot AI review requested due to automatic review settings April 23, 2026 09:40
Contributor

Copilot AI left a comment


Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

leanSpec `store.py:validate_attestation` enforces two topology / consistency
rules that zeam had been skipping:

- `data.head.slot >= data.target.slot` — head checkpoint cannot be older
  than the target (topology check, store.py:304).
- `head_block.slot == data.head.slot` — head checkpoint slot must match
  the actual block slot the checkpoint points at (consistency check,
  store.py:314).

Without these, malformed gossip / aggregated attestations with a stale or
mismatched head pointer slipped through zeam's validation while leanSpec
and the rest of the ecosystem reject them. Add the two checks with
matching `HeadOlderThanTarget` / `HeadCheckpointSlotMismatch` error
variants and remove the stale `_ = head_block; // Will be used in future`
placeholder now that the node is actually consulted.
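The two checks added by this commit can be rendered as a short Python sketch. The `Checkpoint` shape and function name are assumptions for illustration; the rule locations (store.py:304, store.py:314) and the error names come from the commit message above.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    slot: int
    root: bytes = b""


def validate_head_topology(target: Checkpoint, head: Checkpoint, head_block_slot: int) -> None:
    # store.py:304: the head checkpoint cannot be older than the target.
    if head.slot < target.slot:
        raise ValueError("HeadOlderThanTarget")
    # store.py:314: the head checkpoint slot must match the slot of the
    # block the checkpoint actually points at.
    if head_block_slot != head.slot:
        raise ValueError("HeadCheckpointSlotMismatch")
```

An attestation with a well-formed head (`head.slot >= target.slot` and a matching block slot) passes; a stale or mismatched head pointer is rejected with the corresponding error variant.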
@GrapeBaBa GrapeBaBa changed the title fix: align attestation future slot tolerance with leanSpec fix: align attestation handling with leanSpec Apr 23, 2026
The block explained why re-validation was restored — context that is
meaningful in a PR description but rots as a source comment: future
readers have no "earlier revision" to compare against, and the rebase
gating it references is documented at the rebase site itself.
Copilot AI review requested due to automatic review settings April 23, 2026 13:43
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

pkgs/node/src/node.zig:1197

  • After removing processPendingBlocks, cached “future-slot” blocks are only retried via processReadyCachedBlocks(slot). That helper currently considers blocks “ready” only when block.slot <= current_slot (see processReadyCachedBlocks in this file), which delays retry by ~1 slot relative to MAX_FUTURE_SLOT_TOLERANCE (since a block becomes valid once block.slot <= current_slot + 1). Consider adjusting the readiness condition (or the slot passed in) so cached future blocks are retried as soon as they fall within the +1 tolerance window.
                // Sweep timed-out RPC requests to prevent sync stalls from non-responsive peers.
                self.sweepTimedOutRequests();

                self.processReadyCachedBlocks(slot);
            }
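The off-by-one flagged in the suppressed comment is easiest to see side by side. A tiny sketch, with illustrative names (the real code lives in `processReadyCachedBlocks`):

```python
MAX_FUTURE_SLOT_TOLERANCE = 1  # a block is valid once block.slot <= current_slot + 1


def ready_current(block_slot: int, current_slot: int) -> bool:
    # current readiness condition: retries the cached block ~1 slot late
    return block_slot <= current_slot


def ready_suggested(block_slot: int, current_slot: int) -> bool:
    # suggested: retry as soon as the block falls inside the +1 tolerance window
    return block_slot <= current_slot + MAX_FUTURE_SLOT_TOLERANCE
```

A cached block for slot 6 is already acceptable at slot 5 under the +1 tolerance, but the current condition holds it back until slot 6.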


Comment thread pkgs/node/src/chain.zig
Comment on lines +2669 to +2673
const signed_block = mock_chain.blocks[i];
const current_slot = signed_block.block.slot;
try beam_chain.forkChoice.onInterval(current_slot * constants.INTERVALS_PER_SLOT, false);
const missing_roots = try beam_chain.onBlock(signed_block, .{});
allocator.free(missing_roots);

Copilot AI Apr 23, 2026


This regression test doesn’t currently exercise processFinalizationAdvancement/rebase behavior: it calls beam_chain.onBlock(...) but never calls beam_chain.onBlockFollowup(...) (which is where finalization advancement triggers the gated rebase). As written, it would also pass on the pre-fix code because no rebase occurs. Consider invoking onBlockFollowup(true, &signed_block) after each imported block (or driving the chain through the same entrypoints node uses) so the test actually fails without the threshold gate.

Suggested change
const signed_block = mock_chain.blocks[i];
const current_slot = signed_block.block.slot;
try beam_chain.forkChoice.onInterval(current_slot * constants.INTERVALS_PER_SLOT, false);
const missing_roots = try beam_chain.onBlock(signed_block, .{});
allocator.free(missing_roots);
var signed_block = mock_chain.blocks[i];
const current_slot = signed_block.block.slot;
try beam_chain.forkChoice.onInterval(current_slot * constants.INTERVALS_PER_SLOT, false);
const missing_roots = try beam_chain.onBlock(signed_block, .{});
allocator.free(missing_roots);
try beam_chain.onBlockFollowup(true, &signed_block);

Comment thread pkgs/node/src/forkchoice.zig Outdated
Comment on lines +1628 to +1638
// The future-slot gate (slot * INTERVALS_PER_SLOT > fcStore.time)
// used to live here but duplicated the gossip-level check in
// chain.validateBlock and disagreed with every other leanSpec
// consumer (ream, qlean-mini, ethlambda, nlean — none of which
// key block admission on the local clock). Rejecting here also
// caused chain.zig to keep a parallel pending-block queue just
// to replay these blocks on the next interval, which meant the
// block / its body attestations couldn't contribute to fork
// choice weight until the clock caught up. Now we mirror
// leanSpec store.on_block: if parent is known and the slot is
// not pre-finalized, just process it.

Copilot AI Apr 23, 2026


Removing the forkchoice-level future-slot gate means some block-processing paths can now import blocks that are far beyond MAX_FUTURE_SLOT_TOLERANCE. In particular, the RPC/sync path calls BeamChain.onBlock(...) directly (e.g. pkgs/node/src/node.zig:677) without running chain.validateBlock, so ForkChoice.onBlock will accept and fully process arbitrarily-future blocks as long as the parent is known. Consider enforcing the same future-slot bound for all imports (e.g. call validateBlock inside BeamChain.onBlock, or validate in the RPC handler) so malicious peers can’t drive expensive STF/forkchoice work with far-future blocks.

Suggested change
// The future-slot gate (slot * INTERVALS_PER_SLOT > fcStore.time)
// used to live here but duplicated the gossip-level check in
// chain.validateBlock and disagreed with every other leanSpec
// consumer (ream, qlean-mini, ethlambda, nlean — none of which
// key block admission on the local clock). Rejecting here also
// caused chain.zig to keep a parallel pending-block queue just
// to replay these blocks on the next interval, which meant the
// block / its body attestations couldn't contribute to fork
// choice weight until the clock caught up. Now we mirror
// leanSpec store.on_block: if parent is known and the slot is
// not pre-finalized, just process it.
// Do not reintroduce the old strict local-clock forkchoice gate
// here; it caused direct-import paths to maintain separate
// pending queues and delayed useful forkchoice participation for
// slightly-future blocks. However, we still need the same bounded
// future-slot protection that validateBlock applies so all import
// paths reject arbitrarily far-future blocks before doing
// expensive STF / forkchoice work.
if (slot > opts.currentSlot + constants.MAX_FUTURE_SLOT_TOLERANCE) {
return ForkChoiceError.FutureSlot;
}

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +1459 to +1463
//
// Gossip attestations must be for current or past slots only. Validators attest
// in interval 1 of the current slot, so they cannot attest for future slots.
// Block attestations can be more lenient since the block itself was validated.
// Per leanSpec: allow current_slot + 1 for clock disparity tolerance,
// regardless of whether the attestation is from gossip or a block.
const current_slot = self.forkChoice.getCurrentSlot();
const max_allowed_slot = if (is_from_block)
current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE // Block attestations: allow +1
else
current_slot; // Gossip attestations: no future slots allowed
const max_allowed_slot = current_slot + constants.MAX_FUTURE_SLOT_TOLERANCE;

Copilot AI Apr 23, 2026


validateAttestationData now allows current_slot + MAX_FUTURE_SLOT_TOLERANCE for both gossip and block attestations, but the function header comment still states that gossip attestations have no future tolerance. Update the doc comment above validateAttestationData to match the new behavior/spec alignment.

Comment thread pkgs/node/src/chain.zig Outdated
Comment on lines +1239 to +1245
// must outlast the worst-case gossip delay plus at least one
// finalization tick; constants.PRUNE_NODE_THRESHOLD = 64 slots
// (≈256 s at SECONDS_PER_SLOT=4) is comfortably beyond both.
if (pruneForkchoice) {
if (self.forkChoice.getProtoNodeIndex(latestFinalized.root)) |finalized_idx| {
if (finalized_idx >= constants.PRUNE_NODE_THRESHOLD) {
try self.forkChoice.rebase(latestFinalized.root, &canonical_view);

Copilot AI Apr 23, 2026


This comment says PRUNE_NODE_THRESHOLD = 64 slots, but the gate uses finalized_idx (proto-array node index), which is not necessarily equal to slot distance (multiple blocks/forks can inflate indices). Either adjust the wording here or gate on slot distance if the intent is a time-based grace window.

Five follow-ups from the review pass:

- Close RPC / sync future-slot gap: BeamChain.onBlock now enforces the
  MAX_FUTURE_SLOT_TOLERANCE (+1) bound itself. Previously only the gossip
  path funnelled through chain.validateBlock; RPC handlers went straight
  to onBlock and would happily run STF + fork-choice on blocks arbitrarily
  far in the future, a cheap DoS vector.

- Make the PRUNE_NODE_THRESHOLD regression test actually exercise the
  gate by calling onBlockFollowup (where processFinalizationAdvancement
  — and the rebase call it gates — fires). Previously the test only
  called onBlock and would have passed even without the fix.

- Update validateAttestationData doc and `attestation validation -
  gossip vs block future slot handling` test to describe the unified
  `current_slot + 1` rule both paths now follow; the test is renamed and
  now asserts both the accepted current+1 case and the rejected current+2
  case on both is_from_block values.

- Clarify PRUNE_NODE_THRESHOLD: the threshold is a proto-array node
  count, not a slot distance. Under heavy forking, node count advances
  faster than slot distance, so the wall-clock grace can shrink. Updated
  the constants.zig comment and the rebase call site accordingly.

- Log loudly instead of silently skipping the rebase when
  latestFinalized.root is missing from proto-array. Under normal
  operation it is always present (getCanonicalViewAndAnalysis resolves
  it via protoArray.indices); surfacing the invariant violation beats
  hiding it behind "nothing happened".
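The gated rebase described in these follow-ups (node-count threshold, loud logging on a missing finalized root) can be sketched as follows. `PRUNE_NODE_THRESHOLD = 64` comes from the PR; the function and parameter names are illustrative, not zeam's API.

```python
import logging

PRUNE_NODE_THRESHOLD = 64  # proto-array node count, not a slot distance


def maybe_rebase(proto_index_of, rebase, finalized_root,
                 log=logging.getLogger("chain")):
    """Lazily prune: rebase only once enough pre-finalized nodes accumulate."""
    idx = proto_index_of(finalized_root)
    if idx is None:
        # Invariant violation: the finalized root should always be indexed.
        # Log loudly rather than silently skipping the rebase.
        log.error("finalized root missing from proto-array; skipping rebase")
        return False
    if idx < PRUNE_NODE_THRESHOLD:
        return False  # defer: not enough nodes to be worth remapping indices
    rebase(finalized_root)
    return True
```

Deferring the rebase keeps recently finalized ancestors resolvable in proto-array for a while, so in-flight attestations referencing them don't fail with Unknown{Source,Target,Head}Block across every finalization boundary.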
…ec rule

Switch the gossip-attestation time check from slot-grained to
interval-grained:

    data.slot * INTERVALS_PER_SLOT <= store.time + GOSSIP_DISPARITY_INTERVALS

A whole-slot tolerance let an adversary pre-publish next-slot aggregates
ahead of any honest validator (~800 ms head start at SECONDS_PER_SLOT=4 /
INTERVALS_PER_SLOT=5). One interval bounds that head start to NTP drift.

Block-included attestations skip the time check; they are trusted under
the block's signature + STF validation. The block path itself drops the
slot-grained future-slot bound entirely — block admission is gated by
parent / signature / STF only.

Removed:
- constants.MAX_FUTURE_SLOT_TOLERANCE (was reused across attestation
  validation and validateBlock / onBlock; the block usages had no spec
  basis to begin with)
- chain.validateBlock future-slot rejection
- chain.onBlock entry-point future-slot rejection
- chain.validateAttestationData block-path slot bound
- BlockValidationError.FutureSlot variant
- node.zig error.FutureSlot handlers and cacheFutureBlock helper
  (the cacheFutureBlock path had no replay mechanism after the
  time-based block queue was removed earlier in this PR)

Added:
- constants.GOSSIP_DISPARITY_INTERVALS = 1
- chain.validateAttestationData gossip-only interval bound
- chain.zig boundary tests covering the gossip / block divergence
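The interval-grained bound this commit adopts reduces to one comparison. A minimal numeric sketch, with constants matching the PR and an illustrative function name:

```python
INTERVALS_PER_SLOT = 5          # per the PR (SECONDS_PER_SLOT=4, ~800 ms intervals)
GOSSIP_DISPARITY_INTERVALS = 1  # new constant introduced by this commit


def gossip_attestation_timely(data_slot: int, store_time: int) -> bool:
    # data.slot * INTERVALS_PER_SLOT <= store.time + GOSSIP_DISPARITY_INTERVALS,
    # where store.time is the forkchoice store's interval clock.
    return data_slot * INTERVALS_PER_SLOT <= store_time + GOSSIP_DISPARITY_INTERVALS
```

Worked numbers: at store.time = 24 (the last interval of slot 4), a slot-5 attestation is admitted exactly at the boundary (25 <= 25), while under the old slot-grained rule it could have been published a whole slot early.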
Copilot AI review requested due to automatic review settings April 29, 2026 03:48
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

pkgs/node/src/chain.zig:1324

  • validateBlock no longer enforces any upper bound on block.slot relative to the local clock. Because onGossip runs signature verification + STF before adding to forkchoice, a far-future (but validly signed) block can force process_slots to iterate up to that slot, causing excessive work. Add back a future-slot limit (the intended +1 tolerance) here so blocks that are too far ahead are rejected early and cheaply.
    pub fn validateBlock(self: *Self, block: types.BeamBlock, is_from_gossip: bool) !void {
        _ = is_from_gossip;

        const finalized_slot = self.forkChoice.fcStore.latest_finalized.slot;

        // 1. Pre-finalized slot check - reject blocks before finalized slot
        if (block.slot < finalized_slot) {
            self.logger.debug("block validation failed: pre-finalized slot {d} < finalized {d}", .{
                block.slot,
                finalized_slot,
            });


Comment on lines +1628 to 1633
// Block admission only requires a known parent and a slot above
// the finalized boundary; STF and signature verification are the
// gating layers.
if (slot < self.fcStore.latest_finalized.slot) {
return ForkChoiceError.PreFinalizedSlot;
}

Copilot AI Apr 29, 2026


The FutureSlot gate was removed from ForkChoice.onBlockUnlocked, so a validly-signed block with a far-future slot can now be admitted and then drive an unbounded BeamState.process_slots loop during STF (see pkgs/types/src/state.zig:256+), leading to potential CPU/latency DoS. Reintroduce an upper-bound future-slot check (with the intended +1 tolerance) before expensive work, and return a dedicated error (or reuse an existing one) so callers can cache/ignore blocks that are too far ahead.

Comment on lines +8 to +21
// Future-slot tolerance for gossip attestations, measured in intervals:
//
// data.slot * INTERVALS_PER_SLOT <= store.time + GOSSIP_DISPARITY_INTERVALS
//
// where store.time is in intervals. One interval is roughly 800 ms at
// SECONDS_PER_SLOT=4 / INTERVALS_PER_SLOT=5.
//
// A whole-slot tolerance would let an adversary pre-publish next-slot
// aggregates ahead of any honest validator (~800 ms head start at 4 s
// slots); tightening to one interval bounds that head start to NTP drift.
//
// Block-included attestations skip this check entirely; they are trusted
// under the block's own validation.
pub const GOSSIP_DISPARITY_INTERVALS = 1;

Copilot AI Apr 29, 2026


PR description (section 1) states leanSpec allows data.slot <= current_slot + 1 for all attestations as clock-disparity tolerance. The implementation here instead uses an interval-grained gossip-only bound (GOSSIP_DISPARITY_INTERVALS = 1) and explicitly skips the time check for block-included attestations. Either update the PR description to match this behavior, or adjust the constants/validation logic to match the stated leanSpec rule so reviewers/operators aren't relying on incorrect semantics.

Comment thread pkgs/node/src/chain.zig
try beam_chain.forkChoice.onInterval(boundary_time, false);

// At the boundary the gossip path admits the same vote.
try beam_chain.validateAttestationData(next_slot_attestation.message, false);

Copilot AI Apr 29, 2026


The new boundary-acceptance behavior is only unit-tested via direct calls to validateAttestationData. Add an integration-style assertion that a boundary-admitted gossip attestation can be processed through the normal path (e.g. onGossipAttestation / forkChoice.onSignedAttestation) without being rejected downstream, to prevent regressions where validation and forkchoice admission rules diverge.

Suggested change
try beam_chain.validateAttestationData(next_slot_attestation.message, false);
try beam_chain.validateAttestationData(next_slot_attestation.message, false);
// Ensure the normal downstream path also accepts the boundary-admitted gossip attestation.
try beam_chain.forkChoice.onSignedAttestation(next_slot_attestation);
