feat(pipeline): allow syncing blocks on top of the proposed chain #21025
Maddiaa0 wants to merge 19 commits into merge-train/spartan from
Conversation
// Atomically set the pending checkpoint number alongside the block if provided
pendingCheckpointNumber !== undefined &&
  this.store.blockStore.setPendingCheckpointNumber(pendingCheckpointNumber),
This doesn't match the comment saying this "Sets the pending checkpoint number (quorum-attested but not yet L1-confirmed)", right?

That said, let's discuss the model. Weirdly, I like the concept of having an uncheckpointed checkpoint. But it seems like we have two different things:

- A checkpoint-being-built, which is the checkpoint being built via proposed blocks. This is just a checkpoint number, since a Checkpoint object needs all its blocks to be ready. Today we already have this, but don't need to store it explicitly.
- A checkpoint-being-proposed, which is the Checkpoint object for which the current proposer has sent a checkpoint proposal, and which would ultimately make it onto L1.

We need to define which of these we expose to clients of the archiver, and also to users via APIs like getL2Tips.
// Trigger syncs to flush any queued blocks, retrying until we find the data or give up.
if (!blockData) {
  blockData = await retryUntil(
    async () => {
I think we probably want to:

- Not call syncImmediate here. That's going to force the archiver to query L1 aggressively. When the block is pushed to the archiver it already calls that function, so we aren't going to make things go any faster.
- Use a more appropriate timeout. Presumably that would be the end of the slot?
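The suggestion above, dropping the aggressive sync and retrying only until the slot boundary, could be sketched roughly like this. The helper name `retryUntilDeadline`, the `slotEnd` deadline, and the polling interval are illustrative assumptions, not the actual archiver API:

```typescript
// Hypothetical sketch: poll a lookup until a slot-end deadline instead of
// forcing an immediate L1 sync. Returns undefined if the deadline passes.
async function retryUntilDeadline<T>(
  fn: () => Promise<T | undefined>,
  deadlineMs: number,
  intervalMs = 500,
): Promise<T | undefined> {
  while (Date.now() < deadlineMs) {
    const result = await fn();
    if (result !== undefined) {
      return result;
    }
    // Wait a bit before polling again; the archiver syncs on its own cadence.
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  return undefined; // Gave up at the slot boundary.
}
```

The key design point is that the deadline is derived from the slot schedule rather than a fixed retry count, so a block that never arrives stops being awaited exactly when it can no longer be useful.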
/** Storage format for a pending checkpoint (attested but not yet L1-confirmed). */
type PendingCheckpointStore = {
  header: Buffer;
  checkpointNumber: number;
  startBlock: number;
  blockCount: number;
  totalManaUsed: string;
  feeAssetPriceModifier: string;
};
Why can't we capture archive, outhash, or all data that's not L1 or attestations?
Also, nit: rename to PendingCheckpointStorage for consistency with the other types here.
It can, I just kept the minimum; will add.
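For illustration, the extended type could look something like this. The `archive` and `outHash` fields and the `PendingCheckpointStorage` rename come from the review suggestions above; their exact types are assumptions, not the final shape:

```typescript
/** Hypothetical extended storage format, per the review suggestions (assumed fields). */
type PendingCheckpointStorage = {
  header: Buffer;
  checkpointNumber: number;
  startBlock: number;
  blockCount: number;
  totalManaUsed: string;
  feeAssetPriceModifier: string;
  archive: string; // assumed: archive root after this checkpoint
  outHash: string; // assumed: outbox root for this checkpoint
};

// Example instance (all values illustrative).
const example: PendingCheckpointStorage = {
  header: Buffer.alloc(0),
  checkpointNumber: 1,
  startBlock: 1,
  blockCount: 3,
  totalManaUsed: '0',
  feeAssetPriceModifier: '0',
  archive: '0x00',
  outHash: '0x00',
};
```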
// The same check as above but for checkpoints. Accept the block if either the confirmed
// checkpoint or the pending (locally validated but not yet confirmed) checkpoint matches.
const expectedCheckpointNumber = blockCheckpointNumber - 1;
if (
  !opts.force &&
  previousCheckpointNumber !== expectedCheckpointNumber &&
  pendingCheckpointNumber !== expectedCheckpointNumber
) {
  const [reported, source]: [CheckpointNumber, 'confirmed' | 'pending'] =
    pendingCheckpointNumber > previousCheckpointNumber
      ? [pendingCheckpointNumber, 'pending']
      : [previousCheckpointNumber, 'confirmed'];
  throw new CheckpointNumberNotSequentialError(blockCheckpointNumber, reported, source);
Is there any situation where addProposedBlock would add a block for the pending checkpoint? My understanding is we add proposed blocks, then throw a checkpoint proposal on top to flag those as "pending checkpointing", and then keep adding proposed blocks for the next one.
Also, adding a block to a pending checkpoint breaks the blockCount property of the PendingCheckpointStore. Seems like we should not allow that.
No, we should not end up adding directly to the pending checkpoint, only above it.
This case is to allow building on top of the pending checkpoint, not adding into it. But it looks like it may allow what you've mentioned; I'll make it more strict.
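A minimal sketch of what the stricter acceptance rule discussed above might look like, assuming a proposed block may only extend the chain directly above the best-known tip and never land inside the pending checkpoint (names are illustrative, not the actual archiver code):

```typescript
// Hypothetical strictness sketch: a proposed block must belong to the checkpoint
// directly above either the confirmed or the pending checkpoint, and must never
// be added into the pending checkpoint itself (which would break its blockCount).
function canAcceptProposedBlock(
  blockCheckpointNumber: number,
  confirmedCheckpointNumber: number,
  pendingCheckpointNumber: number | undefined,
): boolean {
  // Never add blocks into (or below) the pending checkpoint itself.
  if (pendingCheckpointNumber !== undefined && blockCheckpointNumber <= pendingCheckpointNumber) {
    return false;
  }
  // Otherwise the block must sit exactly one checkpoint above the best-known tip.
  const tip = pendingCheckpointNumber ?? confirmedCheckpointNumber;
  return blockCheckpointNumber === tip + 1;
}
```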
block: { number: provenBlockNumber, hash: provenBlockData.blockHash.toString() },
checkpoint: provenCheckpointId,
},
pendingCheckpoint: pendingCheckpointBlockData
We definitely need to rethink naming. The rollup contract already has the concept of a "pending checkpoint", which is actually the checkpointed checkpoint. We need to change one or the other.

Other thoughts, in random order, some mutually exclusive:

- Do we want to bundle the pending checkpoint with the proposed chain tip? It'll break the property that, in a chain tip, the reported block matches the last block of the reported checkpoint. But maybe it's fine.
- Do we want to differentiate attested vs non-attested pending checkpoints? Personally I don't think so.
- Should the pendingCheckpoint tip return the checkpointed one if there's no pending checkpoint block data? It's the only chain tip that may be undefined.
- Do we want to make a bigger rename? It seems like we're dealing with "pending checkpoints" and "checkpointed checkpoints". Maybe "checkpoint" was the bad name, and we should be talking about "pending bundles" and "checkpointed bundles" or something like that.

No need to make these naming changes in this PR, but let's give it a good thought before closing this epic.
}

// What's the slot of the first uncheckpointed block?
// Don't prune blocks that are covered by a pending checkpoint (awaiting L1 submission from pipelining)
const current = await this.getPendingCheckpointNumber();
if (pending.checkpointNumber <= current) {
  this.#log.warn(`Ignoring stale pending checkpoint number ${pending.checkpointNumber} (current: ${current})`);
  return;
}
const confirmed = await this.getLatestCheckpointNumber();
if (pending.checkpointNumber !== confirmed + 1) {
  this.#log.warn(
    `Ignoring pending checkpoint ${pending.checkpointNumber}: expected ${confirmed + 1} (confirmed + 1)`,
  );
  return;
}
Unless we can think of legitimate situations for this, I'd throw instead of warning. It will help us catch inconsistencies more easily.
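The strict version suggested above would turn the warn-and-return branches into throws, along these lines (a sketch with plain `Error`s; the actual method names and error classes are assumptions):

```typescript
// Hypothetical strict variant: reject inconsistent pending checkpoints loudly
// instead of silently ignoring them.
function validatePendingCheckpointNumber(
  pendingNumber: number,
  currentPendingNumber: number,
  confirmedNumber: number,
): void {
  if (pendingNumber <= currentPendingNumber) {
    throw new Error(
      `Stale pending checkpoint number ${pendingNumber} (current: ${currentPendingNumber})`,
    );
  }
  if (pendingNumber !== confirmedNumber + 1) {
    throw new Error(
      `Pending checkpoint ${pendingNumber}: expected ${confirmedNumber + 1} (confirmed + 1)`,
    );
  }
}
```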
if (this.interrupted) {
  return undefined;
}
return this.sendRequests();
Should we re-simulate before sending, this time with the actual state of L1, instead of the manual overrides? This should help catch scenarios where the previous checkpoint didn't behave as we expected (not to mention the ones where it didn't land).
In #21250 I add a preCheck hook to each submission that runs after the enqueue sleep, without the overrides.
const grandparentCheckpointNumber = CheckpointNumber(this.checkpointNumber - 2);
const [grandparentCheckpoint, manaTarget] = await Promise.all([
  rollup.getCheckpoint(grandparentCheckpointNumber),
  rollup.getManaTarget(),
]);
I guess this works because we're guaranteed to have a grandparent checkpoint if there's a non-undefined pendingCheckpointData? Otherwise I see this failing in the first checkpoint(s)?
if (invalidateCheckpoint) {
  // After invalidation, L1 will roll back to checkpoint N-1. The archive at N-1 already
  // exists on L1, so we just pass the matching archive (the lastArchive of the invalid checkpoint).
  archiveForCheck = invalidateCheckpoint.lastArchive;
  l1Overrides.forcePendingCheckpointNumber = invalidateCheckpoint.forcePendingCheckpointNumber;
  this.metrics.recordPipelineDepth(0);
} else if (this.epochCache.isProposerPipeliningEnabled() && syncedTo.hasPendingCheckpoint) {
  // Parent checkpoint hasn't landed on L1 yet. Override both the pending checkpoint number
  // and the archive at that checkpoint so L1 simulation sees the correct chain tip.
  const parentCheckpointNumber = CheckpointNumber(checkpointNumber - 1);
  l1Overrides.forcePendingCheckpointNumber = parentCheckpointNumber;
  l1Overrides.forceArchive = { checkpointNumber: parentCheckpointNumber, archive: syncedTo.archive };
  this.metrics.recordPipelineDepth(1);

  this.log.verbose(
    `Building on top of pending checkpoint (pending=${syncedTo.pendingCheckpointData?.checkpointNumber})`,
  );
Shouldn't the order of these ifs be switched? If there's a pending checkpoint AND the tip of the checkpointed chain is invalid, we should expect the pending checkpoint to perform the invalidation when it lands, so we should build on top of the pending checkpoint instead.
I've dug into this and yes, you're right; the parent should have bundled it with their checkpoint.
public getProposedCheckpoint(): Promise<ProposedCheckpointData | undefined> {
  return this.store.blockStore.getProposedCheckpoint();
}
I know I should've asked this earlier: is it possible that there's more than one "proposed checkpoint" in flight at any given time? E.g. let's say that we're close to the end of slot N. The L1 tx for posting checkpoint N hasn't landed yet, but slot N+1 is already built and its checkpoint proposal has already been broadcasted via p2p. Wouldn't that lead to two checkpoints awaiting checkpointing?
I have an issue for this; I wanted to cement the happy path first.
To prevent this happening in this PR, there's a max pipelining depth: the proposer should not build if their checkpoint is 2 ahead of what is currently on L1.
For byzantine nodes, the proposed checkpoint is not stored if it is too far ahead of the current checkpointed block.
In the future we can relax these constraints, but right now there is a max depth to prevent reorgs from getting too big.
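The depth guard described above can be sketched as follows. The constant and function names are illustrative assumptions, not the PR's actual code; the point is only that depth is measured relative to the checkpoint that has landed on L1:

```typescript
// Hypothetical guard: depth 0 means building directly on the L1-checkpointed tip,
// depth 1 means building on top of one not-yet-landed (pipelined) checkpoint.
const MAX_PIPELINE_DEPTH = 1; // assumed value

function pipelineDepth(proposedCheckpointNumber: number, l1CheckpointedNumber: number): number {
  return proposedCheckpointNumber - (l1CheckpointedNumber + 1);
}

function canBuildCheckpoint(proposedCheckpointNumber: number, l1CheckpointedNumber: number): boolean {
  return pipelineDepth(proposedCheckpointNumber, l1CheckpointedNumber) <= MAX_PIPELINE_DEPTH;
}
```

The same predicate could also gate storing a proposed checkpoint received over p2p, which is how a byzantine node's far-ahead proposal gets rejected.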
// Don't prune blocks that are covered by a proposed checkpoint (awaiting L1 submission from pipelining)
const firstUncheckpointedBlockNumber = BlockNumber(lastCheckpointedBlockNumber + 1);
if (proposedCheckpoint) {
  const lastPendingBlock = BlockNumber(proposedCheckpoint.startBlock + proposedCheckpoint.blockCount - 1);
  if (lastPendingBlock >= firstUncheckpointedBlockNumber) {
    this.log.trace(`Skipping prune: proposed checkpoint covers blocks up to ${lastPendingBlock}`);
    return;
  }
}
Shouldn't this check depend on the slot numbers, as opposed to just the block numbers? Is it possible we have moved way past the slots in which the proposed checkpoint should have been mined, but we don't prune it because this check does not look at anything time-dependent?
It's fixed in the PR on top, but I can move it here.
Don't, I'm fine with tackling it later
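A slot-aware version of the prune guard, as discussed above, might look roughly like this. The `slotNumber` field on the proposed checkpoint and the helper name are assumptions; the actual fix lives in the follow-up PR:

```typescript
// Hypothetical slot-aware prune guard: a proposed checkpoint whose target slot
// has already passed can no longer land on L1, so it should not block pruning.
interface ProposedCheckpointInfo {
  startBlock: number;
  blockCount: number;
  slotNumber: number; // assumed: the L1 slot this proposal targets
}

function proposedCheckpointBlocksPrune(
  proposedCheckpoint: ProposedCheckpointInfo | undefined,
  firstUncheckpointedBlockNumber: number,
  currentSlotNumber: number,
): boolean {
  if (!proposedCheckpoint) {
    return false;
  }
  // Time-dependent check: if we've moved past the proposal's slot, prune freely.
  if (currentSlotNumber > proposedCheckpoint.slotNumber) {
    return false;
  }
  const lastPendingBlock = proposedCheckpoint.startBlock + proposedCheckpoint.blockCount - 1;
  return lastPendingBlock >= firstUncheckpointedBlockNumber;
}
```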
totalManaUsed: BigInt(stored.totalManaUsed ?? '0'),
feeAssetPriceModifier: BigInt(stored.feeAssetPriceModifier ?? '0'),
When could these be undefined?
Never; this is legacy, will clean up.
if (
  previousBlock.checkpointNumber === block.checkpointNumber &&
  previousBlock.indexWithinCheckpoint !== block.indexWithinCheckpoint - 1
) {
  throw new BlockIndexNotSequentialError(block.indexWithinCheckpoint, previousBlock.indexWithinCheckpoint);
}
Missing a check for when we cross from one checkpoint to the next (checkpoint number should be +1, index within checkpoint should be zero)
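The missing boundary check could be added along these lines, assuming the field names from the snippet above (the plain `Error` for the checkpoint-crossing case stands in for whatever error type the codebase uses):

```typescript
// Hypothetical sketch covering both sequentiality cases: within a checkpoint,
// and crossing from one checkpoint to the next.
type BlockRef = { checkpointNumber: number; indexWithinCheckpoint: number };

function assertSequential(previousBlock: BlockRef, block: BlockRef): void {
  if (previousBlock.checkpointNumber === block.checkpointNumber) {
    // Same checkpoint: index must increase by exactly one.
    if (previousBlock.indexWithinCheckpoint !== block.indexWithinCheckpoint - 1) {
      throw new Error(
        `Block index not sequential: ${block.indexWithinCheckpoint} after ${previousBlock.indexWithinCheckpoint}`,
      );
    }
  } else {
    // Crossing checkpoints: checkpoint number must be +1 and index must reset to zero.
    if (
      block.checkpointNumber !== previousBlock.checkpointNumber + 1 ||
      block.indexWithinCheckpoint !== 0
    ) {
      throw new Error(
        `Invalid checkpoint crossing: ` +
          `${previousBlock.checkpointNumber}:${previousBlock.indexWithinCheckpoint} -> ` +
          `${block.checkpointNumber}:${block.indexWithinCheckpoint}`,
      );
    }
  }
}
```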
// Register checkpoint proposal handler for all nodes.
// Validates proposals before setting proposed checkpoint on archiver.
const getValidatorAddresses = validatorClient
  ? () => validatorClient.getValidatorAddresses().map(a => a.toString())
  : undefined;
createCheckpointProposalHandler(config, {
  checkpointsBuilder: validatorCheckpointsBuilder,
  blockSource: archiver,
  l1ToL2MessageSource: archiver,
  epochCache,
  dateProvider,
  telemetry,
}).register(p2pClient, archiver, getValidatorAddresses);
Damn, in #21999 I moved checkpoint proposal logic to the block proposal handler, and renamed it to "proposal handler". Sorry for the conflict. We should unify it. I'm fine with either approach.
// When pipelining, force the proposed checkpoint number and fee header to the parent so that
// the fee computation matches what L1 will see when the previous pipelined checkpoint has landed.
Just to confirm: the LAG for updating fees is enough that we can safely precompute it this much in advance?
p2pClient.registerBlockProposalHandler(blockHandler);

// Register checkpoint proposal handler if blob uploads are enabled and we are reexecuting
if (this.blobClient.canUpload() && shouldReexecute) {
I register it for everyone, but I can guard this with pipeliningEnabled.
// These tests verify parity with Solidity FeeLib.sol (computeNewEthPerFeeAsset, clampedAdd).
// If FeeLib.sol changes, these tests must be updated to match.
I think we'll have overlap with #22116. You merge first on this one.

Overview

Key contributions:

- Adds a second p2p callback that separates what runs for all nodes vs validator nodes.

Testing

epochs_mbps.pipeline now expects 3 blocks per checkpoint, just like the original epochs_mbps test; it is now fully pipelined.

Upcoming