fix: carry PrepareQC lock across consecutive timeouts in voteTimeout#3029
fix: carry PrepareQC lock across consecutive timeouts in voteTimeout#3029wen-coding merged 6 commits intomainfrom
Conversation
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3029 +/- ##
==========================================
- Coverage 58.32% 58.27% -0.06%
==========================================
Files 2079 2077 -2
Lines 171702 171284 -418
==========================================
- Hits 100144 99812 -332
+ Misses 62631 62579 -52
+ Partials 8927 8893 -34
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
| return types.NewTimeoutQC(votes) | ||
| } | ||
|
|
||
| // simulatePushTimeoutQC mirrors the production pushTimeoutQC: keeps CommitQC |
There was a problem hiding this comment.
can you simulate it on State rather than inner, so that we don't have to "mirror" anything?
| // startRunOutputs launches runOutputs in the background and returns | ||
| // a context that is cancelled when the test finishes. | ||
| func startRunOutputs(t *testing.T, s *State) context.Context { | ||
| ctx, cancel := context.WithCancel(t.Context()) |
There was a problem hiding this comment.
this doesn't make sense. t.Context() is always canceled when the test finishes.
Also, why not s.Run()? The fact that s.runOutputs is suffcient rn is an implementation detail that the test should not need to rely on.
Also the s.Run() output should be checked for (non-cancellation) error, so I'd recommend scope.Run(func(){ s.SpawnBg(s.Run()) })
voteTimeout only attached i.PrepareQC (current view) to TimeoutVotes. Since pushTimeoutQC clears i.PrepareQC on view change, consecutive timeouts (e.g. offline leader) would lose the PrepareQC lock, producing a TimeoutQC with no latestPrepareQC. This allowed a later leader to propose a conflicting block instead of being forced to repropose, breaking the safety invariant (two CommitQCs at the same RoadIndex). Fix: voteTimeout now uses the higher of i.PrepareQC and the PrepareQC inherited from i.TimeoutQC.LatestPrepareQC(). This ensures the lock propagates through the TimeoutQC chain indefinitely until a CommitQC advances the RoadIndex. Made-with: Cursor
Made-with: Cursor
i.PrepareQC is always from the current view (strictly higher than any inherited PrepareQC), so a simple presence check is equivalent to the view comparison. Suggested in review. Made-with: Cursor
Tests now call State.pushTimeoutQC / State.pushPrepareQC / State.voteTimeout directly instead of mirroring production logic in helper functions. Made-with: Cursor
Replace the ad-hoc startRunOutputs helper (which redundantly wrapped t.Context() and exposed the runOutputs implementation detail) with the established scope.Run + SpawnBg + IgnoreCancel pattern. Assertions inside the closure use error returns instead of require (goroutine safety). Made-with: Cursor
b97fff3 to
06c58ae
Compare
masih
left a comment
There was a problem hiding this comment.
Thank you for getting this sorted so swiftly 🙌
| return nil | ||
| }) | ||
| require.NoError(t, err) | ||
| } |
There was a problem hiding this comment.
Not a blocker: It would be great to also add a test cases for
- the exact quorum-intersection behavior in
NewTimeoutQCplusTimeoutQC.Verify. - the restart regression test with persisted
TimeoutQCjust to verifyvoteTimeoutis carried over to prepare.
- types: test NewTimeoutQC with mixed PrepareQCs (quorum-intersection), all-None, and highest-selected scenarios, each verified with Verify. - consensus: test that voteTimeout inherits PrepareQC from a persisted TimeoutQC after restart (two-session scope.Run). Made-with: Cursor
Summary
voteTimeoutonly attachedi.PrepareQC(current view) to TimeoutVotes. SincepushTimeoutQCclearsi.PrepareQCon view change, consecutive timeouts (e.g. offline leader) would lose the PrepareQC lock. This produced a TimeoutQC with nolatestPrepareQC, allowing a later leader to propose a conflicting block instead of being forced to repropose — breaking the safety invariant (two CommitQCs possible at the same RoadIndex).!pqc.IsPresent()),voteTimeoutnow inherits the one fromi.TimeoutQC.LatestPrepareQC(). The lock propagates through the TimeoutQC chain indefinitely until a CommitQC advances the RoadIndex.TimeoutQC.LatestPrepareQC()) that was already persisted; no schema or protocol changes needed.Attack scenario (before fix)
pushTimeoutQCclearsi.PrepareQC. Leader is offline. Timer fires → TimeoutVotes carry no PrepareQC. TimeoutQC(M,1) haslatestPrepareQC = None.Why fix #1 alone is sufficient (no lock check in pushProposal needed)
With the fix, every honest validator's TimeoutVote carries the inherited PrepareQC. Any quorum of 2f+1 timeout votes includes ≥ f+1 honest validators carrying it, so
NewTimeoutQCalways selects it.FullProposal.Verifyalready enforces that ifTimeoutQC.reproposal()returns a proposal, the leader must repropose it — so conflicting proposals are rejected by all honest validators.Test plan
Tests exercise the real
State.pushTimeoutQC,State.pushPrepareQC, andState.voteTimeoutmethods viascope.Run+s.Run(ctx)(no mirrored production logic):TestVoteTimeoutPrepareQC_BothNone— both None → NoneTestVoteTimeoutPrepareQC_OnlyCurrentView— current view PrepareQC, no TimeoutQC → uses currentTestVoteTimeoutPrepareQC_InheritedFromTimeoutQC— core safety test: None + inherited → uses inherited, chains through 3 consecutive timeoutsTestVoteTimeoutPrepareQC_CurrentViewHigherThanInherited— reproposal succeeds (both present, current higher) → uses currentTestVoteTimeoutPrepareQC_CurrentViewPresentInheritedNone— current present, TimeoutQC has no PrepareQC → uses currentTestVoteTimeoutPrepareQC_PersistedRestart— restart regression: persisted TimeoutQC survives restart, voteTimeout inherits PrepareQC from itTestNewTimeoutQC_MixedPrepareQCs— quorum-intersection: single vote carries PrepareQC among None votes → selected and Verify passesTestNewTimeoutQC_AllNone— all votes carry None → LatestPrepareQC is None, Verify passesTestTimeoutQCVerify_HighestPrepareQCSelected— votes carry PrepareQCs at different views → highest selected, Verify passesInheritedFromTimeoutQCcatches the bug)