Raft Cluster: stability fixes by zuiderkwast · Pull Request #4015 · valkey-io/valkey

zuiderkwast · 2026-06-22T11:15:43Z

A few independent small fixes to the Raft cluster implementation (the cluster-v2 feature branch).

I suggest reviewing these commit-by-commit.

Use base node-timeout for leader lease instead of randomized election timeout

The leader lease (used to deny pre-votes while the leader is alive) was using the randomized election timeout (1x–2x node-timeout). A voter with a longer randomized
timeout could deny legitimate pre-votes from candidates that correctly detected a failed leader. Now uses the base node-timeout, ensuring consistent lease duration
across all nodes. Follow-up of Raft Cluster: Implement Pre-Vote protocol to prevent term inflation #3931, @quanyeyang
Fix proposal retry flooding

Pending proposals were unconditionally retried every 100ms in cron, flooding the leader with duplicate proposals. Replaced with targeted retry triggers: on leader change
(detected in AE handler), on reconnect to the same leader (HI handler), and on becoming leader. Reduces unnecessary network traffic and leader load.
Disconnect stale outbound links

A node that was paused or killed could leave a half-open TCP connection that the peer kept sending to indefinitely. Added detection of dead outbound links: links that
never received HI (connection to a frozen peer) and leader links with no AE_ACK response. Disconnecting stale links allows clusterConnectNodes to re-establish working
connections promptly.
Fix size_t overflow in PUBLISH message bounds check

A malicious or buggy peer could send PUBLISH with chan_len + msg_len values that overflow size_t, bypassing the buffer bounds check. Added an overflow guard to
prevent out-of-bounds memory access.
No-op entry after leader election

A new leader appends a no-op entry in its own term immediately after winning election. This allows entries from prior terms (already replicated to a majority) to be
committed, since Raft requires a current-term entry to be committed before prior-term entries can be safely committed (§5.4.2). Without this, prior-term entries could
remain uncommitted indefinitely if no new proposals arrive.

…ection timeout The leader lease check in clusterRaftCanGrantVote denied pre-votes while last_heartbeat was within the randomized election_timeout [T, 2T). This meant a node whose election timeout fired early could have its pre-vote denied by a peer whose randomized timeout was longer — the peer still considered the leader alive. Use the base cluster-node-timeout for the lease duration. The randomization is for staggering elections, not for lease duration. This ensures that once node-timeout has elapsed since the last heartbeat, all nodes agree the leader is dead and grant pre-votes. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

Remove the unconditional clusterRaftRetryProposals call from cron that ran every 100ms, flooding the leader with duplicate PROPOSE messages. Proposals are now retried only on specific events: - On leader change (AE from new leader): forward to new leader. - On reconnect to same leader (HI): re-forward in case the previous send was lost on the dead connection. - On becoming leader: append locally. This eliminates duplicate log entries from repeated forwarding while still ensuring proposals reach the leader after connectivity changes. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

When the leader has outbound links that haven't received AE_ACK within node_timeout/2, disconnect them so clusterConnectNodes can re-establish fresh connections. This handles half-open connections from peers that were paused or killed without closing the TCP connection. On nodes other than the leader, disconnect stale outbound links after node_timeout/2 if we've never even the HI handshake, meaning the kernel of the peer accepted the connection but it never reached the process (e.g. if it's stopped with SIGSTOP). In other cases, such as between two followers, it's not an error that a connection is idle for a long time, so we they're not included in this check. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

Add overflow guard on chan_len + msg_len before comparing against available buffer space. Without this, a malicious peer could send lengths that overflow to a small sum, bypassing the bounds check. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

coderabbitai · 2026-06-22T11:15:53Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 153067be-e252-4eee-9202-fbf3adc9e498

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-06-22T11:39:01Z

Codecov Report

❌ Patch coverage is 76.47059% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.58%. Comparing base (f176b3d) to head (cdaf4af).

Files with missing lines	Patch %	Lines
src/cluster_raft.c	76.47%	8 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff               @@
##           cluster-v2    #4015      +/-   ##
==============================================
+ Coverage       76.47%   76.58%   +0.11%     
==============================================
  Files             166      166              
  Lines           82871    82897      +26     
==============================================
+ Hits            63373    63487     +114     
+ Misses          19498    19410      -88

Files with missing lines	Coverage Δ
src/cluster_raft.c	`66.97% <76.47%> (+1.06%)`	⬆️

... and 23 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

quanyeyang · 2026-06-22T14:05:00Z

@zuiderkwast Thanks for catching this,Relying on the base node-timeout ensures a consistent lease window across the cluster and effectively prevents nodes with longer random timeouts from falsely rejecting valid pre-votes. Great follow-up to PR #3931!

zuiderkwast added 5 commits June 22, 2026 11:55

Raft: no-op entry after leader election

cdaf4af

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

zuiderkwast requested a review from murphyjacob4 June 22, 2026 11:15

github-actions Bot assigned zuiderkwast Jun 22, 2026

zuiderkwast mentioned this pull request Jun 22, 2026

Raft Cluster: log compaction and InstallSnapshot #4008

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Raft Cluster: stability fixes#4015

Raft Cluster: stability fixes#4015
zuiderkwast wants to merge 5 commits into
valkey-io:cluster-v2from
zuiderkwast:raft-fix-timeout

zuiderkwast commented Jun 22, 2026

Uh oh!

coderabbitai Bot commented Jun 22, 2026

Review skipped

Uh oh!

codecov Bot commented Jun 22, 2026

Uh oh!

quanyeyang commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zuiderkwast commented Jun 22, 2026

Uh oh!

coderabbitai Bot commented Jun 22, 2026

Review skipped

Uh oh!

codecov Bot commented Jun 22, 2026

Codecov Report

Uh oh!

quanyeyang commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants