server: skip campaign until region syncer history phase is done by tharanga · Pull Request #10844 · tikv/pd

tharanga · 2026-06-08T20:41:50Z

What problem does this PR solve?

Issue Number: Close #10843

When a node with an empty region cache joins the PD cluster (either a new node or an existing node that lost its LevelDB state), it can become the leader by entering the campaign loop without waiting for asynchronous region sync to complete. When there are hundreds of thousands of regions, the new leader is unable to rebuild the region cache in a timely manner, causing a spike in GetRegion requests, increasing region heartbeat and GetRegion latency, and ultimately resulting in query errors and elevated latencies.

What is changed and how does it work?

When a new member joins the PD cluster, it waits until the historical region sync phase is done before start campaigning for the leadership.

server: skip campaign until region syncer history phase is done

Check List

[x] - Unit and integration tests

Tests

Unit tests (no failpoints needed — run directly)

  # pkg/storage/endpoint — RegionSyncerStorage Load/Save (3 tests)
  go test ./pkg/storage/endpoint/ -count=1 -v \
    -run 'TestRegionSyncerCommittedRegionCount'

  # pkg/syncer — historySynced durable reload + catch-up commit/rollback (2 tests)
  go test ./pkg/syncer/ -count=1 -v \
    -run 'TestHistorySyncedInitFromDurableState|TestCatchupCommitRollback'

 Integration tests (require failpoints enabled)

  These call failpoint.Enable(...), so the failpoints must be rewritten in first — otherwise the inject sites are no-ops and the tests won't behave
  correctly.

  # enable once for the failpoint-dependent runs
  make failpoint-enable

  # tests/server/region_syncer — unsynced member must not win leadership
  go test ./tests/server/region_syncer/ -count=1 -v -timeout 300s \
    -run 'TestUnsyncedMemberRefusesLeadership'

  # cluster-formation fuzzer (opt-in via env; honors PD_FUZZ_SEED / PD_FUZZ_ITERATIONS)
  PD_RUN_FUZZ=1 go test ./tests/server/region_syncer/ -count=1 -v -timeout 1200s \
    -run 'TestClusterFormationFuzz'

  # restore when done
  make failpoint-disable

Code changes

Has persistent data change
- New storage endpoint to save region counts to etcd

Side effects

Increased code complexity
-- Increasing the state space of node/cluster startup/bootstrap state machine

Release note

When a new member joins the PD cluster, it waits until the historical region sync phase is done before start campaigning for the leadership. This avoids

Summary by CodeRabbit

New Features
- Persisted region-sync state (committed region count & history sync markers) and sticky sync status to survive restarts.
- Protocol sends explicit end-of-history markers to improve catch-up behavior.
- Leadership gate prevents promotion until region sync state is safe.
Bug Fixes
- Prevents stale history offsets on leadership changes by rolling back uncommitted catch-up state.
Tests
- New tests covering persistence, catch-up commit/rollback, and leadership eligibility.

ti-chi-bot · 2026-06-08T20:41:57Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rleungx for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2026-06-08T20:42:01Z

Hi @tharanga. Thanks for your PR.

I'm waiting for a tikv member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ti-chi-bot · 2026-06-08T20:42:01Z

Welcome @tharanga!

It looks like this is your first PR to tikv/pd 🎉.

I'm the bot to help you request reviewers, add labels and more, See available commands.

We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to tikv/pd. 😃

coderabbitai · 2026-06-08T20:42:04Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Region syncer adds durable committed-region-count storage, in-memory catch-up with recordNoPersist/commit/rollback and explicit end-of-history markers, sticky sync-session flags, periodic leader publishing of committed counts, and a gate preventing unsynced members from campaigning when durable committed count is non-zero.

Changes

Region Syncer History Catch-Up and Leadership Election Gating

Layer / File(s)	Summary
Storage contract and path for committed region count `pkg/storage/endpoint/region_syncer.go`, `pkg/storage/storage.go`, `pkg/utils/keypath/absolute_key_path.go`	`RegionSyncerStorage` interface and `StorageEndpoint` implementations load/save committed region count as base-10 uint64 strings; keypath helper formats durable etcd key under cluster namespace.
Storage tests for committed region count `pkg/storage/endpoint/region_syncer_test.go`	In-memory `StorageEndpoint` tests: absent-key loads as zero with no error; round-trip preserves uint64↔string (including `0` and `math.MaxUint64`); corrupt values return parse errors.
History buffer catch-up (in-memory apply / commit / rollback) `pkg/syncer/history_buffer.go`, `pkg/syncer/history_buffer_test.go`	`recordNoPersist` appends in-memory without persisting index; `commit` persists next index; `rollback` clears in-memory buffer and reloads persisted index; `TestCatchupCommitRollback` validates atomicity and invariants.
Client syncer: sync-session state, end-of-history, and branched recording `pkg/syncer/client.go`, `pkg/syncer/client_test.go`	Adds sticky atomic `attemptedSync` and `historySynced` with getters; treats empty `regions` response as end-of-history (commits and sets flag); uses `recordNoPersist` during catch-up and `record` after sync; rolls back on stop when sync incomplete; `TestHistorySyncedInitFromDurableState` verifies initialization from durable state.
Server syncer: committed region count periodic publishing `pkg/syncer/server.go`	Initializes `historySynced` from durable state; creates `committedCountTicker` in `RunServer` to periodically publish/persist leader's committed region count via `SaveRegionSyncerCommittedRegionCount` when it changes; stops ticker on shutdown; sync methods now send explicit end-of-history markers.
Server syncer tests updated for end-of-history markers `pkg/syncer/server_test.go`	Adjusts mocked channel sizing and response-count expectations to include explicit empty end-of-history marker responses and verifies ordering/content.
Leader election gating based on syncer catch-up `server/server.go`, `tests/server/region_syncer/region_syncer_test.go`	`canCampaignAsRegionSyncerCaughtUp` prevents leadership campaigning when region storage is enabled, syncer attempted sync but history not synced, and durable committed-region count > 0; `leaderLoop` waits until gate passes; `TestUnsyncedMemberRefusesLeadership` regression test verifies a newly joined unsynced PD does not win leadership while blocked by failpoint.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

tikv/pd#10672: Modifies syncer response/handling code in pkg/syncer/client.go/pkg/syncer/server.go, overlapping with end-of-history and history handling changes.
tikv/pd#10733: Changes handleRegionSyncResponse around end-of-history and shutdown flows that intersect this PR's history-sync logic.
tikv/pd#10716: Alters history buffering and persistence semantics in pkg/syncer/history_buffer.go, related to this PR's commit/rollback/recordNoPersist work.

Suggested labels

lgtm, approved, needs-cherry-pick-release-8.5

Suggested reviewers

okJiang
rleungx
bufferflies

Poem

🐰 I buffered tales in memory tight,
Saved the count by candlelight,
When empty echoes called the end,
I committed, then could safely send.
Now leaders wait until we're right.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 70.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically summarizes the main change: preventing campaign/leadership acquisition until region syncer history phase completes.
Description check	✅ Passed	The description provides issue reference, problem statement, solution summary, and comprehensive test instructions, closely following the repository template.
Linked Issues check	✅ Passed	Code changes directly address issue `#10843` objectives: preventing empty-region-cache nodes from becoming leader by blocking campaign until region syncer history phase completes.
Out of Scope Changes check	✅ Passed	All changes are scoped to the issue: storage endpoint for region counts, history sync state tracking, commit/rollback semantics, leadership campaign gating, and corresponding tests.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/syncer/server.go`:
- Around line 507-520: The end-of-history empty SyncRegionResponse is being sent
too early using lastIndex (snapshot row count), which causes
handleRegionSyncResponse to commit the wrong index; change the logic so the
server sends exactly one trailing end-of-history marker after retained history
replay completes (not before), using the correct index from
retainedRecordsFrom/retainIndex (or nextIndex when there are no retained
records) rather than lastIndex; locate the marker creation and stream.Send calls
in pkg/syncer/server.go (the block that builds pdpb.SyncRegionResponse with
StartIndex: uint64(lastIndex)) and defer or remove them so that only
syncHistoryRecords emits the final marker with the proper
StartIndex/nextIndex/retainIndex, and apply the same fix to the duplicate block
referenced at the other similar section.

In `@server/server.go`:
- Around line 1783-1789: The current branch in the startup gate returns true
when s.storage.LoadRegionSyncerCommittedRegionCount() fails, which can allow a
stale/empty follower to campaign; change the error handling so that when
LoadRegionSyncerCommittedRegionCount() returns an error (i.e., committed count
cannot be read), the method blocks campaigning by returning false (do not open
the gate), while keeping the existing warning log (use the same log.Warn with
errs.ZapError(err)); ensure this logic sits in the same code path guarded by
HasAttemptedSync() and IsHistorySynced() so that a read/parse failure prevents
using s.electionClient when s.client metadata lookup failed.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 884e4d58-fdff-435e-8114-76151c8fee64

📥 Commits

Reviewing files that changed from the base of the PR and between 9f1c47b and e3001b0.

📒 Files selected for processing (12)

pkg/storage/endpoint/region_syncer.go
pkg/storage/endpoint/region_syncer_test.go
pkg/storage/storage.go
pkg/syncer/client.go
pkg/syncer/client_test.go
pkg/syncer/history_buffer.go
pkg/syncer/history_buffer_test.go
pkg/syncer/server.go
pkg/syncer/server_test.go
pkg/utils/keypath/absolute_key_path.go
server/server.go
tests/server/region_syncer/region_syncer_test.go

okJiang · 2026-06-11T08:50:06Z

/ok-to-test

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

pkg/syncer/server.go (1)

264-319: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Publish the committed region count before waiting for the first tick.

The first durable write is delayed by committedRegionCountInterval. If this leader dies during that first second, the stored committed count is still 0, so the new campaign gate can misclassify an unsynced follower as already caught up. Publish once synchronously when RunServer starts, then keep the ticker for refreshes.

Suggested fix

 	keepAliveTicker := time.NewTicker(syncerKeepAliveInterval)
 	shrinkTicker := time.NewTicker(historyBufferShrinkInterval)
 	committedCountTicker := time.NewTicker(committedRegionCountInterval)
 	// -1 forces an initial publish on the first tick.
 	lastPublishedRegionCount := -1
+	publishCommittedRegionCount := func() {
+		if count := s.server.GetBasicCluster().GetTotalRegionCount(); count != lastPublishedRegionCount {
+			if err := s.server.GetStorage().SaveRegionSyncerCommittedRegionCount(uint64(count)); err != nil {
+				log.Warn("failed to persist committed region count", errs.ZapError(err))
+			} else {
+				lastPublishedRegionCount = count
+			}
+		}
+	}
+	publishCommittedRegionCount()
@@
 		case <-committedCountTicker.C:
-			// Publish the region count this leader is serving so that a member
-			// that has not finished a history sync can still decide, after this
-			// leader is gone, whether it is caught up enough to campaign. Only
-			// the leader runs RunServer, so only the leader writes this.
-			if count := s.server.GetBasicCluster().GetTotalRegionCount(); count != lastPublishedRegionCount {
-				if err := s.server.GetStorage().SaveRegionSyncerCommittedRegionCount(uint64(count)); err != nil {
-					log.Warn("failed to persist committed region count", errs.ZapError(err))
-				} else {
-					lastPublishedRegionCount = count
-				}
-			}
+			// Publish the region count this leader is serving so that a member
+			// that has not finished a history sync can still decide, after this
+			// leader is gone, whether it is caught up enough to campaign. Only
+			// the leader runs RunServer, so only the leader writes this.
+			publishCommittedRegionCount()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/syncer/server.go` around lines 264 - 319, The committed region count
isn't persisted until the first committedCountTicker tick, which can leave
storage at 0 if the leader dies early; call the same save logic once
synchronously before entering the for/select loop: compute count via
s.server.GetBasicCluster().GetTotalRegionCount(), call
s.server.GetStorage().SaveRegionSyncerCommittedRegionCount(uint64(count)),
handle the error like in the ticker branch (log Warn on error, otherwise set
lastPublishedRegionCount = count), and then proceed to start the existing
committedCountTicker for periodic refreshes.

♻️ Duplicate comments (1)

pkg/syncer/server.go (1)

502-510: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore the explicit end-of-history marker after replay.

The client-side state machine only flips historySynced when it sees the empty end-of-history response. These paths return after replaying records, so a fully caught-up follower stays "unsynced" until a later keepalive or region update. If the leader disappears first, that member remains blocked from campaigning even though it already consumed the full history. This regresses the earlier end-of-history marker fix.

Suggested fix

 func (*RegionSyncer) syncHistoryRecordsLocked(startIndex uint64, records []*core.RegionInfo, stream *regionSyncStream) error {
 	for start := 0; start < len(records); start += maxSyncRegionBatchSize {
 		end := min(start+maxSyncRegionBatchSize, len(records))
 		if err := stream.sendStreamIfOpen(buildSyncRegionResponse(startIndex+uint64(start), records[start:end])); err != nil {
 			return err
 		}
 	}
-	return nil
+	return stream.sendStreamIfOpen(&pdpb.SyncRegionResponse{
+		Header:     &pdpb.ResponseHeader{ClusterId: keypath.ClusterID()},
+		StartIndex: startIndex + uint64(len(records)),
+	})
 }
@@
 	if len(records) > 0 {
 		if err := s.syncHistoryRecordsLocked(syncStartIndex, records, syncStream); err != nil {
 			return err
 		}
 		syncStream.advanceSendIndexLocked(len(records))
+	} else {
+		if err := syncStream.sendStreamIfOpen(&pdpb.SyncRegionResponse{
+			Header:     &pdpb.ResponseHeader{ClusterId: keypath.ClusterID()},
+			StartIndex: nextIndex,
+		}); err != nil {
+			return err
+		}
 	}
 	return nil
 }

Also applies to: 575-587

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/syncer/server.go` around lines 502 - 510, The loop in
syncHistoryRecordsLocked sends all batched history chunks but never emits the
explicit empty end-of-history marker the client expects to flip historySynced;
after the for loop return an explicit empty end-of-history response by calling
stream.sendStreamIfOpen with buildSyncRegionResponse for the next index and an
empty records slice (propagate any error), and apply the same fix to the other
similar replay path mentioned so the client receives the empty marker and can
set historySynced.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@pkg/syncer/server.go`:
- Around line 264-319: The committed region count isn't persisted until the
first committedCountTicker tick, which can leave storage at 0 if the leader dies
early; call the same save logic once synchronously before entering the
for/select loop: compute count via
s.server.GetBasicCluster().GetTotalRegionCount(), call
s.server.GetStorage().SaveRegionSyncerCommittedRegionCount(uint64(count)),
handle the error like in the ticker branch (log Warn on error, otherwise set
lastPublishedRegionCount = count), and then proceed to start the existing
committedCountTicker for periodic refreshes.

---

Duplicate comments:
In `@pkg/syncer/server.go`:
- Around line 502-510: The loop in syncHistoryRecordsLocked sends all batched
history chunks but never emits the explicit empty end-of-history marker the
client expects to flip historySynced; after the for loop return an explicit
empty end-of-history response by calling stream.sendStreamIfOpen with
buildSyncRegionResponse for the next index and an empty records slice (propagate
any error), and apply the same fix to the other similar replay path mentioned so
the client receives the empty marker and can set historySynced.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4a8c0535-6550-491a-a400-447182505657

📥 Commits

Reviewing files that changed from the base of the PR and between 9f73d32 and 9a5c1e6.

📒 Files selected for processing (5)

pkg/syncer/client.go
pkg/syncer/client_test.go
pkg/syncer/history_buffer.go
pkg/syncer/history_buffer_test.go
pkg/syncer/server.go

🚧 Files skipped from review as they are similar to previous changes (3)

pkg/syncer/history_buffer_test.go
pkg/syncer/history_buffer.go
pkg/syncer/client.go

tharanga · 2026-06-15T22:19:06Z

/retest

tharanga · 2026-06-16T00:37:43Z

/retest

codecov · 2026-06-16T00:48:46Z

Codecov Report

❌ Patch coverage is 88.61789% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.20%. Comparing base (6e3d72f) to head (2f8afc1).
⚠️ Report is 7 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #10844      +/-   ##
==========================================
+ Coverage   79.14%   79.20%   +0.05%     
==========================================
  Files         536      541       +5     
  Lines       74033    74566     +533     
==========================================
+ Hits        58594    59057     +463     
- Misses      11311    11347      +36     
- Partials     4128     4162      +34

Flag	Coverage Δ
unittests	`79.20% <88.61%> (+0.05%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

tharanga · 2026-06-17T00:08:53Z

@okJiang thoughts on #10843 and this fix? Appreciate your feedback!

okJiang · 2026-06-22T13:21:48Z

Thanks for the PR. I checked the latest master, and the underlying issue still exists: region sync completion is not used as a guard before PD leader campaign, so a newly joined member with empty region metadata can still become leader.

The fix direction looks right, but I think this PR is too broad for #10843 as-is.

The main concern is the 10s grace fallback. With the default timing, other healthy members may only try to recover etcd leadership after roughly 40s, so an unsynced member can still campaign after the 10s grace and become PD leader with empty or partial region metadata. The new regression test observes only a 3s window, so it does not prove the behavior after the grace period.

I suggest splitting this into a smaller first fix:

Keep a sticky "history sync completed" signal.
Gate PD leader campaign on that signal when region storage is enabled and sync has been attempted.
Preserve the bootstrap/no-region path.
Add a test that covers the behavior after the fallback window, or remove/adjust the fallback so an unsynced member does not win before a caught-up member can take over.

I would also avoid adding the new etcd-persisted committed-region-count key in the first fix. The core issue is local election eligibility: whether this member has completed region sync before campaigning. That should preferably be derived from the sync completion signal and local durable sync state, not from a new cluster-level marker.

If we still need a cluster-level etcd marker, I think it should be justified and reviewed separately, because it adds upgrade/rollback semantics and another source of truth.

The leader-transfer target preference/refusal logic also looks like a separate policy change. It may be useful, but I would prefer it in a follow-up PR after the core election guard is fixed.

Also, the branch currently conflicts with master and CI is failing, so it needs a rebase before final review.

Signed-off-by: tharanga <tharanga@gmail.com> re-pro of a new PD node acquiring the leadership with no regions fixing the test to fail without any changes Signed-off-by: tharanga <tharanga@gmail.com> server: skip campaign until region syncer history phase done Signed-off-by: tharanga <tharanga@gmail.com>

Signed-off-by: tharanga <tharanga@gmail.com>

tharanga · 2026-06-23T00:50:24Z

Thanks for the PR. I checked the latest master, and the underlying issue still exists: region sync completion is not used as a guard before PD leader campaign, so a newly joined member with empty region metadata can still become leader.

The fix direction looks right, but I think this PR is too broad for #10843 as-is.

The main concern is the 10s grace fallback. With the default timing, other healthy members may only try to recover etcd leadership after roughly 40s, so an unsynced member can still campaign after the 10s grace and become PD leader with empty or partial region metadata. The new regression test observes only a 3s window, so it does not prove the behavior after the grace period.

I suggest splitting this into a smaller first fix:

Keep a sticky "history sync completed" signal.

Gate PD leader campaign on that signal when region storage is enabled and sync has been attempted.

Preserve the bootstrap/no-region path.

Add a test that covers the behavior after the fallback window, or remove/adjust the fallback so an unsynced member does not win before a caught-up member can take over.

I would also avoid adding the new etcd-persisted committed-region-count key in the first fix. The core issue is local election eligibility: whether this member has completed region sync before campaigning. That should preferably be derived from the sync completion signal and local durable sync state, not from a new cluster-level marker.

If we still need a cluster-level etcd marker, I think it should be justified and reviewed separately, because it adds upgrade/rollback semantics and another source of truth.

The leader-transfer target preference/refusal logic also looks like a separate policy change. It may be useful, but I would prefer it in a follow-up PR after the core election guard is fixed.

Also, the branch currently conflicts with master and CI is failing, so it needs a rebase before final review.

@okJiang thanks for the feedback. I've rebased my branch to fix conflicts, and preped to cleanly remove additional changes you pointed out. I will address your comments, and ping you once it's ready to review.

tharanga · 2026-06-25T00:58:43Z

I would also avoid adding the new etcd-persisted committed-region-count key in the first fix. The core issue is local election eligibility: whether this member has completed region sync before campaigning. That should preferably be derived from the sync completion signal and local durable sync state, not from a new cluster-level marker.

If we still need a cluster-level etcd marker, I think it should be justified and reviewed separately, because it adds upgrade/rollback semantics and another source of truth.

@okJiang the behind-case gate is purely local (historySynced), but the bootstrap/empty cluster scenario is provably not locally decidable because in the region-storage mode, the region data is per-node LevelDB with no etcd region count. It looks like stores/bootstrap markers can't distinguish a freshly-bootstrapped cluster from a behind follower. Is there another way to address the bootstrap/empty cluster scenario without etcd-persisted key? (a boolean flag is sufficient)

tharanga · 2026-06-25T06:50:13Z

/retest

tharanga · 2026-06-25T07:21:10Z

/retest

Signed-off-by: tharanga <tharanga@gmail.com>

ti-chi-bot · 2026-06-26T18:31:41Z

@tharanga: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-unit-test-next-gen-2	`1933192`	link	true	`/test pull-unit-test-next-gen-2`
pull-unit-test-next-gen-3	`1933192`	link	true	`/test pull-unit-test-next-gen-3`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

ti-chi-bot Bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has signed the dco. labels Jun 8, 2026

ti-chi-bot Bot added contribution This PR is from a community contributor. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Jun 8, 2026

ti-chi-bot Bot added the first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. label Jun 8, 2026

ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 8, 2026

coderabbitai Bot reviewed Jun 8, 2026

View reviewed changes

Comment thread pkg/syncer/server.go Outdated

Comment thread server/server.go Outdated

ti-chi-bot Bot added the do-not-merge/needs-triage-completed label Jun 9, 2026

ti-chi-bot Bot added ok-to-test Indicates a PR is ready to be tested. and removed needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Jun 11, 2026

coderabbitai Bot reviewed Jun 11, 2026

View reviewed changes

tharanga added 2 commits June 22, 2026 11:01

fixing region syncer commit index to align with the syncer

007a7dd

Signed-off-by: tharanga <tharanga@gmail.com>

tharanga force-pushed the tharanga/new_member_region_catchup branch from 069ce6f to 73f1b58 Compare June 22, 2026 19:04

tharanga force-pushed the tharanga/new_member_region_catchup branch from 73f1b58 to 007a7dd Compare June 24, 2026 22:02

fixing the DocString coverage warning

1933192

Signed-off-by: tharanga <tharanga@gmail.com>

Uh oh!

Conversation

tharanga commented Jun 8, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

What is changed and how does it work?

Check List

Release note

Summary by CodeRabbit

Uh oh!

ti-chi-bot Bot commented Jun 8, 2026

Uh oh!

ti-chi-bot Bot commented Jun 8, 2026

Uh oh!

ti-chi-bot Bot commented Jun 8, 2026

Uh oh!

coderabbitai Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

okJiang commented Jun 11, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

tharanga commented Jun 15, 2026

Uh oh!

tharanga commented Jun 16, 2026

Uh oh!

codecov Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

tharanga commented Jun 17, 2026

Uh oh!

okJiang commented Jun 22, 2026

Uh oh!

tharanga commented Jun 23, 2026

Uh oh!

tharanga commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tharanga commented Jun 25, 2026

Uh oh!

tharanga commented Jun 25, 2026

Uh oh!

ti-chi-bot Bot commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tharanga commented Jun 8, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 8, 2026 •

edited

Loading

codecov Bot commented Jun 16, 2026 •

edited

Loading

tharanga commented Jun 25, 2026 •

edited

Loading