Skip to content

tso: keyspace group auto splitting#10932

Open
ystaticy wants to merge 12 commits into
tikv:masterfrom
ystaticy:yaojingyi/auto_split
Open

tso: keyspace group auto splitting#10932
ystaticy wants to merge 12 commits into
tikv:masterfrom
ystaticy:yaojingyi/auto_split

Conversation

@ystaticy

@ystaticy ystaticy commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: Close #10933

What is changed and how does it work?

The TSO keyspace group supports checking the keyspace count automatically every 15 minutes and triggering a split if a certain threshold is reached, preventing the corresponding etcd value from becoming too long.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

Release note

None.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added periodic automatic keyspace-group splitting that monitors group sizes and triggers a split when they exceed a configurable threshold.
  • Bug Fixes

    • Improved patrol behavior by skipping groups already splitting or merging, ignoring under-replicated oversized groups, and avoiding split attempts when no target ID exists or there’s nothing eligible to move. Also stops immediately when patrol is canceled.
  • Tests

    • Added unit tests covering split triggering (including move ordering and split metadata), below-threshold no-ops, skip/no-op scenarios, missing target ID, empty-move handling, and cancellation behavior.

ystaticy added 6 commits June 23, 2026 14:15
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. labels Jun 23, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Hi @ystaticy. Thanks for your PR.

I'm waiting for a tikv member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot Bot added contribution This PR is from a community contributor. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Jun 23, 2026
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds periodic automatic keyspace-group splitting to GroupManager. Two new constants define the keyspace-count threshold and patrol interval. Bootstrap starts a new goroutine that ticks on the interval, loads all groups, finds the first eligible oversized group not in split/merge state with sufficient replicas, and calls SplitKeyspaceGroupByID. Seven unit tests cover the happy path, below-threshold, skip-states, no-available-ID, no-valid-keyspaces, under-replicated, and context-cancellation edge cases.

Changes

Keyspace-Group Auto-Split Patrol

Layer / File(s) Summary
Constants and Bootstrap goroutine wiring
pkg/keyspace/tso_keyspace_group.go
Adds defaultKeyspaceCountSplitThreshold and autoSplitKeyspaceGroupPatrolInterval constants, and wires a new background goroutine into GroupManager.Bootstrap when the etcd client is non-nil.
Core patrol implementation
pkg/keyspace/tso_keyspace_group.go
Implements patrolKeyspaceGroupSizeForAutoSplit (ticker loop exiting on context/manager close) and doPatrolKeyspaceGroupSizeForAutoSplit (loads groups, computes next group ID, skips groups in split/merge state or under-replicated, builds keyspace move list excluding the default keyspace, calls SplitKeyspaceGroupByID, stops after the first split per tick; failpoint overrides the threshold).
Unit tests and helpers
pkg/keyspace/tso_keyspace_group_test.go
Adds saveKeyspaceGroups, newKeyspaceGroupMembers, and newSequentialKeyspaces helpers, plus seven keyspaceGroupTestSuite test methods exercising: above-threshold split with ordering assertions; below-threshold no-op; skip splitting/merging groups while splitting an eligible group; no available target ID; no valid keyspaces to move; under-replicated group skipped while splitting eligible group; and context cancellation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • JmPotato
  • okJiang

Poem

🐇 Hop, hop, the groups grow tall,
A patrol runs to heed the call,
"Too many keyspaces? Time to split!"
The ticker ticks, the threshold hit.
Seven tests confirm each edge and gate—
The rabbit splits before too late! ✂️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'tso: keyspace group auto splitting' clearly summarizes the main change and follows the repository's title format (pkg: what's changed).
Description check ✅ Passed The PR description includes the linked issue, a clear commit message explaining the feature, and properly filled checklist items (Unit test selected, Has persistent data change marked).
Linked Issues check ✅ Passed The PR implements automatic splitting of TSO keyspace groups with a 15-minute patrol interval and a 40,000 keyspace count threshold, directly addressing issue #10933's requirement to prevent etcd values from becoming too long.
Out of Scope Changes check ✅ Passed All changes are in scope: the implementation adds auto-split logic and constants in the main file, and comprehensive unit tests in the test file, all aligned with the auto-splitting enhancement objective.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@ti-chi-bot ti-chi-bot Bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 23, 2026
@ystaticy ystaticy changed the title tso: keyspace group auto split tso: keyspace group auto splitting Jun 23, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/keyspace/tso_keyspace_group.go`:
- Around line 249-255: The doPatrolKeyspaceGroupSizeForAutoSplit method performs
storage reads/writes but lacks access to the caller's context, making
cancellation handling implicit. Add ctx context.Context as the first parameter
to the doPatrolKeyspaceGroupSizeForAutoSplit method definition, pass the ctx
from the calling site when invoking this method, and add a context cancellation
check at the beginning of the method to ensure the split work respects the
caller's context before proceeding with any storage operations.
- Around line 281-313: The code attempts to split keyspace groups without first
verifying they meet the minimum replica requirement, causing
SplitKeyspaceGroupByID to fail and return early, which blocks other eligible
groups from being auto-split. Add a precondition check after the existing
IsSplitting() and IsMerging() checks in the loop to skip groups that have fewer
than mcs.DefaultKeyspaceGroupReplicaCount members, similar to how
under-replicated groups are handled. This should check the group's member count
(likely available on the group object) against the minimum threshold and
continue to the next iteration if the group is under-replicated, preventing the
early return in the error handling of SplitKeyspaceGroupByID.
- Around line 270-274: Change the comparison operator from > to >= in the
condition checking nextID against mcs.MaxKeyspaceGroupCountInUse. The constant
MaxKeyspaceGroupCountInUse represents an exclusive upper bound for valid
keyspace group IDs (0-4095), and using >= ensures the check correctly prevents
assigning ID 4096, which would be rejected by the validation function
checkKeySpaceGroupID and exceed the bounds of the allocators and kgs array
declarations.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9c1b5e1f-2b3c-40ae-827a-de8b7a11d5fb

📥 Commits

Reviewing files that changed from the base of the PR and between 4ae1408 and 7638221.

📒 Files selected for processing (2)
  • pkg/keyspace/tso_keyspace_group.go
  • pkg/keyspace/tso_keyspace_group_test.go

Comment thread pkg/keyspace/tso_keyspace_group.go Outdated
Comment thread pkg/keyspace/tso_keyspace_group.go
Comment thread pkg/keyspace/tso_keyspace_group.go Outdated
ystaticy added 3 commits June 23, 2026 15:26
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>
Signed-off-by: ystaticy <y_static_y@sina.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/keyspace/tso_keyspace_group_test.go`:
- Line 963: The test assertions on lines 963 and 969 use an incorrect split
index calculation. Since the eligibleKeyspaces slice has
defaultKeyspaceCountSplitThreshold + 1 elements, the split index should be
(defaultKeyspaceCountSplitThreshold + 1) / 2 to match what the actual patrol
logic computes, not defaultKeyspaceCountSplitThreshold / 2. Update both
assertions to use the correct split index formula that accounts for the +1
additional element in the eligible keyspaces count.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 68453620-4523-4632-a3dc-8cde9ad493cd

📥 Commits

Reviewing files that changed from the base of the PR and between 9dafba9 and 9887a7b.

📒 Files selected for processing (2)
  • pkg/keyspace/tso_keyspace_group.go
  • pkg/keyspace/tso_keyspace_group_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/keyspace/tso_keyspace_group.go

Comment thread pkg/keyspace/tso_keyspace_group_test.go Outdated
@codecov

codecov Bot commented Jun 23, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 76.47059% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.19%. Comparing base (4335bc3) to head (bde6983).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10932      +/-   ##
==========================================
- Coverage   79.26%   79.19%   -0.07%     
==========================================
  Files         540      541       +1     
  Lines       74954    75171     +217     
==========================================
+ Hits        59412    59533     +121     
- Misses      11371    11449      +78     
- Partials     4171     4189      +18     
Flag Coverage Δ
unittests 79.19% <76.47%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: ystaticy <y_static_y@sina.com>
@ystaticy

Copy link
Copy Markdown
Contributor Author

/retest-required

@ti-chi-bot

ti-chi-bot Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

@ystaticy: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/retest-required

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ystaticy

Copy link
Copy Markdown
Contributor Author

/ok-to-test

@ti-chi-bot

ti-chi-bot Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

@ystaticy: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@JmPotato JmPotato removed the needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. label Jun 25, 2026
@JmPotato JmPotato added the ok-to-test Indicates a PR is ready to be tested. label Jun 25, 2026
Comment thread pkg/keyspace/tso_keyspace_group.go Outdated

@bufferflies bufferflies left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest lgtm

Signed-off-by: ystaticy <y_static_y@sina.com>
@ti-chi-bot

ti-chi-bot Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

@ystaticy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-error-log-review bde6983 link true /test pull-error-log-review
pull-unit-test-next-gen-2 bde6983 link true /test pull-unit-test-next-gen-2

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ti-chi-bot

ti-chi-bot Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bufferflies

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jun 26, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

[LGTM Timeline notifier]

Timeline:

  • 2026-06-26 02:56:32.884425972 +0000 UTC m=+261876.547651485: ☑️ agreed by bufferflies.

@ti-chi-bot ti-chi-bot Bot added the approved label Jun 26, 2026
keyspacesToMove := make([]uint32, 0, count-splitIdx)
for i := splitIdx; i < count; i++ {
kid := group.Keyspaces[i]
if kid == constant.DefaultKeyspaceID {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In NextGen, the protected bootstrap keyspace is SystemKeyspaceID, not 0. If SystemKeyspaceID is included in this tail half, every patrol attempt fails and auto-split remains stuck.

maxID = g.ID
}
}
nextID := maxID + 1

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can make auto-splitting stop when a high-numbered group exists, even if lower group IDs are still free. For example, group 4095 blocks future auto-splits despite available IDs below it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved contribution This PR is from a community contributor. dco-signoff: yes Indicates the PR's author has signed the dco. needs-1-more-lgtm Indicates a PR needs 1 more LGTM. ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tso: keyspace group auto splitting

4 participants