Skip to content

config: persist default store limit for future stores#10900

Open
King-Dylan wants to merge 1 commit into
tikv:masterfrom
King-Dylan:persist-default-store-limit
Open

config: persist default store limit for future stores#10900
King-Dylan wants to merge 1 commit into
tikv:masterfrom
King-Dylan:persist-default-store-limit

Conversation

@King-Dylan

@King-Dylan King-Dylan commented Jun 15, 2026

Copy link
Copy Markdown

What problem does this PR solve?

Issue Number: close #10898

Persist the default store limit used for future stores. Today /pd/api/v1/stores/limit updates existing stores and mutates the process-local default store limit, but the default for future stores is not persisted. After PD/API/scheduling service restart or leader/primary switch, newly joined TiKV stores can fall back to the built-in default value.

What is changed and how does it work?

  • Add schedule.default-store-limit to ScheduleConfig.
  • Update SetAllStoresLimit to persist the default limit as well as updating existing per-store entries.
  • Initialize missing/new store limits from persisted schedule.default-store-limit before falling back to the process default.
  • Keep per-store schedule.store-limit[storeID] as the highest-priority override.
  • Keep the current TiFlash default store limit behavior unchanged.
  • Apply the same default-store-limit logic to MCS scheduling PersistConfig.

The resulting priority is:

  1. per-store schedule.store-limit[storeID]
  2. persisted schedule.default-store-limit
  3. process default

Check List

Tests

  • Unit test
go test ./pkg/schedule/config ./server/config ./pkg/mcs/scheduling/server/config ./server/cluster -run 'Test(DefaultStoreLimitAdjust|ReloadDefaultStoreLimit|PersistConfigDefaultStoreLimit|AddStoreLimitUsesPersistedDefaultStoreLimit|StoreLimitChangeRefreshLimiter)'

Also attempted the existing MCS integration store-limit test:

cd tests/integrations
go test ./mcs/scheduling -run 'TestServerTestSuite/TestStoreLimit'

It failed during build because dashboard embedded assets were not generated in this checkout:

../../pkg/dashboard/uiserver/embedded_assets_rewriter.go:36:26: undefined: assets
../../pkg/dashboard/uiserver/embedded_assets_rewriter.go:37:13: undefined: vfsgen۰FS
../../pkg/dashboard/uiserver/embedded_assets_rewriter.go:39:15: undefined: vfsgen۰CompressedFileInfo
../../pkg/dashboard/uiserver/embedded_assets_rewriter.go:47:9: undefined: assets

Code changes

  • Has configuration change

Side effects

  • No known side effects

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Store limit defaults (add-peer/remove-peer) now reliably use the schedule’s persisted defaults, ensuring consistent behavior across restarts instead of depending on current in-memory global values.
    • Schedule reloading and adjustment now correctly apply both default dimensions (add-peer and remove-peer), including engine-specific defaults (for example, TiFlash), while keeping per-store overrides scoped to the targeted dimension.
    • Deprecated store-balance-rate is now reflected into per-schedule default store limits to prevent unintended downstream changes.

@ti-chi-bot

ti-chi-bot Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-triage-completed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 15, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign likidu, zhouqiang-cl for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added contribution This PR is from a community contributor. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. dco-signoff: no Indicates the PR's author has not signed dco. labels Jun 15, 2026
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR persists schedule-level default store limits, updates store-limit read/write paths to use them, and adds tests for reload, restart, and future-store initialization behavior.

Changes

Persisted default store limit flow

Layer / File(s) Summary
Schedule default store-limit contract
pkg/schedule/config/config.go
ScheduleConfig now stores default-store-limit, fills it during adjust, migrates deprecated StoreBalanceRate into it, and exposes helpers that resolve effective default add/remove peer rates.
Store-limit persistence and cluster usage
server/config/persist_options.go, pkg/mcs/scheduling/server/config/config.go, server/cluster/cluster.go
Store-limit setters and getters now read and write schedule-level default store limits, and cluster store initialization uses persisted defaults when creating limits for stores without explicit overrides.
Restart and reload coverage
pkg/mcs/scheduling/server/config/config_test.go, server/config/config_test.go, server/cluster/cluster_test.go
Tests verify persisted default store limits survive config reload and simulated restarts, preserve per-store overrides, and are used when adding future stores, including TiFlash handling.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • tikv/pd#10131: Also changes server/cluster/cluster.go around store-limit application in SetAllStoresLimit.
  • tikv/pd#10135: Also touches the store-limit fallback/defaulting path used by GetStoreLimit.

Suggested reviewers

  • rleungx
  • lhy1024
  • niubell
  • bufferflies

Poem

🐇 I tucked new limits into config so tight,
Future stores hop in with defaults just right.
After restart, they still remember the way,
Add-peer and remove-peer both stay in play.
Hoppy tests guard the trail all night.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: persisting default store limits for future stores.
Description check ✅ Passed The description covers the problem, change summary, tests, code-change impact, and release note, matching the template well.
Linked Issues check ✅ Passed The PR implements the requested persisted default store-limit behavior and preserves the per-store, default, process-default priority order.
Out of Scope Changes check ✅ Passed The changes stay focused on persisting and reusing default store limits, with added tests and no obvious unrelated scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@ti-chi-bot

ti-chi-bot Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Hi @King-Dylan. Thanks for your PR.

I'm waiting for a tikv member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 15, 2026
@King-Dylan King-Dylan force-pushed the persist-default-store-limit branch from b2c0925 to 62c341f Compare June 15, 2026 21:53
@ti-chi-bot ti-chi-bot Bot added dco-signoff: yes Indicates the PR's author has signed the dco. and removed dco-signoff: no Indicates the PR's author has not signed dco. labels Jun 15, 2026
@King-Dylan King-Dylan marked this pull request as ready for review June 16, 2026 17:28
@ti-chi-bot ti-chi-bot Bot removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-triage-completed labels Jun 16, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/schedule/config/config.go`:
- Around line 633-641: The GetDefaultStoreLimit method treats any zero value as
unset and replaces it with process defaults, which breaks persistence of
explicitly configured zero values across restart. Replace the zero-value checks
for AddPeer and RemovePeer fields with a proper mechanism to distinguish between
explicitly set zero values and truly unset values (such as using pointers to the
fields or a different storage representation), so that persisted zero values are
respected and not overwritten by process defaults.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cfc073c5-d240-4769-8674-03ba34dcad72

📥 Commits

Reviewing files that changed from the base of the PR and between 1d3e90f and 62c341f.

📒 Files selected for processing (7)
  • pkg/mcs/scheduling/server/config/config.go
  • pkg/mcs/scheduling/server/config/config_test.go
  • pkg/schedule/config/config.go
  • server/cluster/cluster.go
  • server/cluster/cluster_test.go
  • server/config/config_test.go
  • server/config/persist_options.go

Comment thread pkg/schedule/config/config.go Outdated
@King-Dylan King-Dylan force-pushed the persist-default-store-limit branch from 62c341f to 5e152da Compare June 25, 2026 20:59
Close tikv#10898

Signed-off-by: King-Dylan <702299521@qq.com>
@King-Dylan King-Dylan force-pushed the persist-default-store-limit branch from 5e152da to 6f224a3 Compare June 25, 2026 21:02

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/schedule/config/config.go`:
- Around line 472-475: The migration logic in config.go currently lets
StoreBalanceRate overwrite an explicitly configured
DefaultStoreLimit/DefaultStoreLimitConfig, which clobbers user-specified
add-peer/remove-peer values. Update the backfill path around StoreBalanceRate
handling so it only populates DefaultStoreLimit when the corresponding
default-store-limit fields are unset, or add explicit validation to reject mixed
configuration; use the existing config migration logic in the affected blocks
near the StoreBalanceRate assignments and the
DefaultStoreLimit/DefaultStoreLimitConfig fields to preserve explicit values.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2749e8c6-be72-4a19-b949-c6a6e8ff6353

📥 Commits

Reviewing files that changed from the base of the PR and between 5e152da and 6f224a3.

📒 Files selected for processing (7)
  • pkg/mcs/scheduling/server/config/config.go
  • pkg/mcs/scheduling/server/config/config_test.go
  • pkg/schedule/config/config.go
  • server/cluster/cluster.go
  • server/cluster/cluster_test.go
  • server/config/config_test.go
  • server/config/persist_options.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • server/config/persist_options.go
  • pkg/mcs/scheduling/server/config/config.go
  • server/cluster/cluster.go
  • server/cluster/cluster_test.go

Comment on lines 472 to 475
if c.StoreBalanceRate != 0 {
DefaultStoreLimit = StoreLimit{AddPeer: c.StoreBalanceRate, RemovePeer: c.StoreBalanceRate}
c.DefaultStoreLimit = StoreLimitConfig{AddPeer: c.StoreBalanceRate, RemovePeer: c.StoreBalanceRate}
c.StoreBalanceRate = 0

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don't let store-balance-rate clobber explicit default-store-limit.

If both are present during migration, these assignments overwrite user-specified default-store-limit.add-peer/remove-peer with the deprecated symmetric rate. That makes the new persisted default impossible to roll out incrementally and loses the more specific config. Only backfill from StoreBalanceRate when the corresponding default-store-limit field was not set, or reject the mixed configuration explicitly.

Also applies to: 553-556

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/schedule/config/config.go` around lines 472 - 475, The migration logic in
config.go currently lets StoreBalanceRate overwrite an explicitly configured
DefaultStoreLimit/DefaultStoreLimitConfig, which clobbers user-specified
add-peer/remove-peer values. Update the backfill path around StoreBalanceRate
handling so it only populates DefaultStoreLimit when the corresponding
default-store-limit fields are unset, or add explicit validation to reject mixed
configuration; use the existing config migration logic in the affected blocks
near the StoreBalanceRate assignments and the
DefaultStoreLimit/DefaultStoreLimitConfig fields to preserve explicit values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contribution This PR is from a community contributor. dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Persist default store limit for future stores

1 participant