split scatter: stabilize pending and avoid read-pool pressure#10856
split scatter: stabilize pending and avoid read-pool pressure#10856lhy1024 wants to merge 12 commits into
Conversation
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe PR adds recent-max read CPU tracking to rolling store statistics, models TiKV's unified read pool configuration in ChangesRead Pool Pressure Filter and Split-Scatter Epoch Guards
Sequence Diagram(s)sequenceDiagram
rect rgba(173, 216, 230, 0.5)
Note over scatterRegionWithType,ReadPoolPressureFilter: Internal scatter leader candidate filtering
scatterRegionWithType->>splitScatterReadCPUByStore: StoreKindLoads + recentMaxProvider
splitScatterReadCPUByStore-->>scatterRegionWithType: readCPUByStore map
scatterRegionWithType->>filterAllowedLeaderCandidateStores: candidates + readCPUByStore
filterAllowedLeaderCandidateStores->>ReadPoolPressureFilter: Target(store) for each candidate
ReadPoolPressureFilter-->>filterAllowedLeaderCandidateStores: statusStoreBusy or statusOK
filterAllowedLeaderCandidateStores-->>scatterRegionWithType: filtered allowed candidates
end
sequenceDiagram
rect rgba(144, 238, 144, 0.5)
Note over recordBatch,dispatch: Split-scatter epoch-version guard lifecycle
recordBatch->>pending map: store pending item with regionWaitVersion
collectTopPending->>pending map: read pending under lock
collectTopPending->>collectTopPending: compare live region version vs regionWaitVersion
collectTopPending->>collectTopPending: skip if not yet at minimum version
dispatch->>pending map: recheck live region against regionWaitVersion
dispatch->>dispatch: verify live region version >= regionWaitVersion
dispatch->>pending map: skip dispatch if epoch requirement not met
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
582d591 to
2a8c7d6
Compare
Codecov Report❌ Patch coverage is ❌ Your patch check has failed because the patch coverage (73.83%) is below the target coverage (74.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## master #10856 +/- ##
==========================================
- Coverage 79.14% 79.12% -0.02%
==========================================
Files 536 541 +5
Lines 74033 75044 +1011
==========================================
+ Hits 58594 59380 +786
- Misses 11311 11464 +153
- Partials 4128 4200 +72
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
dea3b17 to
130a58c
Compare
dd50464 to
d08f184
Compare
882b374 to
a5f5e83
Compare
9f27430 to
c24e3d9
Compare
0bd7327 to
68d3151
Compare
Signed-off-by: lhy1024 <19542290+lhy1024@users.noreply.github.com>
Signed-off-by: lhy1024 <19542290+lhy1024@users.noreply.github.com>
|
|
||
| // GetUnifiedReadPoolMaxThreadCount returns TiKV unified read pool max thread count. | ||
| func (o *PersistConfig) GetUnifiedReadPoolMaxThreadCount() uint64 { | ||
| return o.GetStoreConfig().GetUnifiedReadPoolMaxThreadCount() |
There was a problem hiding this comment.
No need to configure it? Just get it from tikv?
There was a problem hiding this comment.
It is still sourced from TiKV. Root PD syncs StoreConfig from a TiKV /config endpoint and persists/propagates it; the scheduling service reads the same StoreConfig through the config watcher. This method only exposes the synced value through StoreConfigProvider so the scatterer can use it in both monolithic PD and scheduling-service mode.
| CheckRegionKeys(uint64, uint64) error | ||
| IsEnableRegionBucket() bool | ||
| IsRaftKV2() bool | ||
| GetUnifiedReadPoolMaxThreadCount() uint64 |
There was a problem hiding this comment.
What should we do if not all stores are the same?
There was a problem hiding this comment.
This follows the existing StoreConfig model in PD: PD syncs StoreConfig from one available TiKV and treats it as cluster-level config, as we already do for region split size, region bucket, and raft-kv2. TiKV configs are expected to be homogeneous in a cluster. Supporting heterogeneous read-pool sizes would require a per-store StoreConfig/heartbeat model, which is larger than this PR.
| }{ | ||
| {int(storeStateTombstone), "store-state-tombstone-filter"}, | ||
| {int(filtersLen - 1), "store-state-slow-trend-filter"}, | ||
| {int(readPoolPressure), "read-pool-pressure-filter"}, |
There was a problem hiding this comment.
It should add a new item, not remove the latest.
There was a problem hiding this comment.
Good catch. I will keep the existing latest filter case and add readPoolPressure as a new case instead of replacing it.
| } | ||
|
|
||
| // Target filters stores whose read CPU is close to the unified read-pool limit. | ||
| func (f *readPoolPressureFilter) Target(_ config.SharedConfigProvider, store *core.StoreInfo) *plan.Status { |
There was a problem hiding this comment.
Only pick the target store if the source store is only busy?
There was a problem hiding this comment.
This filter is intentionally target-side only. Internal split-scatter has already been triggered by the load-split flow; here we are not deciding whether the source is busy enough to scatter, but whether a target store still has enough read-pool headroom to receive a new leader.
The target store may not be busy yet, but moving a leader to it can immediately increase its read load. So this filter acts as a proactive brake: once the target read CPU approaches the unified read-pool capacity threshold, we avoid selecting it as the new leader target before it actually becomes busy. Source-side transferability is checked separately by isAllowedLeaderSource. If all candidate targets are under read-pool pressure, no target leader will be selected and the internal scatter will be skipped/retried later.
Signed-off-by: lhy1024 <19542290+lhy1024@users.noreply.github.com>
Signed-off-by: lhy1024 <19542290+lhy1024@users.noreply.github.com> (cherry picked from commit 36ba64aa01908b19d6c14369a8cf2c40662782e2)
Signed-off-by: lhy1024 <19542290+lhy1024@users.noreply.github.com>
Signed-off-by: lhy1024 <19542290+lhy1024@users.noreply.github.com>
87d7734 to
eb243c5
Compare
|
/retest |
Signed-off-by: lhy1024 <19542290+lhy1024@users.noreply.github.com>
01702d5 to
e49f324
Compare
Signed-off-by: lhy1024 <19542290+lhy1024@users.noreply.github.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: bufferflies The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
|
@lhy1024: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Signed-off-by: lhy1024 <19542290+lhy1024@users.noreply.github.com>
ec6f7cd to
33bf4e1
Compare







What problem does this PR solve?
Issue Number: Ref #10830
This PR fixes load-based split-scatter behavior around pending-region lifecycle and read CPU pressure:
What is changed and how does it work?
Check List
Tests
Test commands run:
go test ./pkg/schedule/checker -run 'TestDispatchSplitScatterKeepsPendingWhenReadCPUIsBalanced|TestDispatchSplitScatterBacksOff|TestCheckSplitScatterRegionsCreatesScatterOperator|TestDispatchSplitScatterKeepsPendingUntilSplitHeartbeat|TestDispatchSplitScatterUsesRangeScatterGroup|TestDispatchSplitScatterKeepsStableGroupWhenRegionSplitsAgain' -count=1go test ./pkg/schedule/scatter -run 'TestInternalScatterSkipsWhenReadCPUIsBalanced|TestInternalScatterAllowsWhenReadCPUIsImbalanced|TestInternalScatterBalancedReadCPUDoesNotBlockSameLeader|TestInternalScatterLeaderFiltersReadPoolPressure|TestInternalScatterPeerSelection|TestInternalScatterLeaderSelection|TestSeedGroupDistributionByRangeAppliesNetChange|TestSeededDistributionSkipsUpdateWhenAddOperatorRejected' -count=1go test ./server/cluster ./pkg/mcs/scheduling/server -run 'TestHandleAskBatchSplit|TestAskBatchSplitRecordsSplitScatterInSchedulingService' -count=1go test ./pkg/schedule/filter ./pkg/schedule/config -run 'TestReadPoolPressureFilterUsesRecentMaxReadCPU|TestStoreConfigEqualChecksReadPool' -count=1git diff --check HEAD~1..HEADSide effects
Release note