meta-service-group: add status#10904
Conversation
* add enable status Signed-off-by: AmoebaProtozoa <8039876+AmoebaProtozoa@users.noreply.github.com> * Update pkg/keyspace/meta_service_group.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update pkg/keyspace/meta_service_group.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/server/apiv2/handlers/testutil.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix typo Signed-off-by: AmoebaProtozoa <8039876+AmoebaProtozoa@users.noreply.github.com> * fix linter issue Signed-off-by: AmoebaProtozoa <8039876+AmoebaProtozoa@users.noreply.github.com> * Update tools/pd-ctl/pdctl/command/meta_service_group_command.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix copilot suggestion Signed-off-by: AmoebaProtozoa <8039876+AmoebaProtozoa@users.noreply.github.com> * fix ut Signed-off-by: AmoebaProtozoa <8039876+AmoebaProtozoa@users.noreply.github.com> --------- Signed-off-by: AmoebaProtozoa <8039876+AmoebaProtozoa@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> (cherry picked from commit 5be7eae) Signed-off-by: bufferflies <1045931706@qq.com> # Conflicts: # pkg/keyspace/keyspace_test.go # pkg/keyspace/meta_service_group.go # pkg/keyspace/meta_service_group_test.go # pkg/storage/meta_service_group_test.go # pkg/utils/keypath/key_path.go # server/apiv2/handlers/meta_service_group.go # tests/server/apiv2/handlers/meta_service_group_test.go # tests/server/apiv2/handlers/testutil.go # tools/pd-ctl/pdctl/command/meta_service_group_command.go
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughReplaces meta-service-group assignment-count storage with persisted status records containing ChangesMeta-service group status migration
Sequence Diagram(s)sequenceDiagram
participant Client
participant MetaServiceGroupHTTP
participant MetaServiceGroupManager
participant StorageEndpoint
Client->>MetaServiceGroupHTTP: PATCH /meta-service-groups/:id/status
MetaServiceGroupHTTP->>MetaServiceGroupManager: PatchStatus(ctx, id, patch)
MetaServiceGroupManager->>StorageEndpoint: RunInTxn
MetaServiceGroupManager->>StorageEndpoint: LoadMetaServiceGroupStatus(txn, ids)
StorageEndpoint-->>MetaServiceGroupManager: status map
MetaServiceGroupManager->>StorageEndpoint: SaveMetaServiceGroupStatus(txn, id, status)
MetaServiceGroupHTTP->>MetaServiceGroupManager: buildMetaServiceGroupStatus(ctx)
MetaServiceGroupManager->>StorageEndpoint: GetGroups() and LoadMetaServiceGroupStatus()
StorageEndpoint-->>MetaServiceGroupHTTP: groups and statuses
MetaServiceGroupHTTP-->>Client: status response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 OpenGrep (1.23.0)server/apiv2/handlers/meta_service_group.go┌──────────────┐ �[32m✔�[39m �[1mOpengrep OSS�[0m [00.13][ERROR]: unable to find a config; path pkg/keyspace/meta_service_group.go┌──────────────┐ �[32m✔�[39m �[1mOpengrep OSS�[0m [00.13][ERROR]: unable to find a config; path Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/keyspace/meta_service_group.go`:
- Around line 75-79: The MetaServiceGroupStatusPatch struct uses the JSON field
name assigned_count for the AssignedCount field, but the MetaServiceGroupStatus
exposes this field as assignment_count. This mismatch prevents API clients from
updating the count using the status shape. Change the JSON tag on the
AssignedCount field in MetaServiceGroupStatusPatch to assignment_count to align
with the status struct. Additionally, add validation logic to reject negative
values before persisting the patch, since negative counts corrupt the balancing
logic. Be aware that callers constructing MetaServiceGroupStatusPatch instances
may need updating to use the corrected field name.
In `@pkg/storage/endpoint/meta_service_group.go`:
- Around line 62-70: In the loadMetaServiceGroupStatus function, when statusVal
is empty (indicating the /status key is missing), add a fallback mechanism
before returning the zero-value status. Before the return statement in the
statusVal == "" condition, attempt to load the legacy assignment-count key using
the transaction (txn). If the legacy key exists, migrate its integer value to
populate the assignment_count field of the status object and set enabled to
true. Only return the zero-value status with assignment_count:0 and
enabled:false if the legacy key is also not found, ensuring existing persisted
counts from the old storage model are preserved during the upgrade.
In `@server/apiv2/handlers/meta_service_group.go`:
- Around line 187-195: The PatchMetaServiceGroupStatus handler needs to validate
the patch payload before persisting it and properly classify errors. After the
ShouldBindJSON check succeeds, add validation logic to ensure the patch data is
valid (such as checking that assigned_count is not negative), and return a 400
Bad Request if validation fails. Additionally, when the manager.PatchStatus call
fails, distinguish between client errors (invalid patch data) and server errors
instead of treating all failures as 500 Internal Server Error. Return 400 for
validation or client-side errors and 500 only for actual server-side failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c4f09485-f513-4fe7-917b-6ff801605390
📒 Files selected for processing (9)
pkg/keyspace/meta_service_group.gopkg/keyspace/meta_service_group_test.gopkg/storage/endpoint/meta_service_group.gopkg/storage/meta_service_group_test.gopkg/utils/keypath/absolute_key_path.goserver/apiv2/handlers/meta_service_group.gotests/server/apiv2/handlers/meta_service_group_test.gotests/server/apiv2/handlers/testutil.gotools/pd-ctl/pdctl/command/meta_service_group_command.go
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
tests/server/apiv2/handlers/meta_service_group_test.go (1)
78-100: 💤 Low valuePotential logic discrepancy in
collectStatussettingEnabled=true.The
collectStatushelper function at line 89 setsEnabled: truefor all collected statuses. However, this represents the expected state after keyspaces have been created and assigned to enabled groups. This is correct in the context of the test since groups are enabled at lines 108-112 before keyspaces are created at line 114.The logic is correct but could be clearer. Consider adding a comment explaining that this helper assumes all collected groups have been enabled before assignment, to avoid confusion.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/server/apiv2/handlers/meta_service_group_test.go` around lines 78 - 100, Add a clarifying comment in the collectStatus function above or near the line where Enabled is set to true (within the initialization of the handlers.MetaServiceGroupStatus struct) to explicitly document the assumption that this helper function expects all collected meta service groups to have been enabled before keyspaces are assigned to them. This will make the intent clearer to future readers and prevent confusion about why Enabled is always set to true.tools/pd-ctl/pdctl/command/meta_service_group_command.go (1)
166-172: 💤 Low valueConsider validating non-negative assignment count client-side.
strconv.Atoiaccepts negative integers. While the server should also validate this, adding a client-side check provides better UX by catching errors early with a clear message.Proposed fix
func newSetMetaServiceGroupAssignmentCountFunc(cmd *cobra.Command, args []string) { groupID := strings.TrimSpace(args[0]) assignmentCount, err := strconv.Atoi(args[1]) if err != nil { cmd.PrintErrln("Invalid value for assignment count, must be an integer:", err) return } + if assignmentCount < 0 { + cmd.PrintErrln("Invalid value for assignment count, must be non-negative") + return + } patch := &keyspace.MetaServiceGroupStatusPatch{🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/pd-ctl/pdctl/command/meta_service_group_command.go` around lines 166 - 172, After successfully parsing the assignment count with strconv.Atoi in the newSetMetaServiceGroupAssignmentCountFunc function, add a validation check to ensure assignmentCount is not negative. If assignmentCount is less than zero, use cmd.PrintErrln to display an error message indicating that the assignment count must be a non-negative integer, then return early. This provides better user experience by catching invalid input on the client side before sending it to the server.pkg/keyspace/meta_service_group.go (1)
149-161: ⚡ Quick winRedundant status load within the same transaction.
findMinMetaGroup(called at line 151) already loads the entire status map viaLoadMetaServiceGroupStatus. Lines 155-158 load the same data again within the same transaction. Consider refactoringfindMinMetaGroupto return both the selected group and the status map, allowing reuse here.♻️ Suggested refactor
-func (m *MetaServiceGroupManager) findMinMetaGroup(txn kv.Txn) (string, error) { +func (m *MetaServiceGroupManager) findMinMetaGroup(txn kv.Txn) (string, map[string]*endpoint.MetaServiceGroupStatus, error) { statusMap, err := m.store.LoadMetaServiceGroupStatus(txn, m.metaServiceGroups) if err != nil { - return "", err + return "", nil, err } minCount := math.MaxInt var assignedGroup string for currentGroup, status := range statusMap { if status.Enabled && status.AssignmentCount < minCount { minCount = status.AssignmentCount assignedGroup = currentGroup } } if assignedGroup == "" { - return "", errNoAvailableMetaServiceGroups + return "", nil, errNoAvailableMetaServiceGroups } - return assignedGroup, nil + return assignedGroup, statusMap, nil }Then in
AssignToGroup:- assignedGroup, err = m.findMinMetaGroup(txn) + var statusMap map[string]*endpoint.MetaServiceGroupStatus + assignedGroup, statusMap, err = m.findMinMetaGroup(txn) if err != nil { return err } - statusMap, err := m.store.LoadMetaServiceGroupStatus(txn, m.metaServiceGroups) - if err != nil { - return err - } status := statusMap[assignedGroup]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/keyspace/meta_service_group.go` around lines 149 - 161, The code is redundantly loading the status map twice within the same transaction: once inside findMinMetaGroup and again at lines 155-158. Refactor the findMinMetaGroup method to return both the assigned group and the status map it already loads internally. Then update the call to findMinMetaGroup in the transaction to receive both return values, and remove the redundant m.store.LoadMetaServiceGroupStatus call. Finally, use the status map returned from findMinMetaGroup instead of loading it again.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/keyspace/meta_service_group.go`:
- Around line 173-178: The assignment count decrement for the old group ID in
the statusMap has no validation to prevent it from going negative. Before
decrementing status.AssignmentCount for the oldGroupID in the statusMap, add a
check to ensure the count is greater than 0 to prevent it from becoming
negative, which would corrupt load balancing behavior in findMinMetaGroup. Only
perform the decrement and subsequent SaveMetaServiceGroupStatus call if the
count is positive.
---
Nitpick comments:
In `@pkg/keyspace/meta_service_group.go`:
- Around line 149-161: The code is redundantly loading the status map twice
within the same transaction: once inside findMinMetaGroup and again at lines
155-158. Refactor the findMinMetaGroup method to return both the assigned group
and the status map it already loads internally. Then update the call to
findMinMetaGroup in the transaction to receive both return values, and remove
the redundant m.store.LoadMetaServiceGroupStatus call. Finally, use the status
map returned from findMinMetaGroup instead of loading it again.
In `@tests/server/apiv2/handlers/meta_service_group_test.go`:
- Around line 78-100: Add a clarifying comment in the collectStatus function
above or near the line where Enabled is set to true (within the initialization
of the handlers.MetaServiceGroupStatus struct) to explicitly document the
assumption that this helper function expects all collected meta service groups
to have been enabled before keyspaces are assigned to them. This will make the
intent clearer to future readers and prevent confusion about why Enabled is
always set to true.
In `@tools/pd-ctl/pdctl/command/meta_service_group_command.go`:
- Around line 166-172: After successfully parsing the assignment count with
strconv.Atoi in the newSetMetaServiceGroupAssignmentCountFunc function, add a
validation check to ensure assignmentCount is not negative. If assignmentCount
is less than zero, use cmd.PrintErrln to display an error message indicating
that the assignment count must be a non-negative integer, then return early.
This provides better user experience by catching invalid input on the client
side before sending it to the server.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: beec8daa-b422-480f-885e-71d385de0445
📒 Files selected for processing (9)
pkg/keyspace/meta_service_group.gopkg/keyspace/meta_service_group_test.gopkg/storage/endpoint/meta_service_group.gopkg/storage/meta_service_group_test.gopkg/utils/keypath/absolute_key_path.goserver/apiv2/handlers/meta_service_group.gotests/server/apiv2/handlers/meta_service_group_test.gotests/server/apiv2/handlers/testutil.gotools/pd-ctl/pdctl/command/meta_service_group_command.go
| AssignedKeyspaces int `json:"assigned_keyspaces"` | ||
| ID string `json:"id"` | ||
| Addresses string `json:"addresses"` | ||
| Status *endpoint.MetaServiceGroupStatus `json:"status,omitempty"` |
There was a problem hiding this comment.
This changes the existing v2 response shape by removing assigned_keyspaces. Existing clients of GET/POST /meta-service-groups may break; please keep the old field while adding status, or version the API change.
There was a problem hiding this comment.
Good point. Restored the assigned_keyspaces field on the response in 10f7107 — it now mirrors status.assignment_count, so existing GET/POST /meta-service-groups clients keep working, while the new status object additionally exposes the enabled state. Happy to instead version the endpoint if you'd prefer a cleaner break.
Return 404 for an unknown meta-service group and 400 for a negative
assignment count instead of 500, by exporting sentinel errors and
mapping them in the HTTP handler. Rename the patch field AssignedCount
to AssignmentCount for consistency with the persisted status. Fix the
swagger annotations to describe the list/add/patch responses as
{array} MetaServiceGroupStatus. Add a pd-ctl integration test covering
list/add/set-enabled/set-assignment-count and the new error paths.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: tongjian <1045931706@qq.com>
…ount Restore the assigned_keyspaces field on the meta-service group HTTP response for backward compatibility with existing v2 clients; it mirrors status.assignment_count. Guard updateAssignmentTxn against decrementing the assignment count below zero, which would otherwise be preferred by findMinMetaGroup and skew load balancing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: tongjian <1045931706@qq.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/apiv2/handlers/meta_service_group.go`:
- Around line 161-169: The status assembly in MetaServiceGroupStatus is
returning a partial object when statuses[id] is missing, which can hide the
enabled state while still emitting AssignedKeyspaces as 0. Update the GetStatus
response handling in meta_service_group.go so it does not append a group unless
a matching endpoint.MetaServiceGroupStatus exists, or else build a full default
status consistently before appending. Keep the behavior aligned with the
MetaServiceGroupStatus struct fields and the statuses[id] lookup.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5a34b952-41f3-4062-ba82-a462f7dd119a
📒 Files selected for processing (2)
pkg/keyspace/meta_service_group.goserver/apiv2/handlers/meta_service_group.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/keyspace/meta_service_group.go
| var assignedKeyspaces int | ||
| if status := statuses[id]; status != nil { | ||
| assignedKeyspaces = status.AssignmentCount | ||
| } | ||
| response = append(response, MetaServiceGroupStatus{ | ||
| ID: id, | ||
| Addresses: addresses, | ||
| AssignedKeyspaces: assignmentCounts[id], | ||
| Status: statuses[id], | ||
| AssignedKeyspaces: assignedKeyspaces, |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Avoid returning partial status objects when a group status is missing.
Line 168 omits status because the field has omitempty, while Line 169 still returns assigned_keyspaces: 0. If GetStatus lacks an entry for any group, clients lose the new enabled state and may see a misleading assignment count. Prefer failing the response or synthesizing a default endpoint.MetaServiceGroupStatus consistently.
Suggested fix
for id, addresses := range groups {
- var assignedKeyspaces int
- if status := statuses[id]; status != nil {
- assignedKeyspaces = status.AssignmentCount
+ status := statuses[id]
+ if status == nil {
+ return nil, errors.Errorf("meta-service group %s status not found", id)
}
response = append(response, MetaServiceGroupStatus{
ID: id,
Addresses: addresses,
- Status: statuses[id],
- AssignedKeyspaces: assignedKeyspaces,
+ Status: status,
+ AssignedKeyspaces: status.AssignmentCount,
})
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| var assignedKeyspaces int | |
| if status := statuses[id]; status != nil { | |
| assignedKeyspaces = status.AssignmentCount | |
| } | |
| response = append(response, MetaServiceGroupStatus{ | |
| ID: id, | |
| Addresses: addresses, | |
| AssignedKeyspaces: assignmentCounts[id], | |
| Status: statuses[id], | |
| AssignedKeyspaces: assignedKeyspaces, | |
| for id, addresses := range groups { | |
| status := statuses[id] | |
| if status == nil { | |
| return nil, errors.Errorf("meta-service group %s status not found", id) | |
| } | |
| response = append(response, MetaServiceGroupStatus{ | |
| ID: id, | |
| Addresses: addresses, | |
| Status: status, | |
| AssignedKeyspaces: status.AssignmentCount, |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@server/apiv2/handlers/meta_service_group.go` around lines 161 - 169, The
status assembly in MetaServiceGroupStatus is returning a partial object when
statuses[id] is missing, which can hide the enabled state while still emitting
AssignedKeyspaces as 0. Update the GetStatus response handling in
meta_service_group.go so it does not append a group unless a matching
endpoint.MetaServiceGroupStatus exists, or else build a full default status
consistently before appending. Keep the behavior aligned with the
MetaServiceGroupStatus struct fields and the statuses[id] lookup.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #10904 +/- ##
==========================================
+ Coverage 79.06% 79.18% +0.12%
==========================================
Files 540 541 +1
Lines 74535 75150 +615
==========================================
+ Hits 58929 59506 +577
- Misses 11423 11434 +11
- Partials 4183 4210 +27
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
| c.AbortWithStatusJSON(http.StatusBadRequest, err.Error()) | ||
| return | ||
| } | ||
| c.AbortWithStatusJSON(http.StatusInternalServerError, err.Error()) |
There was a problem hiding this comment.
Please map errs.ErrEtcdTxnConflict to 409 here. Concurrent status patches or assignments can hit the etcd txn compare failure, and that is a retryable conflict rather than an internal server error.
| if manager.mgm != nil && oldMetaServiceGroup != newMetaServiceGroup { | ||
| if newMetaServiceGroup != "" && manager.mgm.GetGroups()[newMetaServiceGroup] == "" { | ||
| return errUnknownMetaServiceGroup | ||
| return ErrUnknownMetaServiceGroup |
There was a problem hiding this comment.
This only checks that the target group exists. Since disabled groups are skipped by automatic assignment, config updates should also reject moving a keyspace into a disabled meta-service group.
What problem does this PR solve?
Issue Number: Close #10889
CP from 5be7eae (original PR #392).
Original author: @AmoebaProtozoa.
What is changed and how does it work?
Check List
Tests
Code changes
Related changes
Validation:
GOFLAGS=-buildvcs=false go test ./pkg/keyspace ./pkg/storage ./server/apiv2/handlers -run 'TestMetaServiceGroup|TestNonExistent' -count=1 -timeout=5mGOFLAGS=-buildvcs=false go test ./tests/server/apiv2/handlers -run TestMetaServiceGroupTestSuite -count=1 -timeout=5mcd tools/pd-ctl && GOFLAGS=-buildvcs=false go test ./pdctl/command -run TestNonExistent -count=0 -timeout=5mgit diff --checkGOFLAGS=-buildvcs=false make checkRelease note
Summary by CodeRabbit
PATCH /meta-service-groups/:id/statusto update group status.pd-ctl meta-service-groupsubcommands:set-enabledandset-assignment-count.statusobject (enabled + assignment count); assignment selection respects enabled/assignment count.