schedulingpb: carry region cpu stats in heartbeat #1455
Conversation
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
|
This PR now intentionally keeps the deprecated The cleanup should remove |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: okJiang The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
| // which is calculated by cpu_time_delta/heartbeat_reported_interval. | ||
| // Deprecated: kept temporarily for scheduling service compatibility. Remove it together | ||
| // with the legacy RegionInfo/API cpu_usage cleanup. | ||
| uint64 cpu_usage = 16 [deprecated = true]; |
There was a problem hiding this comment.
Yes, temporarily.
cpu_stats is the long-term field, but cpu_usage is still exposed through the shared RegionInfo / region API path today. The scheduling service reuses that path, so dropping it only from schedulingpb would make MCS differ from classic PD and require extra temporary API handling.
We should remove it later together with the repository-wide RegionInfo / API cpu_usage cleanup .
There was a problem hiding this comment.
It is consistent with cpu_stats.
cpu_usage ~= cpu_stats.unified_read + cpu_stats.scheduler
The values may not be exactly identical because of integer truncation and possible unclassified CPU time, but they are approximately the same source of truth.
There was a problem hiding this comment.
It is mainly for RegionInfo / API.
And the scheduling service converts the forwarded heartbeat into core.RegionInfo, and RegionInfo.GetCPUUsage() is still exposed through region APIs such as the cpu_usage field and /regions/cpu.
There was a problem hiding this comment.
Why region API will be sent to the scheduling service
There was a problem hiding this comment.
The scheduling service reuses PD's core.RegionInfo as its in-memory region model
In the MCS path, the scheduling service receives schedulingpb.RegionHeartbeatRequest, calls core.RegionFromHeartbeat(), and stores the result in its local region cache.
cpu_usage is still part of core.RegionInfo as the legacy CPU field. Other MCS interfaces that read from this RegionInfo will expose it as well, for example:
GET /scheduling/api/v1/regionsGET /scheduling/api/v1/regions/{id}
Both use the shared response.RegionInfo serialization, which still includes the cpu_usage JSON field.
So the reason to keep it here is to keep the scheduling service's RegionInfo consistent with classic PD while the legacy RegionInfo.cpuUsage / API field still exists.
The value is not a separate source from cpu_stats; it is the legacy aggregate value and is approximately equal to the CPU stats breakdown. We can remove it together with the later RegionInfo / API cleanup.
There was a problem hiding this comment.
I mean who needs to use GET /scheduling/api/v1/regions/{id} to get the cpu usage?
There was a problem hiding this comment.
I don't know of concrete consumers that use this api.It was only for preserving the shared RegionInfo response shape, but since cpu_usage is just the legacy aggregate and can be derived approximately from cpu_stats.
What problem does this PR solve?
ref tikv/pd#10608
When PD forwards region heartbeats to a standalone scheduling service, the scheduling heartbeat schema needs to carry the region CPU fields that monolithic PD already preserves.
cpu_statsis the long-term field, but the scheduling service still temporarily needs the deprecatedcpu_usagefield because PD's sharedRegionInfoand region HTTP API still expose that legacy value today.What is changed and how does it work?
cpu_usageinschedulingpb.RegionHeartbeatRequestwith an explicit temporary-compatibility commentcpu_statstoschedulingpb.RegionHeartbeatRequestpkg/schedulingpb/schedulingpb.pb.goscripts/proto.lockThe follow-up cleanup should remove
schedulingpb.RegionHeartbeatRequest.cpu_usagetogether with the repository-wideRegionInfoand APIcpu_usagecleanup tracked under tikv/pd#10608.Release note