Skip to content

schedulingpb: carry region cpu stats in heartbeat #1455

Open
lhy1024 wants to merge 5 commits into
pingcap:masterfrom
lhy1024:sched-heartbeat-cpu
Open

schedulingpb: carry region cpu stats in heartbeat #1455
lhy1024 wants to merge 5 commits into
pingcap:masterfrom
lhy1024:sched-heartbeat-cpu

Conversation

@lhy1024

@lhy1024 lhy1024 commented Apr 21, 2026

Copy link
Copy Markdown
Member

What problem does this PR solve?

ref tikv/pd#10608

When PD forwards region heartbeats to a standalone scheduling service, the scheduling heartbeat schema needs to carry the region CPU fields that monolithic PD already preserves.

cpu_stats is the long-term field, but the scheduling service still temporarily needs the deprecated cpu_usage field because PD's shared RegionInfo and region HTTP API still expose that legacy value today.

What is changed and how does it work?

  • keep deprecated cpu_usage in schedulingpb.RegionHeartbeatRequest with an explicit temporary-compatibility comment
  • add cpu_stats to schedulingpb.RegionHeartbeatRequest
  • regenerate pkg/schedulingpb/schedulingpb.pb.go
  • update scripts/proto.lock

The follow-up cleanup should remove schedulingpb.RegionHeartbeatRequest.cpu_usage together with the repository-wide RegionInfo and API cpu_usage cleanup tracked under tikv/pd#10608.

Release note

None.

lhy1024 added 2 commits April 22, 2026 14:32
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
@lhy1024 lhy1024 changed the title schedulingpb: carry region cpu stats in heartbeat schedulingpb: keep region cpu usage for compatibility Apr 22, 2026
@lhy1024

lhy1024 commented Apr 22, 2026

Copy link
Copy Markdown
Member Author

This PR now intentionally keeps the deprecated cpu_usage field instead of removing it right away. Although cpu_stats is the long-term field, the standalone scheduling service still temporarily needs cpu_usage because PD's shared RegionInfo and region HTTP API still expose that legacy value today.

The cleanup should remove schedulingpb.RegionHeartbeatRequest.cpu_usage together with the repository-wide RegionInfo and API cpu_usage cleanup tracked in tikv/pd#10608.

@lhy1024 lhy1024 changed the title schedulingpb: keep region cpu usage for compatibility schedulingpb: carry region cpu stats in heartbeat Apr 22, 2026
@ti-chi-bot

ti-chi-bot Bot commented May 7, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: okJiang
Once this PR has been reviewed and has the lgtm label, please assign rleungx for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot

ti-chi-bot Bot commented May 7, 2026

Copy link
Copy Markdown

[LGTM Timeline notifier]

Timeline:

  • 2026-05-07 09:49:21.454788844 +0000 UTC m=+347634.328138816: ☑️ agreed by okJiang.

Comment thread proto/schedulingpb.proto
// which is calculated by cpu_time_delta/heartbeat_reported_interval.
// Deprecated: kept temporarily for scheduling service compatibility. Remove it together
// with the legacy RegionInfo/API cpu_usage cleanup.
uint64 cpu_usage = 16 [deprecated = true];

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, temporarily.

cpu_stats is the long-term field, but cpu_usage is still exposed through the shared RegionInfo / region API path today. The scheduling service reuses that path, so dropping it only from schedulingpb would make MCS differ from classic PD and require extra temporary API handling.

We should remove it later together with the repository-wide RegionInfo / API cpu_usage cleanup .

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this usage accurate now?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is consistent with cpu_stats.

cpu_usage ~= cpu_stats.unified_read + cpu_stats.scheduler

The values may not be exactly identical because of integer truncation and possible unclassified CPU time, but they are approximately the same source of truth.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we use it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is mainly for RegionInfo / API.

And the scheduling service converts the forwarded heartbeat into core.RegionInfo, and RegionInfo.GetCPUUsage() is still exposed through region APIs such as the cpu_usage field and /regions/cpu.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why region API will be sent to the scheduling service

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scheduling service reuses PD's core.RegionInfo as its in-memory region model

In the MCS path, the scheduling service receives schedulingpb.RegionHeartbeatRequest, calls core.RegionFromHeartbeat(), and stores the result in its local region cache.

cpu_usage is still part of core.RegionInfo as the legacy CPU field. Other MCS interfaces that read from this RegionInfo will expose it as well, for example:

  • GET /scheduling/api/v1/regions
  • GET /scheduling/api/v1/regions/{id}

Both use the shared response.RegionInfo serialization, which still includes the cpu_usage JSON field.

So the reason to keep it here is to keep the scheduling service's RegionInfo consistent with classic PD while the legacy RegionInfo.cpuUsage / API field still exists.

The value is not a separate source from cpu_stats; it is the legacy aggregate value and is approximately equal to the CPU stats breakdown. We can remove it together with the later RegionInfo / API cleanup.

image image

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean who needs to use GET /scheduling/api/v1/regions/{id} to get the cpu usage?

@lhy1024 lhy1024 May 14, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know of concrete consumers that use this api.It was only for preserving the shared RegionInfo response shape, but since cpu_usage is just the legacy aggregate and can be derived approximately from cpu_stats.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants