
docs(core-concepts): add GPU virtualization principles page #167

Merged
hami-robot[bot] merged 2 commits into Project-HAMi:master from rootsongjc:concept on Apr 24, 2026

@rootsongjc (Contributor) commented Apr 17, 2026

Add a comprehensive GPU virtualization documentation page covering:

  • Kubernetes GPU scheduling fundamentals (Device Plugin, DRA)
  • HAMi virtual GPU scheduling architecture and workflow
  • Device injection and CUDA API interception details
  • Scheduling strategy explanation (binpack/spread)

The page is added to both English and Chinese docs under Core Concepts, with PlantUML sequence diagrams exported as SVG images for each language.

This document is a rewrite of content originally written by @togettoyou: https://github.com/togettoyou/kubernetes-src-notes/blob/main/content/addons/hami.md

Signed-off-by: Jimmy Song [email protected]

@netlify (bot) commented Apr 17, 2026

Deploy Preview for project-hami ready!

🔨 Latest commit: 4b488e0
🔍 Latest deploy log: https://app.netlify.com/projects/project-hami/deploys/69e9d627cefa4e00084ea1bc
😎 Deploy Preview: https://deploy-preview-167--project-hami.netlify.app

@hami-robot hami-robot Bot requested review from archlitchi and windsonsea April 17, 2026 11:21
@hami-robot hami-robot Bot added the size/XXL label Apr 17, 2026
@windsonsea (Member) left a comment

/approve

@hami-robot (Contributor) commented Apr 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rootsongjc, windsonsea

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hami-robot hami-robot Bot added the approved label Apr 20, 2026
@mesutoezdil (Contributor) commented Apr 20, 2026

Hey @rootsongjc, I found an issue: the docs reference ./architecture.md in two places, but that file isn't included in this PR:

  • gpu-virtualization.md line 248
  • introduction.md line 43

It looks like you have the diagrams (hami-architecture.plantuml and the SVG files), but the actual Markdown doc is missing. Users will get 404s when they click those links.

Should I create that file, or do you want to handle it?

@mesutoezdil (Contributor) commented Apr 20, 2026

Hey!

Docs migration from the HAMi repo:
I checked what needs to be migrated from https://github.com/Project-HAMi/HAMi/tree/master/docs

Most content is already on the website, just converted to Docusaurus format:
- benchmark, config, dashboard, offline-install, release notes
- all device docs (NVIDIA, Cambricon, Hygon, etc.)
- missing: scheduler-event-log.md (troubleshooting guide), how-to-profiling-scheduler.md (developer debugging), general-technical-review.md

Recommendations:
- Verify that the scheduler-event-log and profiling docs aren't already somewhere on the site.
- If they really are missing, add them under /docs/troubleshooting/ and /docs/developers/.
- Use Docusaurus frontmatter format like the rest of the docs.
- Update sidebars.js to include them.

@rootsongjc (Contributor, Author) commented

> Hey @rootsongjc, I found an issue: the docs reference ./architecture.md in two places, but that file isn't included in this PR:
>   • gpu-virtualization.md line 248
>   • introduction.md line 43
> It looks like you have the diagrams (hami-architecture.plantuml and the SVG files), but the actual Markdown doc is missing. Users will get 404s when they click those links.
> Should I create that file, or do you want to handle it?

I will handle this.

@rootsongjc (Contributor, Author) commented

I checked, and the docs do exist; this PR only renders them in the next version. Once this version is OK, I will add them to the current version (v2.8.0).

docs(v2.8): add GPU virtualization principles guide

Add a new v2.8 core concepts document explaining GPU virtualization in Kubernetes and how HAMi implements software-layer GPU sharing.

The guide covers the Device Plugin workflow and its limitations, introduces the context of DRA (Dynamic Resource Allocation), and describes why HAMi uses CUDA API interception plus node annotations to provide VRAM-aware isolation without driver or application changes. This improves user understanding of HAMi's architecture and the motivation behind its scheduling design.

Signed-off-by: Jimmy Song <[email protected]>
@rootsongjc rootsongjc requested a review from windsonsea April 23, 2026 11:30
@windsonsea (Member) commented

/lgtm

@hami-robot hami-robot Bot added the lgtm label Apr 24, 2026
@hami-robot hami-robot Bot merged commit ce319ce into Project-HAMi:master Apr 24, 2026
11 checks passed
@rootsongjc rootsongjc deleted the concept branch April 24, 2026 04:04

3 participants