Skip to content

Add DeviceClass lifecycle management to DRA reconciler#1285

Merged
k8s-ci-robot merged 1 commit into
kubernetes-sigs:mainfrom
TomerNewman:MGMT-24471-deviceclass-lifecycle
Jun 16, 2026
Merged

Add DeviceClass lifecycle management to DRA reconciler#1285
k8s-ci-robot merged 1 commit into
kubernetes-sigs:mainfrom
TomerNewman:MGMT-24471-deviceclass-lifecycle

Conversation

@TomerNewman

Copy link
Copy Markdown
Collaborator

Add DeviceClass lifecycle management to DRA reconciler

Summary

Extends the DRAReconciler to manage cluster-scoped Kubernetes DeviceClass resources declared in spec.dra.deviceClasses. Since DeviceClasses are cluster-scoped, ownership is tracked via labels (ModuleNameLabel + ModuleNamespaceLabel) rather than ownerReferences. The reconciler performs declarative convergence — creating missing, patching drifted, and deleting stale DeviceClasses each cycle.

Changes

internal/constants/constants.go

  • Added ModuleNamespaceLabel constant for cluster-scoped ownership tracking

internal/controllers/dra_reconciler.go

  • Added getModuleDeviceClasses — lists DeviceClasses by ownership labels
  • Added handleDeviceClasses — orchestrates create/patch/delete convergence
  • Added createOrPatchDeviceClass — handles single DeviceClass create or patch
  • Added deleteDRAResources — unified deletion of DaemonSets and DeviceClasses with errors.Join aggregation and IsNotFound tolerance
  • Integrated DeviceClass handling into Reconcile loop (deletion, nil-DRA, and happy paths)

internal/controllers/dra_reconciler_test.go

  • Updated existing Reconcile tests for new DeviceClass expectations
  • Added DRAReconciler_getModuleDeviceClasses test block
  • Added DRAReconciler_handleDeviceClasses test block (create, patch, delete, errors, no-op)
  • Added DRAReconciler_deleteDRAResources test block (empty, success, partial failure, aggregated errors)

Testing

  • Unit tests: Comprehensive Ginkgo/Gomega tests covering all behavioral paths
  • Coverage: All new functions at 100%; Reconcile at 96.2%
  • Checks: make generate, make fmt, make vet, make unit-test all pass

Acceptance Criteria

  • DRAReconciler lists DeviceClasses with ModuleNameLabel + ModuleNamespaceLabel
  • Missing DeviceClasses are created with correct spec.selectors and spec.config
  • Existing DeviceClasses are patched when spec differs from desired state
  • DeviceClasses present in cluster but absent from desired list are deleted
  • Ownership uses labels only (no ownerReferences — cluster-scoped)
  • Module deletion or spec.dra removal triggers cleanup of all matching DeviceClasses
  • Partial delete failures return aggregated error (controller-runtime requeues)
  • Unit tests cover create, patch, delete, drift correction, and cleanup paths

@k8s-ci-robot

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 11, 2026
@k8s-ci-robot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: TomerNewman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2026
@linux-foundation-easycla

linux-foundation-easycla Bot commented Jun 11, 2026

Copy link
Copy Markdown

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: TomerNewman / name: TomerNewman (4d8e7c2)

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 11, 2026
@TomerNewman TomerNewman force-pushed the MGMT-24471-deviceclass-lifecycle branch from 876d626 to 4d8e7c2 Compare June 11, 2026 10:57
@TomerNewman TomerNewman marked this pull request as ready for review June 11, 2026 10:57
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jun 11, 2026
@codecov-commenter

codecov-commenter commented Jun 11, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 97.01493% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.69%. Comparing base (fa23a9b) to head (675df52).
⚠️ Report is 382 commits behind head on main.

Files with missing lines Patch % Lines
internal/controllers/dra_reconciler.go 97.01% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1285      +/-   ##
==========================================
- Coverage   79.09%   73.69%   -5.40%     
==========================================
  Files          51       67      +16     
  Lines        5109     4977     -132     
==========================================
- Hits         4041     3668     -373     
- Misses        882     1140     +258     
+ Partials      186      169      -17     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@netlify

netlify Bot commented Jun 11, 2026

Copy link
Copy Markdown

Deploy Preview for kubernetes-sigs-kmm ready!

Name Link
🔨 Latest commit 876d626
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kmm/deploys/6a2a948adafef800087ae101
😎 Deploy Preview https://deploy-preview-1285--kubernetes-sigs-kmm.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@netlify

netlify Bot commented Jun 11, 2026

Copy link
Copy Markdown

Deploy Preview for kubernetes-sigs-kmm ready!

Name Link
🔨 Latest commit 4d8e7c2
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kmm/deploys/6a2a94b0aaf358000833da6f
😎 Deploy Preview https://deploy-preview-1285--kubernetes-sigs-kmm.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@netlify

netlify Bot commented Jun 11, 2026

Copy link
Copy Markdown

Deploy Preview for kubernetes-sigs-kmm ready!

Name Link
🔨 Latest commit 675df52
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kmm/deploys/6a31380c7f629000071ab27e
😎 Deploy Preview https://deploy-preview-1285--kubernetes-sigs-kmm.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@TomerNewman

Copy link
Copy Markdown
Collaborator Author

/assign @yevgeny-shnaidman @ybettan

Comment thread internal/controllers/dra_reconciler.go Outdated
if mod.Spec.DRA == nil {
if len(existingDRADS) > 0 {
if err = r.reconHelperAPI.deleteDRADaemonSets(ctx, existingDRADS); err != nil {
if len(existingDRADS) > 0 || len(existingDCs) > 0 {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to check if they exist: deleteDRAResource support receiving zero-length slices

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, removed the len() guard.

Comment thread internal/controllers/dra_reconciler.go Outdated
) error {
logger := log.FromContext(ctx)

if !exists {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use CreateOrPatch function here? That way you won't need to pass the exist and existingDC parameters, and is implified the code

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, switched to controllerutil.CreateOrPatch.

Comment thread internal/controllers/dra_reconciler.go Outdated

// Create missing or patch existing DeviceClasses to match the desired spec.
for _, desired := range mod.Spec.DRA.DeviceClasses {
existingDC, exists := existingByName[desired.Name]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove the desired from existingDC, and then in 304 you can just go over DC that are left in the existingDC

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, now deleting from existingByName during the loop and iterating over what remains.

@TomerNewman TomerNewman force-pushed the MGMT-24471-deviceclass-lifecycle branch from 4d8e7c2 to 0230174 Compare June 15, 2026 17:02
@TomerNewman

Copy link
Copy Markdown
Collaborator Author

/retest

@yevgeny-shnaidman

Copy link
Copy Markdown
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 16, 2026
@TomerNewman

Copy link
Copy Markdown
Collaborator Author

/retest

@ybettan

ybettan commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

/hold
While I am reviewing

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 16, 2026
Comment thread internal/controllers/dra_reconciler.go Outdated
Comment on lines +202 to +205
if err := drh.client.Delete(ctx, &ds); err != nil && !apierrors.IsNotFound(err) {
errs = append(errs, fmt.Errorf("failed to delete DRA DaemonSet %s/%s: %v", ds.Namespace, ds.Name, err))
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use drh.client.DeleteAllOf(...) instead of looping?

Comment thread internal/controllers/dra_reconciler.go Outdated
Comment on lines +206 to +208
for _, dc := range deviceClasses {
if err := drh.client.Delete(ctx, &dc); err != nil && !apierrors.IsNotFound(err) {
errs = append(errs, fmt.Errorf("failed to delete DeviceClass %s: %v", dc.Name, err))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use drh.client.DeleteAllOf(...) instead of looping?

Comment on lines +313 to +314
if deleteErr := drh.client.Delete(ctx, &dc); deleteErr != nil && !apierrors.IsNotFound(deleteErr) {
errs = append(errs, fmt.Errorf("failed to delete DeviceClass %s: %v", name, deleteErr))

@ybettan ybettan Jun 16, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we construct a list of DC to be deleted and delete all of them in a single API call?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike the full cleanup path (where DeleteAllOf works since all targets share the same labels), here we only want to delete the stale subset — not all module-owned DeviceClasses. DeleteAllOf with module labels would also remove the ones we just created/patched. Since there's no Kubernetes API to bulk-delete a specific set of named resources, individual Delete calls on the remaining map entries is the correct approach for selective deletion.

Extend the DRAReconciler to manage cluster-scoped DeviceClass resources
declared in spec.dra.deviceClasses. DeviceClasses use label-based ownership
(ModuleNameLabel + ModuleNamespaceLabel) since ownerReferences cannot
reference namespaced resources from cluster-scoped objects.

The reconciler performs declarative convergence on each cycle:
- Creates missing DeviceClasses with ownership labels
- Patches existing DeviceClasses to correct spec drift
- Deletes stale DeviceClasses no longer in the desired list
- Cleans up all DeviceClasses on Module deletion or DRA removal

Errors are aggregated via errors.Join to attempt all operations before
requeueing. NotFound errors on delete are tolerated to handle races.
@TomerNewman TomerNewman force-pushed the MGMT-24471-deviceclass-lifecycle branch from 0230174 to 675df52 Compare June 16, 2026 11:48
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 16, 2026
@ybettan

ybettan commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

/lgtm
/unhold

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 16, 2026
@k8s-ci-robot k8s-ci-robot merged commit 6028ad1 into kubernetes-sigs:main Jun 16, 2026
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants