feat(containerd): plumb cgroup_writable setting into config templates #890
bilby91 wants to merge 2 commits
Add the cgroup_writable option to the containerd 2.1 and 2.2 k8s config templates. When users set container-runtime.cgroup-writable=true, the setting is rendered under each runtime handler section, enabling writable cgroups for unprivileged containers on cgroup v2 systems.

Relates to: bottlerocket-os/bottlerocket#4666
Companion PR: bottlerocket-os/bottlerocket-settings-sdk#128

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
{{#if settings.container-runtime.cgroup-writable}}
cgroup_writable = {{settings.container-runtime.cgroup-writable}}
{{/if}}
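The conditional above emits the option only when the setting is truthy. An unset or false value renders nothing, so containerd keeps its own default. A minimal Rust sketch of that behaviour follows; `render_cgroup_writable` is a hypothetical stand-in for the template engine, not code from this PR or from schnauzer.

```rust
/// Hypothetical stand-in for the `{{#if}}` block in the template: returns
/// the line to emit under a runtime handler section, or nothing at all.
fn render_cgroup_writable(setting: Option<bool>) -> Option<String> {
    match setting {
        // Handlebars `#if` treats `true` as truthy, so the line is emitted.
        Some(true) => Some("cgroup_writable = true".to_string()),
        // `false` and an unset value are both falsy: nothing is rendered.
        Some(false) | None => None,
    }
}
```

One consequence of this shape is that an explicit `cgroup-writable=false` never appears in the rendered config, which should be harmless as long as containerd's default for the option is off.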
At a minimum, this needs to be scoped to nodes where cgroup v2 is used. That's the default on Bottlerocket but it's still possible to switch back to v1. For that, I'd recommend adding a guard helper to schnauzer.
I'm also skeptical that it makes sense to enable this system-wide; if a container isn't prepared to lock down the delegated hierarchy, it could be exposed to additional risks.
Better would be a per-pod annotation so that individual pods can opt-in.
@bcressey Thanks for the review!
For concern 1, I agree — I'll add an is_cgroup_v2 guard helper in schnauzer (following the existing pattern for system-detection helpers like fips_enabled) so this only takes effect on nodes running cgroup v2.
Regarding concerns 2 and 3, I've been looking into this but I'm still getting familiar with the full architecture. From what I can tell, a per-pod approach could work via Kubernetes RuntimeClasses — containerd already supports per-runtime cgroup_writable values and maps them from the pod's runtimeClassName through GetSandboxRuntime(). So we could define a second runtime (e.g. runc-cgroup-writable) in the containerd config template with the flag enabled, keeping it off by default and letting pods opt in. This wouldn't require upstream changes.
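To make the RuntimeClass idea concrete, here is a rough Rust model of the lookup it relies on: a handler name selects a runtime entry, and only the opt-in entry carries the flag. This is an illustration, not containerd's code; the `Handler` type, the `resolve` function, and the choice of `runc` as the default handler are all assumptions made for the sketch.

```rust
use std::collections::HashMap;

/// Per-handler options, reduced to the one flag under discussion.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Handler {
    cgroup_writable: bool,
}

/// Hypothetical handler table: the stock handler keeps the flag off,
/// and a second, opt-in handler turns it on.
fn handlers() -> HashMap<&'static str, Handler> {
    HashMap::from([
        ("runc", Handler { cgroup_writable: false }),
        ("runc-cgroup-writable", Handler { cgroup_writable: true }),
    ])
}

/// Resolve the handler named by a pod's RuntimeClass. An empty name falls
/// back to the default handler ("runc" here, an assumption); a name with
/// no matching entry yields `None`.
fn resolve(handler_name: &str) -> Option<Handler> {
    let name = if handler_name.is_empty() { "runc" } else { handler_name };
    handlers().get(name).copied()
}
```

In this model, a pod without a runtime class resolves to `runc` and keeps read-only cgroups, while a pod whose RuntimeClass names `runc-cgroup-writable` gets the writable hierarchy.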
I also noticed that an NRI plugin could potentially achieve this by intercepting CreateContainer events and replacing the cgroup mount options based on pod annotations, though Bottlerocket doesn't ship any NRI plugins today so that would be a bigger lift.
I'm not deeply familiar with how Bottlerocket typically handles this kind of per-pod configuration, so I'd really appreciate your guidance on which direction makes sense here — or if there's another approach I'm not seeing.
@bcressey Is there any path we can explore to continue this conversation?
Thanks!
…tting

Add a schnauzer template helper that detects cgroup v2 by checking for /sys/fs/cgroup/cgroup.controllers, and wrap cgroup_writable in all containerd config templates so it is only emitted on cgroup v2 nodes.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
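The detection described in this commit can be sketched in a few lines of Rust. The function names below are hypothetical and the Handlebars helper registration is omitted, so this shows only the filesystem check, not the helper as it exists in schnauzer.

```rust
use std::path::Path;

/// Returns true when `cgroup_root` looks like a cgroup v2 (unified) mount,
/// i.e. it contains a `cgroup.controllers` file. Taking the root as a
/// parameter is a choice made for this sketch so the check can be tested.
fn is_cgroup_v2_at(cgroup_root: &Path) -> bool {
    cgroup_root.join("cgroup.controllers").is_file()
}

/// Convenience wrapper for the conventional mount point named in the commit.
fn is_cgroup_v2() -> bool {
    is_cgroup_v2_at(Path::new("/sys/fs/cgroup"))
}
```

Passing the root directory explicitly keeps the check testable against a temporary directory instead of the live `/sys/fs/cgroup`.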
Summary
- Add cgroup_writable to containerd 2.1 and 2.2 k8s config templates (standard and NVIDIA)
- Render the setting under each runtime handler section (runtimes.runc, runtimes.nvidia, etc.), matching containerd's config structure

Relates to: bottlerocket-os/bottlerocket#4666
Companion PR: bottlerocket-os/bottlerocket-settings-sdk#128
See also: containerd/containerd#11131
🤖 Generated with Claude Code