I’ve been really interested in the potential behind the unified cgroup hierarchy, aka cgroup v2, in the kernel for a while now. I even helped out with a talk on this subject earlier this year. It’s worth listening to Tejun’s part of the talk if you’re not familiar with the value behind v2. While a lot of user space, for example systemd, has really solid support for v2, there have been historic gaps around virt and containers. On the virt side, initial v2 support went into libvirt 4.9 or 5.0, and it’s continuing to be improved. For containers, we’re tracking the OCI progress here and here, and Giuseppe Scrivano has done some great v2 enablement with an alternative runtime he wrote called crun. crun is basically runc re-written in C, and while there are pros and cons on the language side, it’s ridiculously fast at instantiating containers compared to runc.
With the current version of Fedora we have all the pieces we need to put this together; in fact, there’s still a change request for Fedora 31 to default to using the unified hierarchy. For the past year or so I’ve been running Fedora Silverblue, mainly because I’m in love with the transactional update model we use on the CoreOS side of the house, but there are other advantages too. The main one for me is that it enforces strong separation of installed applications from the underlying host operating system. This has the side benefit of encouraging, or enforcing depending on your viewpoint, the use of containers far more often for tasks that would traditionally be done on the host OS. There’s a huge benefit here too, because the underlying system always stays in pristine shape with no cruft left behind. This helps combat system bloat and makes future upgrades much smoother. Trust me when I say the benefits are addictive, and you will feel them when upgrading releases. It only costs a reboot. Amazing.
Anyway, to get started with this, I just adapted Giuseppe’s post for Silverblue. Start by appending the kernel argument that enables cgroup v2 to the boot configuration:
$ sudo rpm-ostree kargs --append systemd.unified_cgroup_hierarchy=1
Next, install crun and reboot:
$ sudo rpm-ostree install crun
$ sudo systemctl reboot
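After the reboot it is worth confirming that the unified hierarchy is actually mounted. The following is a minimal sketch rather than an official check: it assumes GNU stat, and cgroup_mode is a hypothetical helper name. On a v2 host the filesystem type at /sys/fs/cgroup is reported as cgroup2fs, while a v1 or hybrid setup shows tmpfs there.

```shell
#!/bin/sh
# Map the filesystem type mounted at /sys/fs/cgroup to a cgroup mode.
# cgroup_mode is a hypothetical helper name, not part of any tool.
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "unified (v2)" ;;
        tmpfs)     echo "legacy or hybrid (v1)" ;;
        *)         echo "unknown" ;;
    esac
}

# GNU stat: -f queries the filesystem, %T prints its type by name.
# Fall back to "unknown" if the path is missing or stat fails.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)
cgroup_mode "$fstype"
```

If this reports v1, checking /proc/cmdline for systemd.unified_cgroup_hierarchy=1 shows whether the kernel argument made it into the booted deployment.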
Configure podman to use crun by default:
$ sudo cp /usr/share/containers/libpod.conf /etc/containers/
$ sudo vi /etc/containers/libpod.conf
# edit/add the following
# Default OCI runtime
runtime = "crun"
# under [runtimes] add:
crun = [ "/usr/bin/crun", "/usr/local/bin/crun" ]
At this point, running podman with sudo will *just work*. It can easily be tested using Giuseppe’s example:
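To double-check the edit without opening the file again, something like the sketch below can print the default runtime. It assumes the file uses the simple `runtime = "..."` form shown above; configured_runtime is a hypothetical helper, not a podman command.

```shell
#!/bin/sh
# Print the value of the first uncommented `runtime = "..."` line in a
# libpod.conf-style file. configured_runtime is a hypothetical helper.
configured_runtime() {
    sed -n 's/^[[:space:]]*runtime[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

conf=/etc/containers/libpod.conf
if [ -r "$conf" ]; then
    echo "default runtime: $(configured_runtime "$conf")"
else
    echo "no readable $conf found"
fi
```

Commented-out lines and the `crun = [ ... ]` entry under [runtimes] are ignored, since the pattern only matches a line that starts with the `runtime` key.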
$ sudo podman run --memory=100M --rm -ti fedora bash
[root@46e94a7237e8 /]# cat /proc/self/cgroup
0::/machine.slice/46e94a7237e8a39b9f7fa038f12456b1a01381b6676fd942c2a889ba2b4ed630-20920.scope
[root@46e94a7237e8 /]# cat /sys/fs/cgroup/machine.slice/46e94a7237e8a39b9f7fa038f12456b1a01381b6676fd942c2a889ba2b4ed630-20920.scope/memory.max
104857600
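The memory.max value is simply the --memory=100M limit expressed in bytes, with M read as 1024 × 1024. A tiny sketch of the arithmetic (mib_to_bytes is a hypothetical helper name):

```shell
#!/bin/sh
# Convert a size in MiB to bytes; mib_to_bytes is a hypothetical helper.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 100   # prints 104857600, matching memory.max above
```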
Certainly not the most exciting demo, but you get the idea, and it works great for all the daily things I need. What’s missing here is rootless mode, which isn’t working out of the box right now. I haven’t checked whether all of the patches listed in the original post are packaged yet or whether it’s user error. It wouldn’t surprise me if it’s user error.
Regarding libvirt, which I have installed on Silverblue, I was unable to find any docs around configuring v2 and it will error out with the default config. Simply disabling the controllers is my dirty workaround until I figure this out:
/etc/libvirt/qemu.conf
#cgroup_controllers = [ "cpu", "devices", "memory", "blkio", "cpuset", "cpuacct" ]
cgroup_controllers = [ ]
Anyway, this is super exciting for me now that I can use this on my daily driver and take advantage of some of the newer systemd features.
Shortly after writing this I got rootless working pretty easily. I had a typo when I created this drop-in, which was simple to fix:
$ cat /etc/systemd/system/user@.service.d/controllers.conf
[Service]
Delegate=cpu cpuacct io blkio memory devices pids
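A drop-in like this only takes effect once systemd has reloaded its configuration (systemctl daemon-reload) and the user manager has been restarted, for example by logging out and back in. To see whether delegation actually happened, the sketch below compares a wanted list against the controllers available to the user manager. Two assumptions here: the cgroup path is where systemd normally places user@UID.service, and only the v2 controller names (cpu, io, memory, pids) are checked, since cpuacct, blkio, and devices are v1 names that do not appear in cgroup.controllers. missing_controllers is a hypothetical helper.

```shell
#!/bin/sh
# Print each wanted controller that is absent from the available list.
# missing_controllers is a hypothetical helper name.
missing_controllers() {
    wanted=$1
    available=$2
    for c in $wanted; do
        case " $available " in
            *" $c "*) ;;          # controller is present, nothing to report
            *) echo "$c" ;;       # controller is missing
        esac
    done
}

# Assumed location of the per-user systemd manager's cgroup.
uid=$(id -u)
f="/sys/fs/cgroup/user.slice/user-$uid.slice/user@$uid.service/cgroup.controllers"
if [ -r "$f" ]; then
    missing_controllers "cpu io memory pids" "$(cat "$f")"
fi
```

No output means every wanted controller is delegated; any name printed points at a controller the drop-in did not hand over.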
Also previous versions of podman had left some cruft in my home dir. Simply removing ~/.config/containers/* fixed the problem. Now, I see this from inside the container:
# id uid=0(root) gid=0(root) groups=0(root)
and from the host, the container’s process shows up under my regular user:
$ ps -aux |grep cat
bbreard 4301 94.8 0.0 5392 1756 pts/0 R+ 13:15 0:07 cat /dev/zero
As for the libvirt issue above: according to https://bugzilla.redhat.com/show_bug.cgi?id=1727149, libvirt 5.5 is required for this to work, and it will be present in F31. Users looking to try this earlier can grab the packages here: https://fedoraproject.org/wiki/Virtualization_Preview_Repository