I’ve been loving my new desktop setup for about four months now. I also think it’s pretty nifty that this system has only ever booted Anaconda and containers! The integrated graphics were pretty impressive, but the simple, occasional CAD work I do quickly exposed their weakness. I also want a setup that can offload some AI models to the GPU. I had heard that it’s painful to buy a GPU right now – that’s an understatement. As I write this, NVIDIA is in the middle of releasing the 50xx series and scalpers are working hard to buy them all. Anyway, I settled on a cheaper card from Amazon until the stars align for me to get something like the 5090.

If you’ve ever used a “team green” GPU on Linux, you’ve no doubt run into the fun world of drivers. We’ve known that image mode & bootc would be a powerful tool for managing dependencies like this, but this is the first time I’ve had the opportunity to get hands-on. This post will walk through some of the possible scenarios for handling GPU drivers on Fedora, CentOS Stream, & RHEL when building bootc images.
RPM repos
On the RHEL side of the house, NVIDIA does a great job providing precompiled drivers that follow kernel updates. This makes it easy to just dnf install the corresponding driver package that matches your kernel, and you’re good to go. But even here there’s always a slight risk on an RPM system that your repos will lag behind kernel updates. To handle this, NVIDIA provides DKMS versions that we’ll look at in a minute. For RHEL 9 users, I’d recommend following the example we publish for NVIDIA drivers here. Users can optionally leverage modularity via DNF to lock in a particular driver version; for details on this path, refer to this blog post.
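To make this concrete, here’s a minimal sketch of what that could look like in a Containerfile. The repo URL is NVIDIA’s published CUDA repo for RHEL 9, but the base image tag and the “565” driver stream are illustrative assumptions – check the repo for what’s currently published:

# Sketch: precompiled NVIDIA driver on RHEL 9, pinned via DNF modularity.
# The base image tag and driver stream below are examples, not recommendations.
FROM registry.redhat.io/rhel9/rhel-bootc:9.5
RUN dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo && \
    dnf -y module install nvidia-driver:565 && \
    dnf clean all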
DKMS & akmod
Since it’s challenging for 3rd parties to align building and shipping their drivers with distro kernel updates, these two technologies were created to help. There are some differences between them, but they essentially accomplish the same thing: when a dkms or akmod package gets installed, it compiles the correct binary (typically a kmod) and installs it. So every time a new kernel ships, the user reboots, the service detects that the needed driver is missing, and builds it automatically. Users only face a slightly longer boot. There are pros and cons to this model: in some regulated environments compilers are not allowed to run in production, there are some security limitations, and it’s an anti-pattern for immutable environments. Our kernel teams also consider both of these technologies completely unsupportable, but… honestly, in the real world they work great, and users seem to prefer this over breaking their systems with updates.
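If you want to see this machinery in action on a conventional (non-image-mode) system, both tools can be poked at by hand. A quick illustration – the exact output varies by version:

# List the modules dkms knows about and their build state per kernel.
dkms status
# Ask akmods to build any missing kmods for the currently running kernel.
akmods --kernels "$(uname -r)"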
NVIDIA includes DKMS packages, and Fedora users will likely be familiar with RPM Fusion’s akmod packages of the NVIDIA drivers. If you are using either of these in a Containerfile, don’t expect them to work without adding an additional step to force the driver build while the container is built. We need to do this before “booting” because /usr/lib/modules/* is read-only on the OS… which is also a badass security feature! Your basic steps are just dnf -y install [packages] and then one or both of these lines to trigger the build in our Containerfile:
RUN dkms autoinstall -k $(rpm -qa kernel --queryformat '%{VERSION}-%{RELEASE}.%{ARCH}') && \
akmods --force --kernels "$(rpm -q --queryformat '%{VERSION}-%{RELEASE}.%{ARCH}' kernel-devel)"
Easy peasy!
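Putting the akmod path together, a minimal single-stage Containerfile might look like the following. This is a sketch assuming a Fedora bootc base and the RPM Fusion packages; the image tag is illustrative:

# The akmods call forces the kmod build at image build time, since
# /usr/lib/modules is read-only once the system is booted.
FROM quay.io/fedora/fedora-bootc:41
RUN dnf -y install \
      https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm \
      https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm && \
    dnf -y install akmod-nvidia xorg-x11-drv-nvidia && \
    akmods --force --kernels "$(rpm -q --queryformat '%{VERSION}-%{RELEASE}.%{ARCH}' kernel-devel)" && \
    dnf clean all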
Multi-stage builds
Containers make it simple to do multi-stage builds. With these, we can build the modules in a throw-away container and copy just the binaries into the final image. This creates a much cleaner final system without the additional overhead of things like the gcc-c++ and kernel-devel packages. This really nice example was shared in our Matrix room. If you follow the Containerfile and shell script, you can see how just the .ko files are pulled into the final image. This is my preferred way to handle things like this. I imagine Bluefin is doing something similar, but I didn’t have time to check. Of course, this pattern works great with *any* driver you need to build from source. Since we’re no longer assembling packages “live on a system” and worrying about repo timing & lag, this is super clean, and it’s easy to catch failures in the build pipeline rather than at runtime – changing things at runtime feels very 90’s to me now. :)
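The shape of it looks something like this – a sketch under the same akmod assumptions as above, not the linked example itself. The base image and paths are illustrative, and the userspace driver packages still get installed normally in the final stage:

# Stage 1: throw-away builder with the full toolchain.
FROM quay.io/fedora/fedora-bootc:41 AS builder
RUN dnf -y install gcc kernel-devel akmod-nvidia && \
    akmods --force --kernels "$(rpm -q --queryformat '%{VERSION}-%{RELEASE}.%{ARCH}' kernel-devel)"

# Stage 2: copy only the built modules; gcc and kernel-devel never land here.
FROM quay.io/fedora/fedora-bootc:41
COPY --from=builder /usr/lib/modules /usr/lib/modules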
My system
I experimented with a number of things. If you only need the NVIDIA drivers, both the RPM Fusion & NVIDIA ones “just work”. Where things got trickier for me is installing CUDA. I really want utilities like nvidia-smi on my system, and a lot of things are just easier with cuda & the cuda-toolkit installed. While RPM Fusion provides great docs for using their drivers with NVIDIA’s cuda packages, something in kwin_wayland really did not like that on my system: all 24 cores maxed at 100% and the system was useless. Since CUDA is pretty large, both in terms of disk space and RPM dependencies, the multi-stage builds don’t seem like a great option; I believe most of the savings would be cancelled out by packages pulled in just to satisfy RPM deps. Here are the quick-and-dirty relevant sections of the Containerfile I’m using while I write this. I will likely refactor it at some point, but here it is for now:
RUN dnf -y install gcc-c++ nvidia-driver && \
dkms autoinstall -k $(rpm -qa kernel --queryformat '%{VERSION}-%{RELEASE}.%{ARCH}') && \
dnf install -y nvidia-container-toolkit && \
dnf install -y cuda cuda-toolkit && \
dnf install -y https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm && \
dnf group install -y kde-desktop virtualization && \
dnf install -y android-tools bash-completion bcache-tools bwm-ng cockpit cockpit-podman cockpit-storaged cockpit-ws cockpit-machines cockpit-selinux cups cups-browsed dmraid ethtool firefox firewalld fuse-exfat fwupd gamemode gdb git guvcview gvfs HandBrake HandBrake-gui htop input-leap kamera k3b kernel-headers libva-nvidia-driver libva-utils libvirt libvirt-daemon lm_sensors nfs-utils nss-mdns nvidia-vaapi-driver pcp pcp-selinux powertop qemu-kvm samba steam-devices subscription-manager sysstat thermald tree tuned vdpauinfo vim-enhanced virt-install virt-manager vulkan-tools v4l2loopback v4l-utils xdpyinfo wget && \
akmods --force --kernels "$(rpm -q --queryformat '%{VERSION}-%{RELEASE}.%{ARCH}' kernel-devel)" && \
dnf remove -y plasma-discover-offline-updates plasma-discover-packagekit plasma-pk-updates tracker tracker-miners plasma-x11 plasma-workspace-x11 && \
dnf clean all
My containers with cuda & cuda-toolkit almost doubled in size; they are about 8G compressed on my registry and ~17G on disk. Honestly, it works perfectly, and being able to roll back a couple of times really helped my sanity while iterating on the Containerfile. The performance is great, and I really like this card compared to the integrated graphics. Please leave a comment if you see things I can improve or that I missed. …also, if you know how to buy a 5090 at non-scalper prices, definitely tell me!
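For anyone new to image mode, the rollback loop I leaned on is just the standard bootc flow:

# Pull and stage the newly pushed image, then boot into it.
sudo bootc upgrade
sudo systemctl reboot
# If the new image misbehaves, queue the previous deployment for the next boot.
sudo bootc rollback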