Description
Environment:
OS: Ubuntu 24.04 LTS (Noble Numbat)
NVIDIA Driver: 560.28.03
Installation Method: Using NVIDIA's generic .deb repository (https://nvidia.github.io/libnvidia-container/stable/deb/amd64)
nvidia-container-toolkit Version: 1.17.6-1
libnvidia-container-tools Version: 1.17.6-1
Docker Version: [Please fill this in - run: docker --version]
Containerd Version: [Please fill this in - run: containerd --version]
Problem Description:
After a fresh installation of the NVIDIA Container Toolkit on Ubuntu 24.04 using the generic stable/deb repository, Docker is unable to access GPUs using the --gpus flag. The specific error is:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
Troubleshooting revealed the following:
The host NVIDIA drivers are working correctly (nvidia-smi runs fine).
The nvidia-container-toolkit package installs successfully.
Running sudo nvidia-ctk runtime configure --runtime=docker correctly updates /etc/docker/daemon.json.
Running sudo nvidia-ctk runtime configure --runtime=containerd correctly updates /etc/containerd/config.toml.
Both containerd and docker services restart successfully after configuration.
However, docker info | grep -i runtime output shows that the nvidia runtime is not registered:
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
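The two observations above (the configuration file is updated, but the daemon does not list the runtime) can be checked side by side. The following is a minimal sketch, assuming that `nvidia-ctk runtime configure --runtime=docker` writes an `"nvidia"` key under `runtimes` in /etc/docker/daemon.json; the function names `config_declares_nvidia` and `daemon_lists_runtime` are hypothetical helpers, not part of docker or nvidia-ctk.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers.

# Succeeds if the given daemon.json mentions an "nvidia" key.
# This is a plain-text grep heuristic, not a JSON parser.
# Returns 2 if the file cannot be read.
config_declares_nvidia() {
    config_file="${1:-/etc/docker/daemon.json}"
    [ -r "$config_file" ] || return 2
    grep -q '"nvidia"[[:space:]]*:' "$config_file"
}

# Reads `docker info` text on stdin and succeeds if the runtime
# named in $1 appears as a word on the "Runtimes:" line.
daemon_lists_runtime() {
    grep '^[[:space:]]*Runtimes:' | tr -s ' ' '\n' | grep -qxF -- "$1"
}
```

Possible usage: `config_declares_nvidia && echo "declared in daemon.json"`, followed by `docker info 2>/dev/null | daemon_lists_runtime nvidia || echo "not listed by the daemon"`. If the first check succeeds and the second fails, the daemon is not picking up the configuration that was written, which matches the symptom reported here.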
Further investigation showed that the required OCI prestart hook directory and file are missing after installing/reinstalling nvidia-container-toolkit and its dependencies:
$ ls -l /usr/share/oci/hooks/prestart/
ls: cannot access '/usr/share/oci/hooks/prestart/': No such file or directory
Inspecting the contents of the libnvidia-container-tools package (version 1.17.6-1 from the stable/deb repo) confirms the hook file is missing from the package itself:
$ apt-get download libnvidia-container-tools=1.17.6-1
$ dpkg -c libnvidia-container-tools_1.17.6-1_amd64.deb | grep 'oci/hooks/prestart/nvidia-container-runtime-hook'
(This command produces no output.)
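The grep above matches one specific path. As a complementary check, a name-only match would also catch the file if it were shipped under a different directory. This is a minimal sketch; `listing_has_file` is a hypothetical helper that reads a `dpkg -c` listing on stdin.

```shell
#!/bin/sh
# Sketch only: listing_has_file is a hypothetical helper.

# Reads a package file listing (for example, `dpkg -c` output) on stdin
# and succeeds if any line contains "/<name>", regardless of directory.
# Substring match, so it also matches longer names that start with <name>.
listing_has_file() {
    grep -qF -- "/$1"
}
```

Possible usage: `dpkg -c libnvidia-container-tools_1.17.6-1_amd64.deb | listing_has_file nvidia-container-runtime-hook || echo "not in this package"`. For packages that are already installed, `dpkg -S nvidia-container-runtime-hook` reports which package, if any, owns a file with that name.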
Switching to the experimental/deb repository and checking the libnvidia-container-tools package available there (1.17.0~rc.2-1) showed that the hook file is missing from that version as well.
Expected Behavior:
Installing nvidia-container-toolkit (and its dependency libnvidia-container-tools) from the stable/deb repository should:
Create the /usr/share/oci/hooks/prestart/ directory.
Place the nvidia-container-runtime-hook executable file in that directory.
Allow containerd and docker to correctly register the nvidia runtime.
Enable docker run --gpus all ... to function correctly.
Actual Behavior:
The OCI hook file is missing from the libnvidia-container-tools package in both the stable/deb and experimental/deb repositories. As a result, the nvidia runtime is not registered, and Docker GPU passthrough fails with the error: could not select device driver "" with capabilities: [[gpu]]. This blocks GPU usage in Docker when using the recommended generic repository on a fresh Ubuntu 24.04 installation.