Fetch current container runtime config #686
Conversation
tools/container/toolkit/toolkit.go
Outdated
toolkitRuntimeList := getNvidiaContainerRuntimeList(cfg, opts.ContainerRuntimeRuntimes.Value())
if len(toolkitRuntimeList) > 0 {
	configValues["nvidia-container-runtime.runtimes"] = toolkitRuntimeList
}
This should no longer be required.
We will ONLY set the binaries from the runtimes. Note that if opts.ContainerRuntimeRuntimes is set, we will overwrite the value set here as part of processing the "options" config options.
> We will ONLY set the binaries from the runtimes.

I don't think we can do that. If you look at the default containerd config TOML, the `BinaryName` fields are empty, meaning that the PATH-resolvable low-level runtime binary paths will be used. So we still need the default list `["docker-runc", "runc", "crun"]`.
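As a rough illustration of the point above, the default list could be resolved to full paths with `exec.LookPath`, mirroring containerd's PATH-based resolution when `BinaryName` is empty. This is a sketch with a hypothetical helper name (`resolveRuntimePaths`), not the toolkit's actual API:

```go
package main

import (
	"fmt"
	"os/exec"
)

// The fallback list discussed above: the names containerd resolves via PATH
// when a runtime's BinaryName field is left empty.
var defaultLowLevelRuntimes = []string{"docker-runc", "runc", "crun"}

// resolveRuntimePaths is a hypothetical helper that returns the full paths
// of whichever candidate runtimes are actually installed on this system.
func resolveRuntimePaths(candidates []string) []string {
	var resolved []string
	for _, name := range candidates {
		if path, err := exec.LookPath(name); err == nil {
			resolved = append(resolved, path)
		}
	}
	return resolved
}

func main() {
	// On a typical host this prints the absolute path of at least "runc";
	// it prints an empty slice if none of the defaults are installed.
	fmt.Println(resolveRuntimePaths(defaultLowLevelRuntimes))
}
```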
Could we implement it so that if the slice is empty we don't update the config? Alternatively, see the comment below.
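A minimal sketch of that guard, assuming the same map key as the snippet under review (the function name is illustrative, not the toolkit's actual API):

```go
package main

import "fmt"

// setRuntimesIfPresent only touches the config when the computed runtime
// list is non-empty, so an empty slice leaves the existing value untouched.
// The function name is an assumption for this sketch.
func setRuntimesIfPresent(configValues map[string]interface{}, runtimes []string) {
	if len(runtimes) == 0 {
		return // nothing resolved; keep the existing config as-is
	}
	configValues["nvidia-container-runtime.runtimes"] = runtimes
}

func main() {
	configValues := map[string]interface{}{}
	setRuntimesIfPresent(configValues, nil)
	fmt.Println(len(configValues)) // 0: empty slice does not update the config
	setRuntimesIfPresent(configValues, []string{"/usr/bin/runc"})
	fmt.Println(configValues["nvidia-container-runtime.runtimes"])
}
```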
Thanks @tariq1890.
I think this has come together very nicely. I have some additional comments.
Thanks for the work on this @tariq1890. One question I had was whether we should update how we apply the modified config. At present we generate a file that includes the full config. This may not be desirable when we are using drop-in files in crio, for example. I don't think it's a blocker for this iteration, but it would be good to put some thought into how the system will behave under these conditions.
Agreed, one way is to modify the command line config source to only return the
Thanks for the patience @tariq1890.
Let's get this in!
Commits:
- Signed-off-by: Tariq Ibrahim <[email protected]>
- add default runtime binary path to runtimes field of toolkit config toml
  Signed-off-by: Tariq Ibrahim <[email protected]>
- [no-relnote] Get low-level runtimes consistently
  We ensure that we use the same low-level runtimes regardless of the runtime engine being configured. This ensures consistent behaviour.
  Signed-off-by: Evan Lezar <[email protected]>
  Co-authored-by: Evan Lezar <[email protected]>
- address review comment
  Signed-off-by: Tariq Ibrahim <[email protected]>
🚢
CNT-5327
Documents NVIDIA/nvidia-container-toolkit#686
Signed-off-by: Mike McKiernan <[email protected]>

* Update Kubernetes supported versions
  Documents CNT-5342
* Support for Net Op 24.7.0
  Documents CNT-5346
* MIG profiles for GH200 144GB HBM3e
  Documents NVIDIA/gpu-operator#1057.
* Troubleshooting taints and tolerations
* Tolerations for clean up and upgrade CRD jobs
  Documents these PRs:
  - NVIDIA/gpu-operator#967
  - NVIDIA/gpu-operator#960
* RBAC improvements
  CNT-5289
  Documents:
  - NVIDIA/gpu-operator#890
  - NVIDIA/gpu-operator#986
* Fixed issue with CRI-O
  CNT-5327
  Documents NVIDIA/nvidia-container-toolkit#686
* Automatic upgrade of CRDs
  CNT-5328
  Documents NVIDIA/gpu-operator#1024
* Driver pod name and IP env vars
  CNT-5329
  4567038
  Documents NVIDIA/gpu-operator#1026
* Precompiled driver containers are GA
  Signed-off-by: Mike McKiernan <[email protected]>
* KubeVirt and OSV vGPU v17.4
* GH200 NVL2 144GB HBM3e name
* Add 565 driver
* NFD version and driver container envvars
  Signed-off-by: Mike McKiernan <[email protected]>
Summary of changes made in this PR:
1. In the CRI-O and containerd config engines, retrieve the current container runtime config via the following commands:
   i. `crio status config`
   ii. `containerd config dump`
2. When configuring the CRI-O runtime, we prioritise the runtime designated as the `default_runtime` in the config when setting up the `nvidia` runtime handler, as opposed to always favouring the `runc` runtime handler. This is needed because vanilla CRI-O packages ship `crun` as the default low-level runtime, and live-swapping the low-level runtime breaks the running containers in a cluster.
3. Add the full path of the low-level runtime to the `nvidia-container-runtime.runtimes` config value, so that `nvidia-container-runtime` can bind to a low-level runtime binary even when it cannot be resolved via the `PATH`.
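The config-retrieval step (point 1) can be sketched as follows. Only the two CLI invocations come from this PR; the function name `currentConfigCommand` is an assumed name for illustration, not the PR's actual API:

```go
package main

import (
	"fmt"
	"os/exec"
)

// currentConfigCommand builds the command used to dump the active runtime
// config for each supported engine, per the summary above. The function
// name is an illustrative assumption.
func currentConfigCommand(engine string) *exec.Cmd {
	switch engine {
	case "crio":
		return exec.Command("crio", "status", "config")
	case "containerd":
		return exec.Command("containerd", "config", "dump")
	default:
		return nil // unknown engine: nothing to run
	}
}

func main() {
	// Inspect the argv without executing anything; actually running the
	// command would require the engine to be installed on this host.
	fmt.Println(currentConfigCommand("containerd").Args)
}
```

Dumping the *effective* config (rather than reading the default config file) matters here: it is what lets the toolkit see the engine's real `default_runtime` before inserting the `nvidia` handler.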