Driver persistence changes #810

Draft: wants to merge 3 commits into main

6 changes: 6 additions & 0 deletions api/nvidia/v1alpha1/nvidiadriver_types.go
@@ -59,6 +59,12 @@ type NVIDIADriverSpec struct {
	// +operator-sdk:gen-csv:customresourcedefinitions.specDescriptors.x-descriptors="urn:alm:descriptor:com.tectonic.ui:booleanSwitch"
	UseOpenKernelModules *bool `json:"useOpenKernelModules,omitempty"`

	// PersistDriver indicates if the driver install should be persisted across restarts
	PersistDriver *bool `json:"persist,omitempty"`

	// InstallDirectory is the install location for the driver
	InstallDirectory string `json:"installDirectory,omitempty"`

	// NVIDIA Driver container startup probe settings
	StartupProbe *ContainerProbeSpec `json:"startupProbe,omitempty"`

5 changes: 5 additions & 0 deletions api/nvidia/v1alpha1/zz_generated.deepcopy.go

7 changes: 7 additions & 0 deletions config/crd/bases/nvidia.com_nvidiadrivers.yaml
@@ -204,6 +204,9 @@ spec:
  items:
    type: string
  type: array
installDirectory:
  description: InstallDirectory is the install location for the driver
  type: string
kernelModuleConfig:
  description: 'Optional: Kernel module configuration parameters for
    the NVIDIA Driver'
@@ -511,6 +514,10 @@ spec:
  description: NodeSelector specifies a selector for installation of
    NVIDIA driver
  type: object
persist:
  description: PersistDriver indicates if the driver install should
    be persisted across restarts
  type: boolean
priorityClassName:
  description: 'Optional: Set priorityClassName'
  type: string
2 changes: 2 additions & 0 deletions config/samples/nvidia_v1alpha1_nvidiadriver.yaml
@@ -18,6 +18,8 @@ spec:
    useHostMofed: false
  gds:
    enabled: false
  persist: false
  installDirectory: '/opt/nvidia/driver'
  # Private mirror repository configuration
  repoConfig:
    name: ""
7 changes: 7 additions & 0 deletions deployments/gpu-operator/crds/nvidia.com_nvidiadrivers.yaml
@@ -204,6 +204,9 @@ spec:
  items:
    type: string
  type: array
installDirectory:
  description: InstallDirectory is the install location for the driver
  type: string
kernelModuleConfig:
  description: 'Optional: Kernel module configuration parameters for
    the NVIDIA Driver'
@@ -511,6 +514,10 @@ spec:
  description: NodeSelector specifies a selector for installation of
    NVIDIA driver
  type: object
persist:
  description: PersistDriver indicates if the driver install should
    be persisted across restarts
  type: boolean
priorityClassName:
  description: 'Optional: Set priorityClassName'
  type: string
51 changes: 51 additions & 0 deletions manifest.diff
@@ -0,0 +1,51 @@
diff --git a/manifests/state-driver/0500_daemonset.yaml b/manifests/state-driver/0500_daemonset.yaml
index 8ceb7820c..8716d147d 100644
--- a/manifests/state-driver/0500_daemonset.yaml
+++ b/manifests/state-driver/0500_daemonset.yaml
@@ -205,6 +205,12 @@ spec:
 # always use runc for driver containers
 - name: NVIDIA_VISIBLE_DEVICES
   value: void
+{{- if .Driver.Spec.PersistDriver }}
+- name: PERSIST_DRIVER
+  value: "true"
+- name: INSTALL_DIR
+  value: {{ .Driver.Spec.InstallDirectory | default "/opt/nvidia/driver" }}
+{{- end }}
 {{- if deref .Driver.Spec.UseOpenKernelModules }}
 - name: OPEN_KERNEL_MODULES_ENABLED
   value: "true"
@@ -254,6 +260,14 @@ spec:
 {{- end }}
 {{- end }}
 volumeMounts:
+  {{- if .Driver.Spec.PersistDriver }}
+  - name: install-dir
+    mountPath: {{ .Driver.Spec.InstallDirectory | default "/opt/nvidia/driver" }}
+  - name: lib-modules
+    mountPath: /lib/modules
+  - name: dev
+    mountPath: /dev
+  {{- end }}
   - name: run-nvidia
     mountPath: /run/nvidia
     mountPropagation: Bidirectional
@@ -574,6 +588,18 @@ spec:
     readOnly: true
   {{- end }}
 volumes:
+  {{- if .Driver.Spec.PersistDriver }}
+  - name: install-dir
+    hostPath:
+      path: {{ .Driver.Spec.InstallDirectory | default "/opt/nvidia/driver" }}
+      type: DirectoryOrCreate
+  - name: lib-modules
+    hostPath:
+      path: /lib/modules
+  - name: dev
+    hostPath:
+      path: /dev
+  {{- end }}
   - name: run-nvidia
     hostPath:
       path: /run/nvidia
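`PersistDriver` is a `*bool`, like `UseOpenKernelModules`, and Go's `text/template` treats a pointer in an `if` as true whenever it is non-nil, regardless of the value it points to. A CR that sets `persist: false` (as the sample does) therefore still satisfies a bare `{{ if .Driver.Spec.PersistDriver }}`; dereferencing first, as the template already does with `deref` for `UseOpenKernelModules`, avoids this. The sketch below demonstrates the behavior; the `spec` type and this `deref` implementation are assumptions modeled on the template's helper, not the operator's own code.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// spec is a hypothetical stand-in for .Driver.Spec; only the *bool matters.
type spec struct {
	PersistDriver *bool
}

// deref is an assumed nil-safe *bool helper, modeled on the `deref`
// function the daemonset template calls for UseOpenKernelModules.
func deref(b *bool) bool { return b != nil && *b }

// render executes a one-line template guarded by the given expression.
func render(guard string, s spec) string {
	t := template.Must(template.New("t").
		Funcs(template.FuncMap{"deref": deref}).
		Parse("{{ if " + guard + " }}PERSIST_DRIVER=true{{ end }}"))
	var sb strings.Builder
	if err := t.Execute(&sb, s); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	off := false
	s := spec{PersistDriver: &off} // equivalent of `persist: false` in the CR

	// A bare pointer is truthy because it is non-nil.
	fmt.Printf("bare pointer: %q\n", render(".PersistDriver", s)) // "PERSIST_DRIVER=true"
	// Dereferencing first makes an explicit false skip the block.
	fmt.Printf("with deref:   %q\n", render("deref .PersistDriver", s)) // ""
}
```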
26 changes: 26 additions & 0 deletions manifests/state-driver/0500_daemonset.yaml
@@ -205,6 +205,12 @@ spec:
# always use runc for driver containers
- name: NVIDIA_VISIBLE_DEVICES
  value: void
{{- if deref .Driver.Spec.PersistDriver }}
- name: PERSIST_DRIVER
  value: "true"
- name: INSTALL_DIR
  value: {{ .Driver.Spec.InstallDirectory | default "/opt/nvidia/driver" }}
{{- end }}
{{- if deref .Driver.Spec.UseOpenKernelModules }}
- name: OPEN_KERNEL_MODULES_ENABLED
  value: "true"
@@ -254,6 +260,14 @@ spec:
{{- end }}
{{- end }}
volumeMounts:
  {{- if deref .Driver.Spec.PersistDriver }}
  - name: install-dir
    mountPath: {{ .Driver.Spec.InstallDirectory | default "/opt/nvidia/driver" }}
  - name: lib-modules
    mountPath: /lib/modules
  - name: dev
    mountPath: /dev
  {{- end }}
  - name: run-nvidia
    mountPath: /run/nvidia
    mountPropagation: Bidirectional
@@ -574,6 +588,18 @@ spec:
    readOnly: true
  {{- end }}
volumes:
  {{- if deref .Driver.Spec.PersistDriver }}
  - name: install-dir
    hostPath:
      path: {{ .Driver.Spec.InstallDirectory | default "/opt/nvidia/driver" }}
      type: DirectoryOrCreate
  - name: lib-modules
    hostPath:
      path: /lib/modules
  - name: dev
    hostPath:
      path: /dev
  {{- end }}
  - name: run-nvidia
    hostPath:
      path: /run/nvidia