Commit ce2ef9a
committed
Run toolkit validation in operand init containers
The toolkit-validation init containers in operand DaemonSets previously
polled for a toolkit-ready sentinel file on the host. During driver
reinstall cycles, the driver-manager restores node scheduling labels
before the driver container has loaded the nvidia kernel module. All
operand pods get created simultaneously and find a stale toolkit-ready
file from a previous cycle, passing the init gate while nvidia-smi would
actually fail.
Replace the shell-based sentinel file check with a nvidia-validator
check. This runs nvidia-smi through the toolkit runtime wrapper and
retries until it succeeds, validating both toolkit injection and driver
module readiness without depending on host sentinel files.1 parent 652724d commit ce2ef9a
File tree
8 files changed
+60
-14
lines changed- assets
- gpu-feature-discovery
- state-container-toolkit
- state-dcgm-exporter
- state-dcgm
- state-device-plugin
- state-driver
- state-mig-manager
- state-mps-control-daemon
8 files changed
+60
-14
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
29 | 29 | | |
30 | 30 | | |
31 | 31 | | |
32 | | - | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
33 | 40 | | |
34 | 41 | | |
35 | 42 | | |
36 | 43 | | |
37 | 44 | | |
38 | | - | |
| 45 | + | |
39 | 46 | | |
40 | 47 | | |
41 | 48 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
77 | 77 | | |
78 | 78 | | |
79 | 79 | | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
80 | 84 | | |
81 | 85 | | |
82 | 86 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
31 | | - | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
32 | 39 | | |
33 | 40 | | |
34 | 41 | | |
35 | 42 | | |
36 | | - | |
37 | | - | |
| 43 | + | |
| 44 | + | |
38 | 45 | | |
39 | 46 | | |
40 | 47 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
31 | | - | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
32 | 39 | | |
33 | 40 | | |
34 | 41 | | |
35 | 42 | | |
36 | 43 | | |
37 | | - | |
| 44 | + | |
38 | 45 | | |
39 | 46 | | |
40 | 47 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
31 | | - | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
32 | 39 | | |
33 | 40 | | |
34 | 41 | | |
35 | 42 | | |
36 | 43 | | |
37 | | - | |
| 44 | + | |
38 | 45 | | |
39 | 46 | | |
40 | 47 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
150 | 150 | | |
151 | 151 | | |
152 | 152 | | |
153 | | - | |
| 153 | + | |
154 | 154 | | |
155 | 155 | | |
156 | 156 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
31 | | - | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
32 | 39 | | |
33 | 40 | | |
34 | 41 | | |
35 | 42 | | |
36 | 43 | | |
37 | | - | |
| 44 | + | |
38 | 45 | | |
39 | 46 | | |
40 | 47 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
30 | 30 | | |
31 | 31 | | |
32 | 32 | | |
33 | | - | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
34 | 41 | | |
35 | 42 | | |
36 | 43 | | |
37 | 44 | | |
38 | 45 | | |
39 | | - | |
| 46 | + | |
40 | 47 | | |
41 | 48 | | |
42 | 49 | | |
| |||
0 commit comments