@@ -44,9 +44,50 @@ With v1.8, the GPU Operator provides an option to load the ``nvidia-peermem`` ke
nvidia/gpu-operator \
--set driver.rdma.enabled=true
+
+
+ Verification
+ ==============
+
During the installation, an `initContainer` is used with the driver daemonset to wait on the Mellanox OFED (MOFED) drivers to be ready.
- This initContainer checks for Mellanox NICs on the node and ensures that the necessary kernel symbols are exported MOFED kernel drivers.
-
+ This initContainer checks for Mellanox NICs on the node and ensures that the necessary kernel symbols are exported by the MOFED kernel drivers.
+ Once everything is in place, the ``nvidia-peermem-ctr`` container is started inside the driver daemonset.
+
+ .. code-block:: console
+
+    $ kubectl describe pod -n gpu-operator-resources nvidia-driver-daemonset-xxxx
+    <snip>
+    Init Containers:
+      mofed-validation:
+        Container ID: containerd://5a36c66b43f676df616e25ba7ae0c81aeaa517308f28ec44e474b2f699218de3
+        Image: nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.1
+        Image ID: nvcr.io/nvidia/cloud-native/gpu-operator-validator@sha256:7a70e95fd19c3425cd4394f4b47bbf2119a70bd22d67d72e485b4d730853262c
+
+    <snip>
+    Containers:
+      nvidia-driver-ctr:
+        Container ID: containerd://199a760946c55c3d7254fa0ebe6a6557dd231179057d4909e26c0e6aec49ab0f
+        Image: nvcr.io/nvaie/vgpu-guest-driver:470.63.01-ubuntu20.04
+        Image ID: nvcr.io/nvaie/vgpu-guest-driver@sha256:a1b7d2c8e1bad9bb72d257ddfc5cec341e790901e7574ba2c32acaddaaa94625
+
+    <snip>
+      nvidia-peermem-ctr:
+        Container ID: containerd://0742d86f6017bf0c304b549ebd8caad58084a4185a1225b2c9a7f5c4a171054d
+        Image: nvcr.io/nvaie/vgpu-guest-driver:470.63.01-ubuntu20.04
+        Image ID: nvcr.io/nvaie/vgpu-guest-driver@sha256:a1b7d2c8e1bad9bb72d257ddfc5cec341e790901e7574ba2c32acaddaaa94625
+
+    <snip>
+
+
+ To validate that ``nvidia-peermem-ctr`` has successfully loaded the ``nvidia-peermem`` module, use the following command:
+
+ .. code-block:: console
+
+    $ kubectl logs -n gpu-operator-resources nvidia-driver-daemonset-xxxx -c nvidia-peermem-ctr
+    waiting for mellanox ofed and nvidia drivers to be installed
+    waiting for mellanox ofed and nvidia drivers to be installed
+    successfully loaded nvidia-peermem module
+
For more information on ``nvidia-peermem``, refer to the `documentation <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#nvidia-peermem>`_.