Commit 872dc27

Merge branch 'gpudirect-rdma-francis' into 'master'
Gpudirect rdma francis

See merge request nvidia/cloud-native/cnt-docs!22
2 parents 0ddb756 + 5110bdd commit 872dc27

1 file changed: +43 -2 lines changed

gpu-operator/gpu-operator-rdma.rst

Lines changed: 43 additions & 2 deletions
@@ -44,9 +44,50 @@ With v1.8, the GPU Operator provides an option to load the ``nvidia-peermem`` ke
         nvidia/gpu-operator \
         --set driver.rdma.enabled=true
 
+
+
+Verification
+==============
+
 During the installation, an `initContainer` is used with the driver daemonset to wait on the Mellanox OFED (MOFED) drivers to be ready.
-This initContainer checks for Mellanox NICs on the node and ensures that the necessary kernel symbols are exported MOFED kernel drivers.
-
+This initContainer checks for Mellanox NICs on the node and ensures that the necessary kernel symbols are exported MOFED kernel drivers.
+Once everything is in place, the container nvidia-peermem-ctr will be instantiated inside the driver daemonset.
+
+.. code-block:: console
+
+   $ kubectl describe pod -n gpu-operator-resources nvidia-driver-daemonset-xxxx
+   <snip>
+   Init Containers:
+     mofed-validation:
+       Container ID: containerd://5a36c66b43f676df616e25ba7ae0c81aeaa517308f28ec44e474b2f699218de3
+       Image: nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.1
+       Image ID: nvcr.io/nvidia/cloud-native/gpu-operator-validator@sha256:7a70e95fd19c3425cd4394f4b47bbf2119a70bd22d67d72e485b4d730853262c
+
+   <snip>
+   Containers:
+     nvidia-driver-ctr:
+       Container ID: containerd://199a760946c55c3d7254fa0ebe6a6557dd231179057d4909e26c0e6aec49ab0f
+       Image: nvcr.io/nvaie/vgpu-guest-driver:470.63.01-ubuntu20.04
+       Image ID: nvcr.io/nvaie/vgpu-guest-driver@sha256:a1b7d2c8e1bad9bb72d257ddfc5cec341e790901e7574ba2c32acaddaaa94625
+
+   <snip>
+     nvidia-peermem-ctr:
+       Container ID: containerd://0742d86f6017bf0c304b549ebd8caad58084a4185a1225b2c9a7f5c4a171054d
+       Image: nvcr.io/nvaie/vgpu-guest-driver:470.63.01-ubuntu20.04
+       Image ID: nvcr.io/nvaie/vgpu-guest-driver@sha256:a1b7d2c8e1bad9bb72d257ddfc5cec341e790901e7574ba2c32acaddaaa94625
+
+   <snip>
+
+
+To validate that nvidia-peermem-ctr has successfully loaded the nvidia-peermem module, you can use the following command:
+
+.. code-block:: console
+
+   $ kubectl logs -n gpu-operator-resources nvidia-driver-daemonset-xxx -c nvidia-peermem-ctr
+   waiting for mellanox ofed and nvidia drivers to be installed
+   waiting for mellanox ofed and nvidia drivers to be installed
+   successfully loaded nvidia-peermem module
+
 
 For more information on ``nvidia-peermem``, refer to the `documentation <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#nvidia-peermem>`_.
 
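
The log check described in the added Verification section could be scripted roughly as follows. This is a minimal sketch, not part of the commit: the helper name `peermem_loaded` is hypothetical, and it assumes the container prints the exact success message shown in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the GPU Operator): reads container log
# text on stdin and succeeds only if it contains the success message that
# appears in the sample nvidia-peermem-ctr output.
peermem_loaded() {
    grep -q 'successfully loaded nvidia-peermem module'
}

# Example usage. The pod name is a placeholder, as in the documentation;
# substitute the real driver daemonset pod name on your cluster:
#
#   kubectl logs -n gpu-operator-resources nvidia-driver-daemonset-xxxx \
#       -c nvidia-peermem-ctr | peermem_loaded && echo "nvidia-peermem loaded"
```

Matching on the log line keeps the check independent of node access; it only reports what the container itself logged, so it is a convenience on top of the manual `kubectl logs` inspection rather than a replacement for it.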
