@@ -44,9 +44,50 @@ With v1.8, the GPU Operator provides an option to load the ``nvidia-peermem`` ke
4444 nvidia/gpu-operator \
4545 --set driver.rdma.enabled=true
4646
47+
48+
49+ Verification
50+ ==============
51+
4752 During the installation, an `initContainer` is used with the driver daemonset to wait on the Mellanox OFED (MOFED) drivers to be ready.
48- This initContainer checks for Mellanox NICs on the node and ensures that the necessary kernel symbols are exported MOFED kernel drivers.
49-
53+ This initContainer checks for Mellanox NICs on the node and ensures that the necessary kernel symbols are exported by the MOFED kernel drivers.
54+ Once everything is in place, the ``nvidia-peermem-ctr`` container is instantiated inside the driver daemonset.
55+
56+ .. code-block:: console
57+
58+    $ kubectl describe pod -n gpu-operator-resources nvidia-driver-daemonset-xxxx
59+    <snip>
60+    Init Containers:
61+      mofed-validation:
62+        Container ID: containerd://5a36c66b43f676df616e25ba7ae0c81aeaa517308f28ec44e474b2f699218de3
63+        Image: nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.1
64+        Image ID: nvcr.io/nvidia/cloud-native/gpu-operator-validator@sha256:7a70e95fd19c3425cd4394f4b47bbf2119a70bd22d67d72e485b4d730853262c
65+
66+    <snip>
67+    Containers:
68+      nvidia-driver-ctr:
69+        Container ID: containerd://199a760946c55c3d7254fa0ebe6a6557dd231179057d4909e26c0e6aec49ab0f
70+        Image: nvcr.io/nvaie/vgpu-guest-driver:470.63.01-ubuntu20.04
71+        Image ID: nvcr.io/nvaie/vgpu-guest-driver@sha256:a1b7d2c8e1bad9bb72d257ddfc5cec341e790901e7574ba2c32acaddaaa94625
72+
73+    <snip>
74+      nvidia-peermem-ctr:
75+        Container ID: containerd://0742d86f6017bf0c304b549ebd8caad58084a4185a1225b2c9a7f5c4a171054d
76+        Image: nvcr.io/nvaie/vgpu-guest-driver:470.63.01-ubuntu20.04
77+        Image ID: nvcr.io/nvaie/vgpu-guest-driver@sha256:a1b7d2c8e1bad9bb72d257ddfc5cec341e790901e7574ba2c32acaddaaa94625
78+
79+    <snip>
80+
81+
82+ To validate that ``nvidia-peermem-ctr`` has successfully loaded the ``nvidia-peermem`` module, check the container logs with the following command:
83+
84+ .. code-block:: console
85+
86+    $ kubectl logs -n gpu-operator-resources nvidia-driver-daemonset-xxx -c nvidia-peermem-ctr
87+    waiting for mellanox ofed and nvidia drivers to be installed
88+    waiting for mellanox ofed and nvidia drivers to be installed
89+    successfully loaded nvidia-peermem module
90+
5091
5192 For more information on ``nvidia-peermem``, refer to the `documentation <https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#nvidia-peermem>`_.
5293