Commit 508e321

feat: Install Nvidia DOCA on the servers post provisioning
Signed-off-by: Boris Glimcher <[email protected]>
1 parent 9f00d11 commit 508e321

31 files changed, +504 -6 lines changed

docs/source/InstallationGuides/InstallingProvisionTool/index.rst

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ This playbook achieves the following tasks:

 * Configures a docker registry to pull images from the internet and store them locally

-* Optionally installs OFED and CUDA
+* Optionally installs OFED, DOCA and CUDA

 .. toctree::

docs/source/InstallationGuides/InstallingProvisionTool/installprovisiontool.rst

Lines changed: 12 additions & 0 deletions
@@ -25,6 +25,18 @@ Optional configurations managed by the provision tool
 * CUDA requires an additional reboot while being installed. While this is taken care of by Omnia, users are required to wait an additional few minutes when running the provision tool with CUDA installation for the target nodes to come up.


+**Installing DOCA**
+
+**Using the provision tool**
+
+* If ``nvidia_doca_path`` is provided in ``input/provision_config.yml`` and Nvidia DPUs are available on the target nodes, DOCA packages will be deployed post provisioning without user intervention.
+
+**Using the Network playbook**
+
+* DOCA can also be installed using `network.yml <../../Roles/Network/index.html>`_ after provisioning the servers (Assuming the provision tool did not install DOCA packages).
+
+.. note:: The DOCA package can be downloaded from `here <https://developer.nvidia.com/networking/doca>`_ .
+
 **Installing OFED**

 **Using the provision tool**
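
For readers of the hunk above, a minimal sketch of how the new parameter might be set in ``input/provision_config.yml``. The file name is the example value from the parameter table in this commit; every other parameter in the file is omitted, and the comments are illustrative rather than taken from the repository.

```yaml
# input/provision_config.yml (illustrative fragment, other parameters omitted)

# Absolute path to the local copy of the DOCA .rpm file.
# The value below is the example from the documentation in this commit;
# replace it with the path of the rpm you actually downloaded.
nvidia_doca_path: "/root/doca-host-repo-rhel86-2.5.0-0.0.1.2.5.0108.1.el8.23.10.1.1.9.0.x86_64.rpm"
```

The parameter is documented as optional. The diff does not show which empty form (an empty string or an omitted key) the playbook expects when DOCA should be skipped, so check the shipped ``provision_config.yml`` before relying on either.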

docs/source/InstallationGuides/InstallingProvisionTool/provisionparams.rst

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -194,6 +194,10 @@ Fill in all provision-specific parameters in ``input/provision_config.yml``
194194
| ``string`` | |
195195
| Optional | |
196196
+----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
197+
| nvidia_doca_path | Absolute path to local copy of .rpm file containing DOCA packages. The doca rpm can be downloaded from https://developer.nvidia.com/networking/doca. DOCA will be installed post provisioning without any user intervention. Eg: nvidia_doca_path: "/root/doca-host-repo-rhel86-2.5.0-0.0.1.2.5.0108.1.el8.23.10.1.1.9.0.x86_64.rpm" |
198+
| ``string`` | |
199+
| Optional | |
200+
+----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
197201

198202
.. note::
199203

docs/source/InstallationGuides/InstallingProvisionTool/provisionprereqs.rst

Lines changed: 3 additions & 1 deletion
@@ -54,12 +54,14 @@ Note the compatibility between cluster OS and control plane OS below:

 .. [1] Ensure that control planes running RHEL have an active subscription or are configured to access local repositories. The following repositories should be enabled on the control plane: **AppStream**, **Code Ready Builder (CRB)**, **BaseOS**. For RHEL control planes running 8.5 and below, ensure that sshpass is additionally available to install or download to the control plane (from any local repository).

-* To **optionally** set up CUDA and OFED using the provisioning tool, download the required repositories to the control plane from here to deploy on the target nodes:
+* To **optionally** set up CUDA, DOCA and OFED using the provisioning tool, download the required repositories to the control plane from here to deploy on the target nodes:

 1. `For NVIDIA GPUs: <https://developer.nvidia.com/cuda-downloads/>`_: CUDA is a parallel computing platform and application programming interface that allows software to use certain types of graphics processing units for general purpose processing, an approach called general-purpose computing on GPUs.

 2. `For Mellanox <https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/>`_: OFED (OpenFabrics Enterprise Distribution) is open-source software for RDMA and kernel bypass applications. OFED can be used in business, research and scientific environments that require highly efficient networks, storage connectivity and parallel computing.

+3. `For NVIDIA DPUs: <https://developer.nvidia.com/networking/doca/>`_: DOCA is ...
+
 * Ensure that all connection names under the network manager match their corresponding device names.
 To verify network connection names: ::

docs/source/InstallationGuides/addinganewnode.rst

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ While adding a new node to the cluster, users can modify the following:
 - The operating system
 - CUDA
 - OFED
+- DOCA

 A new node can be added using the following ways:

docs/source/InstallationGuides/reprovisioningthecluster.rst

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ In the event that an existing Omnia cluster needs a different OS version or a fr
 - The operating system
 - CUDA
 - OFED
+- DOCA

 Omnia can re-provision the cluster by running the following command: ::

docs/source/Overview/SupportMatrix/omniainstalledsoftware.rst

Lines changed: 2 additions & 0 deletions
@@ -126,6 +126,8 @@ Software Installed by Omnia
 +------------------------------------+------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | MLNX-OFED | BSD License | MLNX_OFED is an NVIDIA tested and packaged version of OFED that supports two interconnect types using the same RDMA (remote DMA) and kernel bypass APIs called OFED verbs – InfiniBand and Ethernet. |
 +------------------------------------+------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| NVIDIA DOCA | NVIDIA License | The NVIDIA® DOCA® is the key to unlocking the potential of the NVIDIA® BlueField® networking platform to offload, accelerate, and isolate data center workloads. |
++------------------------------------+------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | ansible pylibssh | LGPL 2.1 | Python bindings to client functionality of libssh specific to Ansible use case. |
 +------------------------------------+------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | perl-DBD-Pg | GNU General Public License v3 | DBD::Pg - PostgreSQL database driver for the DBI module |

docs/source/Roles/Network/index.rst

Lines changed: 11 additions & 1 deletion
@@ -16,7 +16,9 @@ Some of the network features Omnia offers are:

 2. Infiniband switch configuration

-To install OFED drivers, enter all required parameters in ``input/network_config.yml``:
+3. Nvidia DOCA
+
+To install OFED and DOCA drivers, enter all required parameters in ``input/network_config.yml``:


 +------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
@@ -37,6 +39,14 @@ To install OFED drivers, enter all required parameters in ``input/network_config
 | | * ``false`` <- Default |
 | | * ``true`` |
 +------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| nvidia_doca_offline_path | Absolute path to local copy of rpm file containing DOCA package. The package can be downloaded from https://developer.nvidia.com/networking/doca/. |
+| [optional] | |
+| ``string`` | |
++------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| nvidia_doca_version | Indicates the version of DOCA to be downloaded. If ``nvidia_doca_offline_path`` is not given, declaring this variable is mandatory. |
+| [optional] | |
+| ``string`` | **Default value**: 2.5.0-0.0.1.23.10.1.1.9.0 |
++------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

 To run the script: ::
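
The two DOCA rows added above describe alternative ways to supply the package. The fragment below is a sketch of both, assuming the usual ``key: value`` layout of ``input/network_config.yml``. The rpm file name is borrowed from the ``nvidia_doca_path`` example elsewhere in this commit, and using an empty string to mean "not given" is an assumption, not something the diff confirms.

```yaml
# input/network_config.yml (illustrative fragment, other parameters omitted)

# Option A: install from a local rpm.
# The file name is reused from the nvidia_doca_path example and is only a placeholder.
nvidia_doca_offline_path: "/root/doca-host-repo-rhel86-2.5.0-0.0.1.2.5.0108.1.el8.23.10.1.1.9.0.x86_64.rpm"

# Option B: no local rpm, so the version to download must be declared.
# The value shown is the documented default.
# nvidia_doca_offline_path: ""   # assumed form of "not given"
# nvidia_doca_version: "2.5.0-0.0.1.23.10.1.1.9.0"
```

Per the table, ``nvidia_doca_version`` only becomes mandatory when no offline path is supplied. The diff does not say what happens when both are set.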

docs/source/Tables/bmc.csv

Lines changed: 5 additions & 0 deletions
@@ -283,6 +283,11 @@ Optional",Absolute path to a local copy of the .iso file containing Mellanox OF
 ``string``

 Optional","Absolute path to local copy of .rpm file containing CUDA packages. The cuda rpm can be downloaded from https://developer.nvidia.com/cuda-downloads. CUDA will be installed post provisioning without any user intervention. Eg: cuda_toolkit_path: ""/root/cuda-repo-rhel8-12-0-local-12.0.0_525.60.13-1.x86_64.rpm"""
+"**nvidia_doca_path**
+
+``string``
+
+Optional","Absolute path to local copy of .rpm file containing DOCA packages. The doca rpm can be downloaded from https://developer.nvidia.com/networking/doca. DOCA will be installed post provisioning without any user intervention. Eg: nvidia_doca_path: ""/root/doca-host-repo-rhel86-2.5.0-0.0.1.2.5.0108.1.el8.23.10.1.1.9.0.x86_64.rpm"""
 "**apptainer_support**

 ``boolean`` [1]_

docs/source/Tables/mapping.csv

Lines changed: 5 additions & 0 deletions
@@ -258,6 +258,11 @@ Optional",Absolute path to a local copy of the .iso file containing Mellanox OF
 ``string``

 Optional","Absolute path to local copy of .rpm file containing CUDA packages. The cuda rpm can be downloaded from https://developer.nvidia.com/cuda-downloads. CUDA will be installed post provisioning without any user intervention. Eg: cuda_toolkit_path: ""/root/cuda-repo-rhel8-12-0-local-12.0.0_525.60.13-1.x86_64.rpm"""
+"**nvidia_doca_path**
+
+``string``
+
+Optional","Absolute path to local copy of .rpm file containing DOCA packages. The doca rpm can be downloaded from https://developer.nvidia.com/networking/doca. DOCA will be installed post provisioning without any user intervention. Eg: nvidia_doca_path: ""/root/doca-host-repo-rhel86-2.5.0-0.0.1.2.5.0108.1.el8.23.10.1.1.9.0.x86_64.rpm"""
 "**apptainer_support**

 ``boolean`` [1]_
