Merge pull request #37 from mikemckiernan/nvaie5
Bash install for nvaie v5.0
mikemckiernan authored Apr 15, 2024
2 parents ae78da3 + 0d9d7b3 commit 82df1ac
Showing 2 changed files with 50 additions and 146 deletions.
193 changes: 47 additions & 146 deletions gpu-operator/install-gpu-operator-nvaie.rst
Expand Up @@ -19,6 +19,11 @@
.. _nvaie-rn: https://docs.nvidia.com/ai-enterprise/latest/release-notes/index.html
.. |nvaie-rn| replace:: *NVIDIA AI Enterprise Release Notes*

.. |ellipses-img| image:: https://brand-assets.cne.ngc.nvidia.com/assets/icons/2.2.2/fill/common-more-horiz.svg
:width: 14px
:height: 14px
:alt: Actions button

.. Date: Aug 18 2021
.. Author: cdesiniotis
Expand All @@ -39,18 +44,15 @@ About NVIDIA AI Enterprise and Supported Platforms
**************************************************

NVIDIA AI Enterprise is an end-to-end, cloud-native suite of AI and data analytics software, optimized, certified, and supported by NVIDIA with NVIDIA-Certified Systems.
Additional information can be found at the `NVIDIA AI Enterprise <https://www.nvidia.com/en-us/data-center/products/ai-enterprise-suite/>`_ web page.

NVIDIA AI Enterprise customers have access to a pre-configured GPU Operator within the NVIDIA Enterprise Catalog.
The GPU Operator is pre-configured to simplify the provisioning experience with NVIDIA AI Enterprise deployments.

The pre-configured GPU Operator differs from the GPU Operator in the public NGC catalog. The differences are:
Deploying the GPU Operator with NVIDIA AI Enterprise differs from deploying the GPU Operator from the public NGC catalog.
The differences are:

* It is configured to use a prebuilt vGPU driver image (Only available to NVIDIA AI Enterprise customers)
* It is configured to use a prebuilt vGPU driver image that is only available to NVIDIA AI Enterprise customers.

* It is configured to use the `NVIDIA License System (NLS) <https://docs.nvidia.com/license-system/latest/>`_
* It is configured to use the `NVIDIA License System (NLS) <https://docs.nvidia.com/license-system/latest/>`_.

The following sections apply to the following configurations:
The GPU Operator with NVIDIA AI Enterprise is supported with the following platforms:

* Kubernetes on bare metal and on vSphere VMs with GPU passthrough and vGPU
* VMware vSphere with Tanzu
Expand All @@ -67,157 +69,50 @@ For Red Hat OpenShift, refer to :external+ocp:doc:`nvaie-with-ocp`.
Installing GPU Operator
***********************

To install GPU Operator with NVIDIA AI Enterprise, apply the following steps.
Beginning with NVIDIA AI Enterprise release 5.0, the GPU Operator is installed using a Bash script.

.. note::
To deploy an earlier version of NVIDIA AI Enterprise, refer to the documentation for the GPU Operator version specified in the NVIDIA AI Enterprise documentation
or an earlier version of the GPU Operator documentation, such as the
`23.9.1 <https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/23.9.1/install-gpu-operator-nvaie.html>`__
version.

You can also use the following `script <https://raw.githubusercontent.com/NVIDIA/gpu-operator/master/scripts/gpu-operator-nvaie.sh>`__, which automates the below installation instructions.
Create the ``gpu-operator`` namespace:
Prerequisites
=============

.. code-block:: console
- A client configuration token has been generated for the client on which the script will install the vGPU guest driver.
Refer to `Generating a Client Configuration Token <https://docs.nvidia.com/license-system/latest/nvidia-license-system-user-guide/index.html#generating-client-configuration-token>`__
in the *NVIDIA License System User Guide* for more information.
- An NGC CLI API key that is used to create an image pull secret.
The secret is used to pull the prebuilt vGPU driver image from NVIDIA NGC.
Refer to `Generating Your NGC API Key <https://docs.nvidia.com/ngc/gpu-cloud/ngc-private-registry-user-guide/index.html#generating-api-key>`__
in the *NVIDIA NGC Private Registry User Guide* for more information.

$ kubectl create namespace gpu-operator
Create an empty vGPU license configuration file:

.. code-block:: console
$ sudo touch gridd.conf
Generate and download a NLS client license token. Please refer to Section 4.6 of the `NLS User Guide <https://docs.nvidia.com/license-system/latest/pdf/nvidia-license-system-user-guide.pdf>`_ for instructions.

Rename the NLS client license token that you downloaded to ``client_configuration_token.tok``.

Create the ``licensing-config`` ConfigMap object in the ``gpu-operator`` namespace. Both the vGPU license
configuration file and the NLS client license token will be added to this ConfigMap:

.. code-block:: console
$ kubectl create configmap licensing-config \
-n gpu-operator --from-file=gridd.conf --from-file=<path>/client_configuration_token.tok
Procedure
=========

Create an image pull secret in the ``gpu-operator`` namespace for the private
registry that contains the containerized NVIDIA vGPU software graphics driver for Linux for
use with NVIDIA GPU Operator:
#. Export the NGC CLI API key and your email address as environment variables:

* Set the registry secret name:
.. code-block:: console
$ export NGC_API_KEY="M2Vub3QxYmgyZ..."
$ export NGC_USER_EMAIL="<email-address>"
.. code-block:: console
#. Go to the
`NVIDIA GPU Operator - Deploy Installer Script <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/vgpu/resources/gpu-operator-installer-5>`__
web page on NVIDIA NGC.

$ export REGISTRY_SECRET_NAME=ngc-secret
* Set the private registry name:

.. code-block:: console
$ export PRIVATE_REGISTRY=nvcr.io/nvaie
* Create an image pull secret in the ``gpu-operator`` namespace with the registry
secret name and the private registry name that you set. Replace ``password``,
and ``email-address`` with your NGC API key and email address respectively:

.. code-block:: console
$ kubectl create secret docker-registry ${REGISTRY_SECRET_NAME} \
--docker-server=${PRIVATE_REGISTRY} \
--docker-username='$oauthtoken' \
--docker-password='<password>' \
--docker-email='<email-address>' \
-n gpu-operator
Add the NVIDIA AI Enterprise Helm repository, where password is the NGC API key for accessing the NVIDIA Enterprise Collection that you generated:

.. code-block:: console
$ helm repo add nvaie https://helm.ngc.nvidia.com/nvaie \
--username='$oauthtoken' --password='<password>' \
&& helm repo update
Install the NVIDIA GPU Operator:

.. code-block:: console
Click the **File Browser** tab, identify your NVIDIA AI Enterprise release, click |ellipses-img|, and select **Download File**.

$ helm install --wait gpu-operator nvaie/gpu-operator-<M>-<m> -n gpu-operator
Copy the downloaded script to the same directory as the client configuration token.

Replace *M* and *m* with the major and minor release values, such as ``3-1``.
#. Rename the client configuration token that you downloaded to ``client_configuration_token.tok``.
The downloaded file name originally matches the pattern ``client_configuration_token_mm-dd-yyyy-hh-mm-ss.tok``.

To deploy the Helm chart with some customizations, refer to
:ref:`Chart Customization Options <gpu-operator-helm-chart-options>`.
#. From the directory that contains the downloaded script and the client configuration token, run the script:

.. code-block:: console
*********************************************************************
Installing GPU Operator with the NVIDIA Datacenter Driver
*********************************************************************

To install GPU Operator on baremetal with the NVIDIA Datacenter Driver, apply the following steps.

.. note::

You can also use the following `script <https://raw.githubusercontent.com/NVIDIA/gpu-operator/master/scripts/install-gpu-operator-nvaie.sh>`__, which automates the below installation instructions.
Create the ``gpu-operator`` namespace:

.. code-block:: console
$ kubectl create namespace gpu-operator
Create an image pull secret in the ``gpu-operator`` namespace for the private
registry that contains the NVIDIA GPU Operator:

* Set the registry secret name:

.. code-block:: console
$ export REGISTRY_SECRET_NAME=ngc-secret
* Set the private registry name:

.. code-block:: console
$ export PRIVATE_REGISTRY=nvcr.io/nvaie
* Create an image pull secret in the ``gpu-operator`` namespace with the registry
secret name and the private registry name that you set. Replace ``password``,
and ``email-address`` with your NGC API key and email address respectively:

.. code-block:: console
$ kubectl create secret docker-registry ${REGISTRY_SECRET_NAME} \
--docker-server=${PRIVATE_REGISTRY} \
--docker-username='$oauthtoken' \
--docker-password='<password>' \
--docker-email='<email-address>' \
-n gpu-operator
Add the NVIDIA AI Enterprise Helm repository, where password is the NGC API key for accessing the NVIDIA Enterprise Collection that you generated:

.. code-block:: console
$ helm repo add nvaie https://helm.ngc.nvidia.com/nvaie \
--username='$oauthtoken' --password='<password>' \
&& helm repo update
Install the NVIDIA GPU Operator:

.. code-block:: console
$ helm install --wait gpu-operator nvaie/gpu-operator-<M>-<m> -n gpu-operator \
--set driver.repository=nvcr.io/nvidia \
--set driver.image=driver \
--set driver.version=<driver-version> \
--set driver.licensingConfig.configMapName=""
Replace *M* and *m* with the major and minor release values, such as ``3-1``.
Refer to the |nvaie-rn|_ for information about supported GPU Driver versions.

To deploy the Helm chart with some customizations, refer to
:ref:`Chart Customization Options <gpu-operator-helm-chart-options>`.
$ bash gpu-operator-nvaie.sh install
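
The procedure above can be sketched as a small preflight check that runs before the installer. This is a minimal sketch, not part of the NVIDIA installer: the function names ``normalize_token_name`` and ``preflight`` are hypothetical, and the sketch assumes that the script and exactly one downloaded token are in the same directory. It only checks that the two environment variables are set, that ``gpu-operator-nvaie.sh`` is present, and that the token has the expected file name.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers; these names are illustrative assumptions
# and are not provided by the gpu-operator-nvaie.sh installer.

# Rename client_configuration_token_<timestamp>.tok in directory $1
# to client_configuration_token.tok, unless that file already exists.
normalize_token_name() {
  local dir="${1:-.}"
  local target="${dir}/client_configuration_token.tok"
  local -a matches
  if [ -f "${target}" ]; then
    return 0
  fi
  matches=("${dir}"/client_configuration_token_*.tok)
  # An unmatched glob stays literal, so also confirm that the file exists.
  if [ "${#matches[@]}" -ne 1 ] || [ ! -f "${matches[0]}" ]; then
    echo "error: expected exactly one downloaded token in ${dir}" >&2
    return 1
  fi
  mv -- "${matches[0]}" "${target}"
}

# Check the environment variables and files that the procedure relies on.
preflight() {
  local dir="${1:-.}"
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "error: NGC_API_KEY is not set" >&2
    return 1
  fi
  if [ -z "${NGC_USER_EMAIL:-}" ]; then
    echo "error: NGC_USER_EMAIL is not set" >&2
    return 1
  fi
  if [ ! -f "${dir}/gpu-operator-nvaie.sh" ]; then
    echo "error: gpu-operator-nvaie.sh not found in ${dir}" >&2
    return 1
  fi
  normalize_token_name "${dir}"
}
```

After sourcing these functions, one possible use is ``preflight . && bash gpu-operator-nvaie.sh install``, so that the installer runs only when the checks pass.
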
*********************************
Expand Down Expand Up @@ -269,3 +164,9 @@ with
Save your changes and exit the kubectl edit session. For example, type ``:wq`` if your editor is vi.

The GPU Operator then sequentially redeploys all the driver pods with the new licensing information.
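
One way to observe the driver pods being redeployed is to list them while the change rolls out. The following is a sketch under stated assumptions: the helper name ``list_driver_pods`` is hypothetical, the Operator is assumed to run in the ``gpu-operator`` namespace, and the driver pods are assumed to carry the ``app=nvidia-driver-daemonset`` label. Adjust both values if your deployment differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the driver pods so that restarts are visible.
# Assumptions: namespace gpu-operator (override with $1) and the
# app=nvidia-driver-daemonset label on the driver pods.
list_driver_pods() {
  local namespace="${1:-gpu-operator}"
  kubectl get pods -n "${namespace}" -l app=nvidia-driver-daemonset -o wide
}
```

Running ``list_driver_pods`` repeatedly shows each driver pod restarting in turn as the new licensing information is applied.
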

*******************
Related Information
*******************

- `NVIDIA AI Enterprise <https://www.nvidia.com/en-us/data-center/products/ai-enterprise-suite/>`_ web page.
3 changes: 3 additions & 0 deletions gpu-operator/release-notes.rst
Expand Up @@ -62,6 +62,9 @@ New Features
- NVIDIA Kubernetes Device Plugin version v1.14.5
- NVIDIA MIG Manager version v0.6.0

* Added support for NVIDIA AI Enterprise release 5.0.
Refer to :doc:`install-gpu-operator-nvaie` for information about installing the Operator with a Bash script.

.. _v23.9.2-fixed-issues:

Fixed issues
Expand Down
