Enterprise RAG Chatbot - Deployment Guide

Nutanix Intel Logo

Table of Contents

Requirements for Nutanix Enterprise AI + Intel® AI for Enterprise RAG Deployment

Below is initial deployment guidance to help you get started, followed by tested SLAs based on the provided system requirements for on-prem or cloud deployments. Note that these are provided as a starting point; the configurations can easily be scaled in Nutanix Enterprise AI and Intel® AI for Enterprise RAG to meet customer environment needs.

These requirements support both Nutanix Enterprise AI and Intel® AI for Enterprise RAG.

On-prem Deployments

| Resource Type | Specs |
|---|---|
| Compute | 4x 32-core Intel Xeon 6 processors (generally 2x dual-socket servers) |
| Memory | 256GB per server (512GB total) |
| Storage | 512GB total disk space is generally recommended, though this depends heavily on model size and quantity |

Cloud Deployment

| Resource Type | Specs |
|---|---|
| Number of Instances | 4 VM instances |
| AWS EC2 Instance Type | 4x m8i.16xlarge |
| GCP Compute Engine Instance Type | 4x c4-standard-48-lssd |
| Azure VM Instance Type | 4x Standard_D64s_v6 |
| Remote File Storage (NFS equivalent) | 512GB total |

Note

For VMs, a virtual core may actually represent a hyperthread. We suggest using VM instances with 64 vCPUs each (or 48 vCPUs if a 64-vCPU size is not available).
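To see how vCPUs map to cores on a given VM, you can inspect the guest's CPU topology (a minimal sketch; exact output varies by instance type and hypervisor):

```shell
# vCPU count visible to the OS
nproc
# "Thread(s) per core: 2" means each core exposes two hyperthreads,
# i.e. a vCPU is a hyperthread rather than a full physical core.
lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\))'
```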

Nutanix Enterprise AI Endpoint Configuration

We have tested the following Nutanix Enterprise AI (NAI) endpoint configuration to support Intel® AI for Enterprise RAG workloads on Intel® Xeon processors.

| Resource Type | Specs | Comments |
|---|---|---|
| Nutanix Enterprise AI (NAI) Endpoint | 2x 32 vCPUs | Per model endpoint |

Tested SLAs

Below are our initial tested SLAs based on the provided system requirements for on-prem or cloud deployments. Note that these are provided as a starting point. SLAs can vary with model size, concurrency, and vector DB size requirements.

| Metric | Measured Value |
|---|---|
| Time-to-First-Token (TTFT) | <3s |
| Time Per Output Token (TPOT) | <150ms |
| Concurrency | 32 concurrent users |
| SLM/LLM Model Size | <15B |
| VectorDB Vectors | 100 million |

Note: You can use other model sizes, but doing so may impact compatibility and performance. Carefully evaluate your requirements and test thoroughly.

Note

In this case, vCPUs means cores with HyperThreading enabled. In VM environments other than AWS, HyperThreading might be disabled. If HyperThreading is disabled, the balloons policy must also be disabled in config.yaml.

Architecture Diagram

Logical architecture diagram

Deployment Steps

  1. Tools Installation
  2. Deploy/Configure Nutanix Enterprise AI on EKS or on-premises
  3. Deploy/Configure Intel® AI for Enterprise RAG on EKS or on-premises
  4. Validate Demo by navigating to the Intel® AI for Enterprise RAG Web Application

1. Tools Installation

1.1 Tools Required for EKS

Install Terraform

Follow Terraform instructions here

Ubuntu Installation Example:

sudo apt-get update && sudo apt-get install -y gnupg software-properties-common

wget -O- https://apt.releases.hashicorp.com/gpg | \
  gpg --dearmor | \
  sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg > /dev/null

echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(grep -oP '(?<=UBUNTU_CODENAME=).*' /etc/os-release || lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list

sudo apt-get update

sudo apt-get install -y terraform

Install AWS CLI

Fresh Installation:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

Update Existing Installation

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install --bin-dir /usr/local/bin --install-dir /usr/local/aws-cli --update

1.2 Tools Required for all deployments

Install kubectl

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client

Install Helm

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
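After installation, a quick sanity check that the required CLIs are resolvable on your PATH (a minimal sketch; versions and paths will vary in your environment):

```shell
# Report whether each required CLI is installed and where it resolves.
for tool in kubectl helm; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: NOT FOUND"
  fi
done
```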

2. Deploy Nutanix Enterprise AI

  1. For Nutanix Kubernetes Platform (NKP) or on-premises deployments, follow Nutanix documentation

  2. For AWS EKS follow NAI EKS Deployment

3. Configure the Pipeline

Configure External LLM Endpoint

Refer to Obtaining configuration from Nutanix AI LLM Endpoint to obtain the configuration needed for this step.

Edit deployment/pipelines/<your-pipeline>/reference-external-endpoint.yaml and configure the LLM settings:

config:
  endpoint: <endpoint path>
  LLM_MODEL_SERVER: vllm
  LLM_MODEL_SERVER_ENDPOINT: "https://your-vllm-endpoint.com/api"
  LLM_MODEL_NAME: <model_name>
  LLM_VLLM_API_KEY: "your-api-key-here"

Replace the placeholder values with your actual LLM endpoint details.

For higher security in production environments, consider injecting Kubernetes Secrets instead of storing credentials in configuration files. Refer to Update the vLLM API Key Secret for secure credential management.

If your endpoint does not have properly configured TLS, you can also add LLM_TLS_SKIP_VERIFY: "True".

Or, to update an existing Intel® AI for Enterprise RAG deployment, see Update Existing Intel® AI for Enterprise RAG Deployment to Use Nutanix Enterprise AI endpoint.

Inventory configuration

Change the pipeline entry in the inventory's config.yaml to point to the file you just edited:

pipelines:
  - namespace: chatqa
    samplePath: chatqa/reference-external-endpoint.yaml
    resourcesPath: chatqa/resources-reference-external-endpoint.yaml
    modelConfigPath: chatqa/resources-model-cpu.yaml
    type: chatqa

Additionally, if eRAG and NAI run on the same cluster, the balloons policy needs to be configured:

balloons:
    ...
    vllm_custom_name: "kserve-container" 

4. Deploy Intel® AI for Enterprise RAG

  1. For Nutanix Kubernetes Platform (NKP) or on-premises deployments, follow Intel® AI for Enterprise RAG deployment on Kubernetes.

Note

If the application will be deployed on Nutanix Kubernetes Platform (NKP), it is recommended to disable telemetry. Instructions are provided in the link.

  2. For AWS EKS, follow EKS Deployment

Update Existing Intel® AI for Enterprise RAG Deployment to Use Nutanix Enterprise AI endpoint

If you have an existing Intel® AI for Enterprise RAG deployment and want to switch the vLLM endpoint to point to a Nutanix Enterprise AI endpoint, follow the steps below.

Warning

If you are using the balloons policy with vllm_custom_name set, the installer will check the resources allocated for this model to ensure it fits. If you deploy a model that uses more vCPUs than the one Intel® AI for Enterprise RAG was deployed with, balloons might not work.

Prerequisites

  • An existing Intel® AI for Enterprise RAG deployment with the chatqa pipeline running
  • Access to a Nutanix Enterprise AI endpoint (URL, model name, and API key)
  • kubectl configured to access your Kubernetes cluster

Obtain Nutanix Enterprise AI Endpoint Details

Refer to Obtaining configuration from Nutanix AI LLM Endpoint to obtain the configuration needed for this step.

Update the ChatQA GMConnector Configuration

Patch the chatqa GMConnector to use the new Nutanix Enterprise AI endpoint:

kubectl edit gmconnectors -n chatqa chatqa

Update the following variables.

Example:

env:
  LLM_MODEL_SERVER: vllm
  LLM_MODEL_SERVER_ENDPOINT: "https://nutanix-ai-endpoint.example.com/api/v1/chat/completions"
  LLM_MODEL_NAME: "llama-3-3b"
  LLM_TLS_SKIP_VERIFY: "false"  

Save and exit the editor to apply the changes.

Update the vLLM API Key Secret

Update the secret containing the vLLM API key with your new Nutanix Enterprise AI API key:

kubectl patch secret vllm-api-key-secret \
  -n chatqa \
  --type merge \
  -p '{"stringData":{"LLM_VLLM_API_KEY":"YOUR_NUTANIX_AI_API_KEY_HERE"}}'

Replace YOUR_NUTANIX_AI_API_KEY_HERE with the actual API key from your Nutanix Enterprise AI endpoint.
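Kubernetes stores Secret values base64-encoded under .data (the stringData field in the patch above handles the encoding for you). As a hedged illustration of that round trip, plus a read-back command you can run against the cluster to confirm the patch applied:

```shell
# Local illustration: a stringData value ends up base64-encoded under .data,
# and decoding it returns the original key.
encoded=$(printf '%s' 'YOUR_NUTANIX_AI_API_KEY_HERE' | base64 -w0)
printf '%s\n' "$(printf '%s' "$encoded" | base64 -d)"
# prints YOUR_NUTANIX_AI_API_KEY_HERE

# To read the value back from the cluster (names from the patch above):
#   kubectl get secret vllm-api-key-secret -n chatqa \
#     -o jsonpath='{.data.LLM_VLLM_API_KEY}' | base64 -d
```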

Verify the Configuration

  1. Restart the LLM pods to pick up the new configuration:

     kubectl rollout restart deployment llm-svc-deployment -n chatqa

  2. Check the pod logs to verify the connection to the new endpoint:

     kubectl logs -n chatqa deployment/llm-svc-deployment --tail=50

     Look for log entries indicating a successful connection to the Nutanix Enterprise AI endpoint:

     [INFO] - [llms_microservice] - Connection with LLM model server validated successfully.

  3. Test through the Chat UI to confirm the configuration works end to end. For instructions on accessing the UI, see Access the UI/Grafana.

Obtaining Configuration from Nutanix AI LLM Endpoint

For complete Nutanix AI endpoint configuration instructions, refer to the Nutanix AI documentation.

High-level steps: acquire the Nutanix AI vLLM endpoint details from your Nutanix AI deployment.

  1. Navigate to your Nutanix AI management Console
  2. Click on "Endpoints" to view the list of available LLM endpoints
  3. Select the desired vLLM endpoint to view its details, including the URL, API key, and Sample Request code (see screenshot).

Nutanix AI Endpoint

Note

Make sure that your URL ends with /api.
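This matters because the endpoint path from the config is appended to LLM_MODEL_SERVER_ENDPOINT, which presumably is why the base URL must end with /api. A small sketch of how the full request URL is formed (placeholder hostname taken from the example configuration):

```shell
# Joining the base URL (ending in /api) with the endpoint path yields
# the full chat-completions URL used in the curl example.
LLM_MODEL_SERVER_ENDPOINT="https://nutanix-ai-endpoint.example.com/api"
endpoint="/v1/chat/completions"
echo "${LLM_MODEL_SERVER_ENDPOINT}${endpoint}"
# prints https://nutanix-ai-endpoint.example.com/api/v1/chat/completions
```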

Example Nutanix AI vLLM endpoint configuration:

curl -k -X 'POST' 'https://nutanix-ai-endpoint.example.com/api/v1/chat/completions' \
 -H "Authorization: Bearer $API_KEY" \
 -H 'accept: application/json' \
 -H 'Content-Type: application/json' \
 -d '{
      "model": "llama-3-3b",
      "messages": [
        {
          "role": "user",
          "content": "Explain Deep Neural Networks in simple terms"
        }
      ],
      "max_tokens": 256,
      "stream": false
}'

Example configuration:

config:
  endpoint: /v1/chat/completions
  LLM_MODEL_SERVER: vllm
  LLM_MODEL_SERVER_ENDPOINT: "https://nutanix-ai-endpoint.example.com/api"
  LLM_MODEL_NAME: "llama-3-3b"
  LLM_VLLM_API_KEY: "your-api-key-here" 
  LLM_TLS_SKIP_VERIFY: "True"