Enterprise RAG Chatbot - Deployment Guide

Nutanix Intel Logo

Table of Contents

Requirements for Nutanix Enterprise AI + Intel® AI for Enterprise RAG Deployment

Below is initial deployment guidance to help you get started, followed by tested SLAs based on the provided system requirements for on-prem or cloud deployments. Note that these are provided as a starting point; the configurations can easily be scaled in Nutanix Enterprise AI and Intel® AI for Enterprise RAG to meet customer environment needs.

These requirements support both Nutanix Enterprise AI and Intel® AI for Enterprise RAG.

On-prem Deployments

| Resource Type | Specs |
|---|---|
| Compute | 4x 32-core Intel Xeon 6 processors (generally 2x dual-socket servers) |
| Memory | 256GB per server (512GB total) |
| Storage | 512GB total disk space is generally recommended, though this depends heavily on model size and quantity |

Cloud Deployment

| Resource Type | Specs |
|---|---|
| Number of Instances | 4 VM instances |
| AWS EC2 Instance Type | 4x m8i.16xlarge |
| GCP Compute Engine Instance Type | 4x c4-standard-48-lssd |
| Azure VM Instance Type | 4x Standard_D64s_v6 |
| Remote File Storage (NFS equivalent) | 512GB total |

Note

For VMs, a virtual core may actually represent a hyperthread. We suggest using VM instances with 64 vCPUs each (or 48 vCPUs if a 64-vCPU size is not available).
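To see how vCPUs map to cores on a given VM, you can inspect the guest's CPU topology (a minimal sketch; exact output varies by instance type and hypervisor):

```shell
# vCPU count visible to the OS
nproc
# "Thread(s) per core: 2" means each core exposes two hyperthreads,
# i.e. a vCPU is a hyperthread rather than a full physical core.
lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\))'
```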

Nutanix Enterprise AI Endpoint Configuration

We have tested the following Nutanix Enterprise AI (NAI) endpoint configuration to support Intel® AI for Enterprise RAG workloads on Intel® Xeon processors.

| Resource Type | Specs | Comments |
|---|---|---|
| Nutanix Enterprise AI (NAI) Endpoint | 2x 32 vCPUs | Per model endpoint |

Tested SLAs

Below are our initial tested SLAs based on the provided system requirements for on-prem or cloud deployments. Note that these are provided as a starting point. SLAs can vary with model size, concurrency, and vector DB size requirements.

| Metric | Measured Value |
|---|---|
| Time-to-First-Token (TTFT) | <3s |
| Time Per Output Token (TPOT) | <150ms |
| Concurrency | 32 concurrent users |
| SLM/LLM Model Size | <15B |
| VectorDB Vectors | 100 million |

Note: You can use other model sizes, but doing so may impact compatibility and performance. Carefully evaluate your requirements and test thoroughly.

Note

In this case, vCPUs means cores with HyperThreading enabled. In VM environments other than AWS, HyperThreading might be disabled. If HyperThreading is disabled, the balloons policy must also be disabled in config.yaml.

Architecture Diagram

Logical architecture diagram

Deployment Steps

  1. Tools Installation
  2. Deploy/Configure Nutanix Enterprise AI on EKS or on-premises
  3. Deploy/Configure Intel® AI for Enterprise RAG on EKS or on-premises
  4. Validate Demo by navigating to the Intel® AI for Enterprise RAG Web Application

1. Tools Installation

1.1 Tools Required for EKS

Install Terraform

Follow Terraform instructions here

Ubuntu Installation Example:

sudo apt-get update && sudo apt-get install -y gnupg software-properties-common

wget -O- https://apt.releases.hashicorp.com/gpg | \
  gpg --dearmor | \
  sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg > /dev/null

echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(grep -oP '(?<=UBUNTU_CODENAME=).*' /etc/os-release || lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list

sudo apt-get update

sudo apt-get install -y terraform

Install AWS CLI

Fresh Installation:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

Update Existing Installation

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install --bin-dir /usr/local/bin --install-dir /usr/local/aws-cli --update

1.2 Tools Required for all deployments

Install kubectl

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client

Install Helm

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
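After installation, a quick sanity check that the required CLIs are resolvable on your PATH (a minimal sketch; versions and paths will vary in your environment):

```shell
# Report whether each required CLI is installed and where it resolves.
for tool in kubectl helm; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: NOT FOUND"
  fi
done
```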

2. Deploy Nutanix Enterprise AI

  1. For Nutanix Kubernetes Platform (NKP) or on-premises deployments, follow Nutanix documentation

  2. For AWS EKS follow NAI EKS Deployment

3. Configure the Pipeline

Configure External LLM Endpoint

Refer to Obtaining configuration from Nutanix AI LLM Endpoint to obtain the configuration needed for this step.

Edit deployment/pipelines/<your-pipeline>/reference-external-endpoint.yaml and configure the LLM settings:

config:
  endpoint: <endpoint path>
  LLM_MODEL_SERVER: vllm
  LLM_MODEL_SERVER_ENDPOINT: "https://your-vllm-endpoint.com/api"
  LLM_MODEL_NAME: <model_name>
  LLM_VLLM_API_KEY: "your-api-key-here"

Replace the placeholder values with your actual LLM endpoint details.

For higher security in production environments, consider injecting Kubernetes Secrets instead of storing credentials in configuration files. Refer to Update the vLLM API Key Secret for secure credential management.

If your endpoint does not have properly configured TLS, you can also add LLM_TLS_SKIP_VERIFY: "True".

Or, to update an existing Intel® AI for Enterprise RAG deployment, see Update Existing Intel® AI for Enterprise RAG Deployment to Use Nutanix Enterprise AI endpoint.

Inventory configuration

Change the pipeline entry in the inventory's config.yaml to point to the file you just edited:

pipelines:
  - namespace: chatqa
    samplePath: chatqa/reference-external-endpoint.yaml
    resourcesPath: chatqa/resources-reference-external-endpoint.yaml
    modelConfigPath: chatqa/resources-model-cpu.yaml
    type: chatqa

Additionally, if eRAG and NAI run on the same cluster, the balloons policy needs to be configured:

balloons:
    ...
    vllm_custom_name: "kserve-container" 

4. Deploy Intel® AI for Enterprise RAG

  1. For Nutanix Kubernetes Platform (NKP) or on-premises deployments, follow Intel® AI for Enterprise RAG deployment on Kubernetes.

Note

If the application will be deployed on Nutanix Kubernetes Platform (NKP), it is recommended to disable telemetry. Instructions are provided in the link.

  2. For AWS EKS, follow EKS Deployment

Update Existing Intel® AI for Enterprise RAG Deployment to Use Nutanix Enterprise AI endpoint

If you have an existing Intel® AI for Enterprise RAG deployment and want to switch the vLLM endpoint to point to a Nutanix Enterprise AI endpoint, follow the steps below.

Warning

If you are using the balloons policy with vllm_custom_name set, the installer will check the resources allocated for this model to ensure it fits. If you deploy a model that uses more vCPUs than the one Intel® AI for Enterprise RAG was deployed with, balloons might not work.

Prerequisites

  • An existing Intel® AI for Enterprise RAG deployment with the chatqa pipeline running
  • Access to a Nutanix Enterprise AI endpoint (URL, model name, and API key)
  • kubectl configured to access your Kubernetes cluster

Obtain Nutanix Enterprise AI Endpoint Details

Refer to Obtaining configuration from Nutanix AI LLM Endpoint to obtain the configuration needed for this step.

Update the ChatQA GMConnector Configuration

Patch the chatqa GMConnector to use the new Nutanix Enterprise AI endpoint:

kubectl edit gmconnectors -n chatqa chatqa

Update the following variables.

Example:

env:
  LLM_MODEL_SERVER: vllm
  LLM_MODEL_SERVER_ENDPOINT: "https://nutanix-ai-endpoint.example.com/api/v1/chat/completions"
  LLM_MODEL_NAME: "llama-3-3b"
  LLM_TLS_SKIP_VERIFY: "false"  

Save and exit the editor to apply the changes.

Update the vLLM API Key Secret

Update the secret containing the vLLM API key with your new Nutanix Enterprise AI API key:

kubectl patch secret vllm-api-key-secret \
  -n chatqa \
  --type merge \
  -p '{"stringData":{"LLM_VLLM_API_KEY":"YOUR_NUTANIX_AI_API_KEY_HERE"}}'

Replace YOUR_NUTANIX_AI_API_KEY_HERE with the actual API key from your Nutanix Enterprise AI endpoint.
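Kubernetes stores Secret values base64-encoded under .data (the stringData field in the patch above handles the encoding for you). As a hedged illustration of that round trip, plus a read-back command you can run against the cluster to confirm the patch applied:

```shell
# Local illustration: a stringData value ends up base64-encoded under .data,
# and decoding it returns the original key.
encoded=$(printf '%s' 'YOUR_NUTANIX_AI_API_KEY_HERE' | base64 -w0)
printf '%s\n' "$(printf '%s' "$encoded" | base64 -d)"
# prints YOUR_NUTANIX_AI_API_KEY_HERE

# To read the value back from the cluster (names from the patch above):
#   kubectl get secret vllm-api-key-secret -n chatqa \
#     -o jsonpath='{.data.LLM_VLLM_API_KEY}' | base64 -d
```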

Verify the Configuration

  1. Restart the LLM pods to pick up the new configuration:

     kubectl rollout restart deployment llm-svc-deployment -n chatqa

  2. Check the pod logs to verify the connection to the new endpoint:

     kubectl logs -n chatqa deployment/llm-svc-deployment --tail=50

     Look for log entries indicating a successful connection to the Nutanix Enterprise AI endpoint:

     [INFO] - [llms_microservice] - Connection with LLM model server validated successfully.

  3. Test through the Chat UI to confirm the configuration works end to end. For instructions on accessing the UI, see Access the UI/Grafana.

Obtaining Configuration from Nutanix AI LLM Endpoint

For complete Nutanix AI endpoint configuration instructions, refer to the Nutanix AI documentation.

High-level steps: acquire the Nutanix AI vLLM endpoint details from your Nutanix AI deployment.

  1. Navigate to your Nutanix AI management Console
  2. Click on "Endpoints" to view the list of available LLM endpoints
  3. Select the desired vLLM endpoint to view its details, including the URL, API key, and Sample Request code (see screenshot).

Nutanix AI Endpoint

Note

Make sure that your URL ends with /api.
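This matters because the endpoint path from the config is appended to LLM_MODEL_SERVER_ENDPOINT, which presumably is why the base URL must end with /api. A small sketch of how the full request URL is formed (placeholder hostname taken from the example configuration):

```shell
# Joining the base URL (ending in /api) with the endpoint path yields
# the full chat-completions URL used in the curl example.
LLM_MODEL_SERVER_ENDPOINT="https://nutanix-ai-endpoint.example.com/api"
endpoint="/v1/chat/completions"
echo "${LLM_MODEL_SERVER_ENDPOINT}${endpoint}"
# prints https://nutanix-ai-endpoint.example.com/api/v1/chat/completions
```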

Example Nutanix AI vLLM endpoint configuration:

curl -k -X 'POST' 'https://nutanix-ai-endpoint.example.com/api/v1/chat/completions' \
 -H "Authorization: Bearer $API_KEY" \
 -H 'accept: application/json' \
 -H 'Content-Type: application/json' \
 -d '{
      "model": "llama-3-3b",
      "messages": [
        {
          "role": "user",
          "content": "Explain Deep Neural Networks in simple terms"
        }
      ],
      "max_tokens": 256,
      "stream": false
}'

Example configuration:

config:
  endpoint: /v1/chat/completions
  LLM_MODEL_SERVER: vllm
  LLM_MODEL_SERVER_ENDPOINT: "https://nutanix-ai-endpoint.example.com/api"
  LLM_MODEL_NAME: "llama-3-3b"
  LLM_VLLM_API_KEY: "your-api-key-here" 
  LLM_TLS_SKIP_VERIFY: "True"