- Enterprise RAG Chatbot - Deployment Guide
- Requirements for Nutanix Enterprise AI + Intel® AI for Enterprise RAG Deployment
- Nutanix Enterprise AI Endpoint Configuration
- Tested SLAs
- Architecture Diagram
- Deployment Steps
- 1. Tools Installation
- 2. Deploy Nutanix Enterprise AI
- 3. Configure the Pipeline
- 4. Deploy Intel® AI for Enterprise RAG
- Update Existing Intel® AI for Enterprise RAG Deployment to Use Nutanix Enterprise AI endpoint
- Obtaining Configuration from Nutanix AI LLM Endpoint
Below is initial deployment guidance to help you get started, followed by tested SLAs based on the provided system requirements for on-prem or cloud deployments. Note that these are provided as a starting point; the configurations can easily be scaled in Nutanix Enterprise AI and Intel® AI for Enterprise RAG to meet customer environment needs.
These requirements support both Nutanix Enterprise AI and Intel® AI for Enterprise RAG.
| Resource Type | Specs |
|---|---|
| Compute | 4x 32-core Intel Xeon 6 processors (typically 2x dual-socket servers) |
| Memory | 256GB per server (512GB total) |
| Storage | 512GB of total disk space is generally recommended, though this depends heavily on model size and quantity |
| Resource Type | Specs |
|---|---|
| Number of Instances | 4 VM Instances |
| AWS EC2 Instance Type | 4x m8i.16xlarge |
| GCP Compute Engine Instance Type | 4x c4-standard-48-lssd |
| Azure VM Instance Type | 4x Standard_D64s_v6 |
| Remote File Storage (NFS equivalent) | 512GB Total |
Note
For VMs, a Virtual core may actually represent a hyperthread. We suggest using VM instances with 64 vCPUs each (or 48 vCPUs if 64 does not exist).
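To confirm whether the vCPUs on a given Linux node are hyperthreads, you can read the kernel's SMT control file (a standard sysfs path on recent kernels); this check is a convenience, not part of the official deployment steps:

```shell
# Check whether SMT (HyperThreading) is active on this Linux node.
# "on" means each physical core exposes two vCPUs (hyperthreads);
# "unknown" means the sysfs file is not available on this kernel.
smt_state=$(cat /sys/devices/system/cpu/smt/control 2>/dev/null || echo "unknown")
echo "SMT state: $smt_state"
```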
We have tested the following Nutanix Enterprise AI (NAI) endpoint configuration to support Intel® AI for Enterprise RAG workloads on Intel® Xeon processors.
| Resource Type | Specs | Comments |
|---|---|---|
| Nutanix Enterprise AI NAI Endpoint | 2x 32vCPUs | Per Model Endpoint |
Below are our initial tested SLAs based on the provided system requirements for on-prem or cloud deployments. Note that these are provided as a starting point; SLAs can vary with model size, concurrency, and vector DB size requirements.
| Metric Measured | Value |
|---|---|
| Time-to-First-Token (TTFT) | <3s |
| Time Per Output Token (TPOT) | <150ms |
| Concurrency | 32 concurrent users |
| SLM/LLM Model Size | <15B |
| VectorDB Vectors | 100 Million |
Note: Users can introduce other model sizes, but that could impact compatibility and performance. Carefully evaluate your requirements and test thoroughly.
Note
In this case, vCPUs mean cores with HyperThreading enabled. In VM environments other than AWS, HyperThreading might be disabled.
If HyperThreading is disabled, balloons must also be disabled in config.yaml.
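If you need to turn balloons off, a minimal sketch of the config.yaml change is below; the `enabled` key is an assumption for illustration only — check your inventory's config.yaml schema for the actual toggle.

```yaml
# Hypothetical fragment: the exact key for disabling balloons depends on
# your inventory's config.yaml schema; "enabled" here is illustrative only.
balloons:
  enabled: false
```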
- Tools Installation
- Deploy/Configure Nutanix Enterprise AI on EKS or on-premises
- Deploy/Configure Intel® AI for Enterprise RAG on EKS or on-premises
- Validate Demo by navigating to the Intel® AI for Enterprise RAG Web Application
Follow Terraform instructions here
Ubuntu Installation Example:
sudo apt-get update && sudo apt-get install -y gnupg software-properties-common
wget -O- https://apt.releases.hashicorp.com/gpg | \
gpg --dearmor | \
sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg > /dev/null
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(grep -oP '(?<=UBUNTU_CODENAME=).*' /etc/os-release || lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update
sudo apt-get install terraform
Fresh Installation:
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
Update Existing Installation:
curl "https://awscli.amazonaws.com/awscliv2.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install --bin-dir /usr/local/bin --install-dir /usr/local/aws-cli --update
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
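After installing the tools above, a quick sanity check (a hypothetical helper loop, not part of the official instructions) can confirm that all required CLIs are on PATH before you continue:

```shell
# Quick sanity check: confirm each required CLI is on PATH before proceeding.
report=$(for tool in terraform aws kubectl helm; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done)
echo "$report"
```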
- For Nutanix Kubernetes Platform (NKP) or on-premises deployments, follow Nutanix documentation
- For AWS EKS, follow NAI EKS Deployment
Refer to Obtaining configuration from Nutanix AI LLM Endpoint to obtain the configuration needed for this step.
Edit deployment/pipelines/<your-pipeline>/reference-external-endpoint.yaml and configure the LLM settings:
config:
  endpoint: <endpoint path>
  LLM_MODEL_SERVER: vllm
  LLM_MODEL_SERVER_ENDPOINT: "https://your-vllm-endpoint.com/api"
  LLM_MODEL_NAME: <model_name>
  LLM_VLLM_API_KEY: "your-api-key-here"
Replace the placeholder values with your actual LLM endpoint details.
For higher security in production environments, consider injecting Kubernetes Secrets instead of storing credentials in configuration files. Refer to Update the vLLM API Key Secret for secure credential management.
If your endpoint does not have properly configured TLS, you can also add LLM_TLS_SKIP_VERIFY: "True".
Or, to update an existing Intel® AI for Enterprise RAG deployment, see Update Existing Intel® AI for Enterprise RAG Deployment to Use Nutanix Enterprise AI endpoint.
Change the pipeline file in the inventory's config.yaml to use the file that you just modified:
pipelines:
  - namespace: chatqa
    samplePath: chatqa/reference-external-endpoint.yaml
    resourcesPath: chatqa/resources-reference-external-endpoint.yaml
    modelConfigPath: chatqa/resources-model-cpu.yaml
    type: chatqa
Additionally, if eRAG and NAI are on the same cluster, balloons needs to be configured:
balloons:
  ...
  vllm_custom_name: "kserve-container"
- For Nutanix Kubernetes Platform (NKP) or on-premises deployments, follow Intel® AI for Enterprise RAG deployment on Kubernetes.
Note
If the application will be deployed on Nutanix Kubernetes Platform (NKP), it is recommended to disable telemetry; instructions are provided in the link.
- For AWS EKS follow EKS Deployment
If you have an existing Intel® AI for Enterprise RAG deployment and want to switch the vLLM endpoint to point to a Nutanix Enterprise AI endpoint, follow the steps below.
Warning
If you are using the balloons policy with vllm_custom_name set, the installer will check the resources allocated for this model to ensure that it fits.
If you deploy a model that uses more vCPUs than the one Intel® AI for Enterprise RAG was deployed with, balloons might not work.
- An existing Intel® AI for Enterprise RAG deployment with the chatqa pipeline running
- Access to a Nutanix Enterprise AI endpoint (URL, model name, and API key)
- kubectl configured to access your Kubernetes cluster
Refer to Obtaining configuration from Nutanix AI LLM Endpoint to obtain the configuration needed for this step.
Patch the chatqa GMConnector to use the new Nutanix Enterprise AI endpoint:
kubectl edit gmconnectors -n chatqa chatqa
Update the following variables.
Example:
env:
  LLM_MODEL_SERVER: vllm
  LLM_MODEL_SERVER_ENDPOINT: "https://nutanix-ai-endpoint.example.com/api/v1/chat/completions"
  LLM_MODEL_NAME: "llama-3-3b"
  LLM_TLS_SKIP_VERIFY: "false"
Save and exit the editor to apply the changes.
Update the secret containing the vLLM API key with your new Nutanix Enterprise AI API key:
kubectl patch secret vllm-api-key-secret \
-n chatqa \
--type merge \
-p '{"stringData":{"LLM_VLLM_API_KEY":"YOUR_NUTANIX_AI_API_KEY_HERE"}}'
Replace YOUR_NUTANIX_AI_API_KEY_HERE with the actual API key from your Nutanix Enterprise AI endpoint.
- Restart the LLM pods to pick up the new configuration:
kubectl rollout restart deployment llm-svc-deployment -n chatqa
- Check the pod logs to verify the connection to the new endpoint:
kubectl logs -n chatqa deployment/llm-svc-deployment --tail=50
Look for log entries indicating successful connection to the Nutanix Enterprise AI endpoint.
[INFO] - [llms_microservice] - Connection with LLM model server validated successfully.
- Test through the Chat UI to confirm. For instructions on accessing the UI, see Access the UI/Grafana.
For complete Nutanix AI endpoint configuration instructions, refer to the Nutanix AI documentation.
High Level steps: Acquire the Nutanix AI vLLM endpoint details from your Nutanix AI deployment.
- Navigate to your Nutanix AI management Console
- Click on "Endpoints" to view the list of available LLM endpoints
- Select the desired vLLM endpoint to view its details, including the URL, API key, and Sample Request code (see screenshot).
Note
Make sure that your URL ends with /api.
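As a quick guard, the URL suffix can be checked in shell before it is wired into the pipeline configuration; the URL below is the placeholder value used in the examples in this guide:

```shell
# Quick check that the endpoint URL ends in /api before using it in the pipeline.
# The URL below is the placeholder from this guide's examples; substitute your own.
ENDPOINT="https://nutanix-ai-endpoint.example.com/api"
case "$ENDPOINT" in
  */api) status="endpoint OK" ;;
  *)     status="ERROR: endpoint must end with /api" ;;
esac
echo "$status"
```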
Example Nutanix AI vLLM endpoint configuration:
curl -k -X 'POST' 'https://nutanix-ai-endpoint.example.com/api/v1/chat/completions' \
-H "Authorization: Bearer $API_KEY" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "llama-3-3b",
"messages": [
{
"role": "user",
"content": "Explain Deep Neural Networks in simple terms"
}
],
"max_tokens": 256,
"stream": false
}'
Example configuration:
config:
  endpoint: /v1/chat/completions
  LLM_MODEL_SERVER: vllm
  LLM_MODEL_SERVER_ENDPOINT: "https://nutanix-ai-endpoint.example.com/api"
  LLM_MODEL_NAME: "llama-3-3b"
  LLM_VLLM_API_KEY: "your-api-key-here"
  LLM_TLS_SKIP_VERIFY: "True"

