This document details the deployment of Intel® AI for Enterprise RAG. By default, the guide assumes a Xeon deployment. If your hardware stack contains Gaudi, adjust the configuration values as described in the deployment instructions.
- Intel® AI for Enterprise RAG Deployment
- Virtual Environment Setup
- Validate Hardware Requirements
- Install a Kubernetes cluster (optional - if you don't have one)
- Install infrastructure components (storage, operators, backup tools)
- Deploy the Intel® AI for Enterprise RAG application on top of the prepared infrastructure
- Update application components (models, configurations) as needed
- Create and restore backups of user data and configurations
- Interact with the Deployed Pipeline
- Remove the installation when no longer needed
Intel® AI for Enterprise RAG includes Ansible playbooks that provide a complete deployment workflow:
- Validate hardware requirements (recommended before deployment)
- Install a Kubernetes cluster (optional - if you don't have one)
- Install infrastructure components (storage, operators, backup tools)
- Deploy the Intel® AI for Enterprise RAG application on top of the prepared infrastructure
- Update application components (models, configurations) as needed
- Create and restore backups of user data and configurations
- Remove the installation when no longer needed
The validation step helps ensure your hardware meets the minimum requirements for Intel® AI for Enterprise RAG before proceeding with the deployment. The following sections guide you through each of these steps.
Playbooks can be executed after you create a virtual environment and install the prerequisites needed to run Ansible on your local machine. Use the following commands to create the virtual environment:
cd deployment
sudo apt-get install python3-venv
python3 -m venv erag-venv
source erag-venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
ansible-galaxy collection install -r requirements.yaml --upgrade
Before proceeding with the deployment, it's recommended to validate that your hardware meets the requirements for Intel® AI for Enterprise RAG. To perform hardware validation, you need to create an inventory.ini file first.
An example inventory.ini file structure and detailed instructions are provided in the Cluster Deployment Guide.
Once you have created the inventory.ini file, you can validate your hardware resources using the validate playbook located at playbooks/validate.yaml:
ansible-playbook playbooks/validate.yaml --tags hardware -i inventory/test-cluster/inventory.ini
Note
If this is a Gaudi deployment, add the additional flag -e is_gaudi_platform=true
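The two variants of the validation command can be wrapped in a small helper. This is a minimal sketch: the function name validate_hardware_cmd and its arguments are hypothetical and not part of the repository, while the playbook path, the hardware tag, and the -e is_gaudi_platform=true flag are taken from the text above. The helper only prints the command so that it can be reviewed before it is run.

```shell
# Hypothetical helper (not part of the repository): prints the hardware
# validation command for the given inventory file.
# Pass "true" as the second argument for a Gaudi deployment.
validate_hardware_cmd() {
    local inventory="$1"
    local is_gaudi="${2:-false}"
    local cmd="ansible-playbook playbooks/validate.yaml --tags hardware -i ${inventory}"
    if [ "${is_gaudi}" = "true" ]; then
        # Extra variable required for Gaudi platforms, as stated in the note above.
        cmd="${cmd} -e is_gaudi_platform=true"
    fi
    printf '%s\n' "${cmd}"
}

# Example: print the command, review it, then run it from the deployment directory.
# validate_hardware_cmd inventory/test-cluster/inventory.ini true
```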
Intel® AI for Enterprise RAG offers ansible automation for creating a K8s cluster. If you want to set up a K8s cluster, follow the Cluster Deployment Guide.
The Intel® AI for Enterprise RAG repository offers installation of additional infrastructure components on the deployed K8s cluster:
- Gaudi_operator - intended for K8s clusters with nodes that use Gaudi AI accelerators
- CSI drivers - needed to dynamically provision storage for pods
- Velero - the Velero backup tool
- Local registry - creates a pod with a registry to store Docker images. This is useful for multi-node setups, where a node's internal Docker registry would not be sufficient because it is accessible only from that single node, not from the entire K8s cluster. See Local Image Building for configuration details.
If your K8s cluster requires installing any of these tools, follow the Infrastructure Components Guide.
Note
The pre-install tag automatically preconfigures system limits (such as file descriptors, process limits, and kernel parameters) on cluster nodes to ensure optimal performance for Enterprise RAG workloads. These configurations are applied before the main installation process.
Once you have a K8s cluster with all infrastructure components installed, you can install the Intel® AI for Enterprise RAG application on top of it. Follow the Application Deployment Guide.
After the application is installed, you can update its components (for example, change the LLM or embedding model) by editing your configuration file and running the install tag again. The deployment scripts will detect changes and update only the involved components, minimizing downtime and unnecessary redeployments.
To update the application:
- Edit config.yaml and adjust the relevant parameters (e.g., llm_model, embedding_model_name, or other settings). See the Advanced Configuration Guide for tips on modifying the parameters.
- Run:
ansible-playbook playbooks/application.yaml --tags install -e @<path to config.yaml>
This will apply the changes and update only the affected services.
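A mistyped configuration path is an easy mistake when rerunning the install tag. The sketch below guards against it by checking that the file exists before calling the playbook. The function name update_application is hypothetical; the ansible-playbook invocation is the one shown above.

```shell
# Hypothetical wrapper (not part of the repository): reruns the install tag
# only if the given config file exists. Run it from the deployment directory.
update_application() {
    local config_path="$1"
    if [ ! -f "${config_path}" ]; then
        echo "config file not found: ${config_path}" >&2
        return 1
    fi
    ansible-playbook playbooks/application.yaml --tags install -e "@${config_path}"
}

# Example:
# update_application inventory/test-cluster/config.yaml
```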
The application supports taking backups and restoring user data, including ingested vector data, ingested documents, user accounts and credentials, and chat history.
For detailed instructions on how to configure backup functionality, create backups, and restore from backups, refer to the Backup and Restore Guide.
To verify that the deployment was successful, run the appropriate test command for your pipeline:
For ChatQA Pipeline:
./scripts/test_connection.sh
If the deployment is complete, you should observe the following output:
deployment.apps/client-test created
Waiting for all pods to be running and ready....All pods in the chatqa namespace are running and ready.
Connecting to the server through the pod client-test-87d6c7d7b-45vpb using URL http://router-service.chatqa.svc.cluster.local:8080...
data: '\n'
data: 'A'
data: ':'
data: ' AV'
data: 'X'
data: [DONE]
Test finished successfully
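When the test is run from automation, the result can be checked by looking for the final line of the output shown above. This is a minimal sketch that assumes the script prints "Test finished successfully" on success, as in the sample output; the function name connection_test_passed and the log file path are hypothetical.

```shell
# Hypothetical helper (not part of the repository): succeeds only if the
# captured test output contains the success marker from the sample output.
connection_test_passed() {
    local log_file="$1"
    grep -q "Test finished successfully" "${log_file}"
}

# Example usage from the deployment directory:
# ./scripts/test_connection.sh 2>&1 | tee /tmp/test_connection.log
# connection_test_passed /tmp/test_connection.log && echo "ChatQA pipeline responded"
```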
For Docsum Pipeline:
./scripts/test_docsum.sh
This will test the document summarization functionality by sending a sample document for summarization.
To access the UI, follow these steps:
- Forward the port from the ingress pod:
sudo -E kubectl port-forward --namespace ingress-nginx svc/ingress-nginx-controller 443:https
- If you want to access the UI from another machine, tunnel the port from the host:
ssh -L 443:localhost:443 user@ip
- Update the /etc/hosts file on the machine where you want to access the UI to match the domain name with the externally exposed IP address of the cluster. On a Windows machine, this file is typically located at C:\Windows\System32\drivers\etc\hosts.
For example, the updated file content should resemble the following:
127.0.0.1 erag.com grafana.erag.com auth.erag.com s3.erag.com seaweedfs.erag.com
Note
The address 127.0.0.1 in the example above is the IPv4 loopback address of the local machine.
Important
Hostname Requirements:
- Each hostname entry must match the fully qualified domain name (FQDN) configured in your config.yaml file. The base domain is set via the FQDN parameter (default: erag.com).
- The /etc/hosts file requires each subdomain to be listed explicitly (wildcards are not supported).
- Alternative for DNS servers: If you're configuring a DNS server instead of using /etc/hosts, you can use a wildcard record:
erag.com A <IP-ADDRESS>
*.erag.com A <IP-ADDRESS>
This wildcard approach only works with DNS servers, not with /etc/hosts.
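Because every subdomain has to be listed explicitly, the hosts entry can be generated from the base domain to avoid typos. This is a minimal sketch: the function name erag_hosts_line is hypothetical, and the subdomain list (grafana, auth, s3, seaweedfs) is the one from the example entry in this guide. If your FQDN parameter differs from erag.com, pass it as the second argument.

```shell
# Hypothetical helper (not part of the repository): prints a single /etc/hosts
# line covering the base domain and the subdomains listed in this guide.
erag_hosts_line() {
    local ip="$1"
    local fqdn="${2:-erag.com}"
    local line="${ip} ${fqdn}"
    local sub
    for sub in grafana auth s3 seaweedfs; do
        line="${line} ${sub}.${fqdn}"
    done
    printf '%s\n' "${line}"
}

# Example: review the generated line, then append it with root privileges.
# erag_hosts_line 127.0.0.1 erag.com
# erag_hosts_line 127.0.0.1 erag.com | sudo tee -a /etc/hosts
```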
Once the update is complete, you can access the Intel® AI for Enterprise RAG UI by typing the following URL in your web browser:
https://erag.com
Keycloak can be accessed via:
https://auth.erag.com
Grafana can be accessed via:
https://grafana.erag.com
SeaweedFS Filer (Web UI) can be accessed via:
https://seaweedfs.erag.com
S3 API is exposed at:
https://s3.erag.com
Caution
If using self-signed certificates (default configuration), access https://s3.erag.com in your browser before ingesting data to accept the certificate warning. This step is not required if you have configured custom SSL certificates.
Once deployment is complete, a file named default_credentials.txt will be created in the deployment/ansible-logs folder with one-time passwords for the application admin and user. After entering the one-time password, you will be required to change the default password.
Caution
Remove the default_credentials.txt file after the first successful login.
Default credentials for Keycloak and Grafana:
- username: admin
- password: stored in the ansible-logs/default_credentials.yaml file. Change the passwords after the first login to Grafana or Keycloak.
Caution
Use ansible-vault to secure the password file ansible-logs/default_credentials.yaml after the first successful login by running: ansible-vault encrypt ansible-logs/default_credentials.yaml. After that, remember to add --ask-vault-pass to the ansible-playbook command.
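It is easy to forget --ask-vault-pass once the file has been encrypted. The sketch below detects encryption by checking for the $ANSIBLE_VAULT; header that ansible-vault writes on the first line of an encrypted file, and prints the flag only when it is needed. The function names is_vault_encrypted and vault_flag_for are hypothetical.

```shell
# Succeeds if the file starts with the Ansible Vault header line.
is_vault_encrypted() {
    head -n 1 "$1" | grep -q '^\$ANSIBLE_VAULT;'
}

# Hypothetical helper (not part of the repository): prints the extra flag to
# pass to ansible-playbook when the credentials file is vault-encrypted,
# and prints nothing otherwise.
vault_flag_for() {
    if is_vault_encrypted "$1"; then
        printf '%s\n' "--ask-vault-pass"
    fi
}

# Example usage from the deployment directory:
# ansible-playbook playbooks/application.yaml --tags install \
#     -e @inventory/test-cluster/config.yaml \
#     $(vault_flag_for ansible-logs/default_credentials.yaml)
```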
Default credentials for the selected Vector Store are stored in ansible-logs/default_credentials.yaml and are generated on first deployment.
Default credentials for Enhanced Dataprep services:
SeaweedFS:
- S3 API Access (via s3.erag.com):
  - Access Key: value of EDP_SEAWEEDFS_ACCESS_KEY in ansible-logs/default_credentials.yaml
  - Secret Key: value of EDP_SEAWEEDFS_SECRET_KEY in ansible-logs/default_credentials.yaml
- Admin Web UI (via seaweedfs.erag.com):
  - Username: value of SEAWEEDFS_ADMIN_USER in ansible-logs/default_credentials.yaml
  - Password: value of SEAWEEDFS_ADMIN_PASSWORD in ansible-logs/default_credentials.yaml
Internal EDP services credentials:
Redis:
- username: default
- password: stored in ansible-logs/default_credentials.yaml
Postgres:
- username: edp
- password: stored in ansible-logs/default_credentials.yaml
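To look up a single value without opening the whole file, a small helper can print the value for a given key. This is a sketch under a stated assumption: it treats default_credentials.yaml as a flat file with one "KEY: value" pair per line, which this guide does not confirm. The function name get_credential is hypothetical. Quotes around a value are left as-is, and the helper works only while the file is unencrypted.

```shell
# Hypothetical helper (not part of the repository): prints the value of a
# top-level key from a flat "KEY: value" file. The flat layout is an assumption.
get_credential() {
    local key="$1"
    local file="$2"
    awk -v key="${key}" '
        # Match only lines that begin with "KEY:".
        index($0, key ":") == 1 {
            value = substr($0, length(key) + 2)
            # Trim leading and trailing whitespace.
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            print value
            exit
        }
    ' "${file}"
}

# Example usage from the deployment directory:
# get_credential EDP_SEAWEEDFS_ACCESS_KEY ansible-logs/default_credentials.yaml
```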
For adding data to the knowledge base and exploring the UI interface, visit this page.
For accessing Grafana dashboards for all services, visit this page.
For instructions on how to configure single sign-on, visit this page.
To remove Intel® AI for Enterprise RAG from your cluster, execute:
ansible-playbook playbooks/application.yaml --tags uninstall -e @inventory/test-cluster/config.yaml