Intel® AI for Enterprise RAG deployment guide

This document details the deployment of Intel® AI for Enterprise RAG. By default, the guide assumes a Xeon deployment. If your hardware stack contains Gaudi, modify the configuration values according to the deployment instructions.

Table of Contents

  1. Intel® AI for Enterprise RAG Deployment
    1. Virtual Environment Setup
    2. Validate Hardware Requirements
    3. Install a Kubernetes cluster (optional - if you don't have one)
    4. Install infrastructure components (storage, operators, backup tools)
    5. Deploy the Intel® AI for Enterprise RAG application on top of the prepared infrastructure
    6. Update application components (models, configurations) as needed
    7. Create and restore backups of user data and configurations
  2. Interact with the Deployed Pipeline
    1. Test Deployment
    2. Access the UI/Grafana
    3. UI Credentials for the First Login
    4. Credentials for Grafana and Keycloak
    5. Credentials for Vector Store
    6. Credentials for Enhanced Dataprep Pipeline (EDP)
    7. Data Ingestion, UI and Telemetry
    8. Configure Single Sign-On Integration Using Microsoft Entra ID
  3. Remove the installation when no longer needed

Intel® AI for Enterprise RAG Deployment

Intel® AI for Enterprise RAG includes Ansible playbooks that provide a complete deployment workflow:

  1. Validate hardware requirements (recommended before deployment)
  2. Install a Kubernetes cluster (optional - if you don't have one)
  3. Install infrastructure components (storage, operators, backup tools)
  4. Deploy the Intel® AI for Enterprise RAG application on top of the prepared infrastructure
  5. Update application components (models, configurations) as needed
  6. Create and restore backups of user data and configurations
  7. Remove the installation when no longer needed

The validation step helps ensure your hardware meets the minimum requirements for Intel® AI for Enterprise RAG before proceeding with the deployment. The following sections guide you through each of these steps.

Virtual Environment Setup

Playbooks can be executed after creating a virtual environment and installing the prerequisites needed to run Ansible on your local machine. Use the following commands to create the virtual environment:

cd deployment
sudo apt-get install python3-venv
python3 -m venv erag-venv
source erag-venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
ansible-galaxy collection install -r requirements.yaml --upgrade
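
Before running any playbook, it can help to confirm that the virtual environment is active and that the Ansible tools are on the PATH. The following is a minimal sketch; `check_venv` and `check_tool` are illustrative helper names, not part of the repository.

```shell
# Illustrative helpers (not part of the repository).

# Report whether a Python virtual environment is active.
# VIRTUAL_ENV is exported by the venv "activate" script.
check_venv() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "no virtual environment is active"
        return 1
    fi
    echo "active virtual environment: ${VIRTUAL_ENV}"
}

# Report whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Usage: check_venv && check_tool ansible-playbook && check_tool ansible-galaxy
```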

Validate Hardware Requirements

Before proceeding with the deployment, it's recommended to validate that your hardware meets the requirements for Intel® AI for Enterprise RAG. To perform hardware validation, you need to create an inventory.ini file first.

An example inventory.ini file structure and detailed instructions are provided in the Cluster Deployment Guide.

Once you have created the inventory.ini file, you can validate your hardware resources using the validate playbook located at playbooks/validate.yaml:

ansible-playbook playbooks/validate.yaml --tags hardware -i inventory/test-cluster/inventory.ini

Note

If this is a Gaudi deployment, add the flag -e is_gaudi_platform=true to the command.
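
The validation command and the optional Gaudi flag can be combined in a small wrapper. This is a sketch under stated assumptions: `run_validation` and the `ERAG_PLAYBOOK_CMD` variable are hypothetical names introduced here so the command can be previewed without running Ansible. The playbook path, tag, and flag are the ones given above.

```shell
# Illustrative wrapper (not part of the repository).
# $1: path to inventory.ini
# $2: "true" for a Gaudi deployment (defaults to "false")
run_validation() {
    inventory="$1"
    is_gaudi="${2:-false}"

    # Fail early if the inventory file has not been created yet.
    if [ ! -f "$inventory" ]; then
        echo "inventory file not found: $inventory" >&2
        return 1
    fi

    # Build the argument list used in this guide.
    set -- playbooks/validate.yaml --tags hardware -i "$inventory"
    if [ "$is_gaudi" = "true" ]; then
        set -- "$@" -e is_gaudi_platform=true
    fi

    # ERAG_PLAYBOOK_CMD is a hypothetical override; set it to "echo" for a dry run.
    "${ERAG_PLAYBOOK_CMD:-ansible-playbook}" "$@"
}

# Usage: run_validation inventory/test-cluster/inventory.ini true
```

Setting `ERAG_PLAYBOOK_CMD=echo` prints the arguments instead of executing the playbook, which is a convenient way to review the command first.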

Install a Kubernetes cluster (optional - if you don't have one)

Intel® AI for Enterprise RAG offers Ansible automation for creating a K8s cluster. If you want to set up a K8s cluster, follow the Cluster Deployment Guide.

Install infrastructure components (storage, operators, backup tools)

The Intel® AI for Enterprise RAG repository offers installation of additional infrastructure components on the deployed K8s cluster:

  • Gaudi_operator - for K8s clusters with nodes that use Gaudi AI accelerators
  • CSI drivers - needed to dynamically provision storage for pods
  • Velero - backup tool
  • Local registry - creates a pod with a registry to store Docker images. This is useful for multi-node setups, where an internal Docker registry would not be sufficient because it is accessible from a single node only, not from the entire K8s cluster. See Local Image Building for configuration details.

If your K8s cluster requires installing any of these tools, follow the Infrastructure Components Guide.

Note

The pre-install tag automatically preconfigures system limits (such as file descriptors, process limits, and kernel parameters) on cluster nodes to ensure optimal performance for Enterprise RAG workloads. These configurations are applied before the main installation process.

Deploy the Intel® AI for Enterprise RAG application on top of the prepared infrastructure

Once you have a K8s cluster with all infrastructure components installed, you can install the Intel® AI for Enterprise RAG application on top of it. Follow the Application Deployment Guide.

Update application components (models, configurations) as needed

After the application is installed, you can update its components (for example, change the LLM or embedding model) by editing your configuration file and re-running the playbook with the install tag. The deployment scripts detect changes and update only the affected components, minimizing downtime and unnecessary redeployments.

To update the application:

  1. Edit config.yaml and adjust the relevant parameters (e.g., llm_model, embedding_model_name, or other settings). See the Advanced Configuration Guide for tips on modifying the parameters.
  2. Run:
ansible-playbook playbooks/application.yaml --tags install -e @<path to config.yaml>

This will apply the changes and update only the affected services.

Create and restore backups of user data and configurations

The application supports taking backups and restoring user data, including ingested vector data, ingested documents, user accounts and credentials, and chat history.

For detailed instructions on how to configure backup functionality, create backups, and restore from backups, refer to the Backup and Restore Guide.

Interact with the Deployed Pipeline

Test Deployment

To verify that the deployment was successful, run the appropriate test command for your pipeline:

For ChatQA Pipeline:

./scripts/test_connection.sh

If the deployment is complete, you should observe the following output:

deployment.apps/client-test created
Waiting for all pods to be running and ready....All pods in the chatqa namespace are running and ready.
Connecting to the server through the pod client-test-87d6c7d7b-45vpb using URL http://router-service.chatqa.svc.cluster.local:8080...
data: '\n'
data: 'A'
data: ':'
data: ' AV'
data: 'X'
data: [DONE]
Test finished successfully

For Docsum Pipeline:

./scripts/test_docsum.sh

This will test the document summarization functionality by sending a sample document for summarization.

Access the UI/Grafana

To access the UI, follow these steps:

  1. Forward the port from the ingress pod:

    sudo -E kubectl port-forward --namespace ingress-nginx svc/ingress-nginx-controller 443:https
  2. If you want to access the UI from another machine, tunnel the port from the host:

    ssh -L 443:localhost:443 user@ip
  3. Update the /etc/hosts file on the machine where you want to access the UI so that the domain names resolve to the externally exposed IP address of the cluster. On a Windows machine, this file is typically located at C:\Windows\System32\drivers\etc\hosts.

    For example, the updated file content should resemble the following:

    127.0.0.1 erag.com grafana.erag.com auth.erag.com s3.erag.com seaweedfs.erag.com
    

Note

In this example, 127.0.0.1 is the IPv4 loopback address of the local machine.

Important

Hostname Requirements:

  • Each hostname entry must match the fully qualified domain name (FQDN) configured in your config.yaml file. The base domain is set via the FQDN parameter (default: erag.com).
  • The /etc/hosts file requires each subdomain to be listed explicitly (wildcards are not supported).
  • Alternative for DNS servers: If you're configuring a DNS server instead of using /etc/hosts, you can use a wildcard record:
    erag.com      A    <IP-ADDRESS>
    *.erag.com    A    <IP-ADDRESS>
    
    This wildcard approach only works with DNS servers, not with /etc/hosts.
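
Because /etc/hosts needs every subdomain spelled out, the entry can be generated from the base domain rather than typed by hand. The sketch below assumes the subdomains shown in this guide (grafana, auth, s3, seaweedfs); `hosts_entry` is an illustrative helper name.

```shell
# Illustrative helper (not part of the repository).
# Build one /etc/hosts entry for the base domain and the subdomains used in this guide.
# $1: IP address, $2: base domain (the FQDN value from config.yaml)
hosts_entry() {
    ip="$1"
    fqdn="$2"
    line="$ip $fqdn"
    for sub in grafana auth s3 seaweedfs; do
        line="$line $sub.$fqdn"
    done
    echo "$line"
}

hosts_entry 127.0.0.1 erag.com
# prints: 127.0.0.1 erag.com grafana.erag.com auth.erag.com s3.erag.com seaweedfs.erag.com
```

On Linux, the result can be appended with `hosts_entry 127.0.0.1 erag.com | sudo tee -a /etc/hosts`.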

Once the update is complete, you can access the Intel® AI for Enterprise RAG UI by typing the following URL in your web browser: https://erag.com

Keycloak can be accessed via: https://auth.erag.com

Grafana can be accessed via: https://grafana.erag.com

SeaweedFS Filer (Web UI) can be accessed via: https://seaweedfs.erag.com

S3 API is exposed at: https://s3.erag.com

Caution

If using self-signed certificates (default configuration), access https://s3.erag.com in your browser before ingesting data to accept the certificate warning. This step is not required if you have configured custom SSL certificates.
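
A quick way to confirm that the endpoints listed above respond is to request each one with curl. This is a minimal sketch, assuming curl is installed and the hostnames already resolve; `erag_urls` and `check_endpoints` are illustrative names. The `-k` option skips certificate verification for the default self-signed certificate, and it does not replace accepting the certificate in the browser.

```shell
# Illustrative helpers (not part of the repository).

# Print the URLs described in this guide for a given base domain.
erag_urls() {
    fqdn="${1:-erag.com}"
    echo "https://$fqdn"
    for sub in auth grafana seaweedfs s3; do
        echo "https://$sub.$fqdn"
    done
}

# Request each URL and print the HTTP status code returned.
check_endpoints() {
    for url in $(erag_urls "${1:-erag.com}"); do
        # -k: skip certificate verification, -s: silent, -o /dev/null: discard the body
        code=$(curl -k -s -o /dev/null --max-time 10 -w '%{http_code}' "$url")
        echo "$url -> $code"
    done
}

# Usage: check_endpoints erag.com
```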

UI Credentials for the First Login

Once deployment is complete, a file named default_credentials.txt will be created in the deployment/ansible-logs folder with one-time passwords for the application admin and user. After entering the one-time password, you will be required to change the default password.

Caution

Remove the default_credentials.txt file after the first successful login.

Credentials for Grafana and Keycloak

Default credentials for Keycloak and Grafana:

  • username: admin
  • password: stored in the ansible-logs/default_credentials.yaml file. Change the passwords after the first login to Grafana or Keycloak.

Caution

Use ansible-vault to secure the password file ansible-logs/default_credentials.yaml after the first successful login by running: ansible-vault encrypt ansible-logs/default_credentials.yaml. After that, remember to add --ask-vault-pass to the ansible-playbook command.
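
To tell whether the credentials file has already been encrypted, and therefore whether --ask-vault-pass is needed, the first line of the file can be inspected: files encrypted by ansible-vault begin with a `$ANSIBLE_VAULT;` header. The sketch below relies on that header; `is_vault_encrypted` and `vault_hint` are illustrative helper names.

```shell
# Illustrative helpers (not part of the repository).

# Succeed if the file starts with the ansible-vault header line.
is_vault_encrypted() {
    head -n 1 "$1" 2>/dev/null | grep -q '^\$ANSIBLE_VAULT;'
}

# Print a reminder that matches the current state of the credentials file.
vault_hint() {
    file="${1:-ansible-logs/default_credentials.yaml}"
    if [ ! -f "$file" ]; then
        echo "not found: $file"
        return 1
    fi
    if is_vault_encrypted "$file"; then
        echo "encrypted: add --ask-vault-pass to ansible-playbook"
    else
        echo "plain text: run ansible-vault encrypt $file"
    fi
}

# Usage: vault_hint ansible-logs/default_credentials.yaml
```

Once the file is encrypted, its contents can still be read with `ansible-vault view ansible-logs/default_credentials.yaml`.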

Credentials for Vector Store

Default credentials for the selected Vector Store are stored in ansible-logs/default_credentials.yaml and are generated on first deployment.

Credentials for Enhanced Dataprep Pipeline (EDP)

Default credentials for Enhanced Dataprep services:

SeaweedFS:

  • S3 API Access (via s3.erag.com):

    • Access Key: value of EDP_SEAWEEDFS_ACCESS_KEY in ansible-logs/default_credentials.yaml
    • Secret Key: value of EDP_SEAWEEDFS_SECRET_KEY in ansible-logs/default_credentials.yaml
  • Admin Web UI (via seaweedfs.erag.com):

    • Username: value of SEAWEEDFS_ADMIN_USER in ansible-logs/default_credentials.yaml
    • Password: value of SEAWEEDFS_ADMIN_PASSWORD in ansible-logs/default_credentials.yaml

Internal EDP services credentials:

Redis:

  • username: default
  • password: stored in ansible-logs/default_credentials.yaml

Postgres:

  • username: edp
  • password: stored in ansible-logs/default_credentials.yaml
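
The values referenced above can be read from the credentials file on the command line. This is a sketch under a loudly stated assumption: it treats ansible-logs/default_credentials.yaml as flat, unquoted `KEY: value` lines, which this guide does not confirm, and it only works while the file is not vault-encrypted. `get_credential` is an illustrative helper name.

```shell
# Illustrative helper (not part of the repository).
# Assumption: the credentials file holds flat, unquoted "KEY: value" lines
# and has not been encrypted with ansible-vault.
# $1: key name, $2: credentials file (optional)
get_credential() {
    key="$1"
    file="${2:-ansible-logs/default_credentials.yaml}"
    # Print whatever follows "KEY:" on the first matching line.
    sed -n "s/^${key}:[[:space:]]*//p" "$file" | head -n 1
}

# Usage: get_credential EDP_SEAWEEDFS_ACCESS_KEY
```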

Data Ingestion, UI and Telemetry

For adding data to the knowledge base and exploring the UI, visit this page.

For accessing Grafana dashboards for all services, visit this page.

Configure Single Sign-On Integration Using Microsoft Entra ID

For instructions on how to configure single sign-on, visit this page.

Remove the installation when no longer needed

To remove Intel® AI for Enterprise RAG from your cluster, execute:

ansible-playbook playbooks/application.yaml --tags uninstall -e @inventory/test-cluster/config.yaml