This guide provides step-by-step instructions to deploy Intel® AI for Enterprise Inference on a single node.
Before running the automation, complete all prerequisites.
The steps below map a local test domain, create a self-signed certificate, and then clone the Enterprise Inference repo and copy the single-node preset config files to the working directory.
Since we are testing locally, we need to map a fake domain (api.example.com) to localhost in the /etc/hosts file.
Run the following command to edit the hosts file:
sudo nano /etc/hosts
Add this line at the end:
127.0.0.1 api.example.com
Save and exit (CTRL+X, then Y and Enter).
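As an alternative to editing the file by hand, the mapping can be added from a script. The following is a minimal sketch, not part of the Enterprise Inference tooling: `add_hosts_entry` is a hypothetical helper that appends the entry only when the hostname is not already mapped, and the demonstration runs against a scratch file so nothing on the system is changed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append "IP HOST" to a hosts-style file unless HOST
# already appears as a hostname on a non-comment line.
add_hosts_entry() {
  local file="$1" ip="$2" host="$3"
  if awk -v h="$host" '
      /^[[:space:]]*#/ { next }   # skip comment lines
      { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
      END { exit !found }' "$file"; then
    echo "already present: $host"
  else
    printf '%s %s\n' "$ip" "$host" >> "$file"
    echo "added: $ip $host"
  fi
}

# Demonstration on a scratch copy; the second call is a no-op.
demo_hosts="$(mktemp)"
printf '127.0.0.1 localhost\n' > "$demo_hosts"
add_hosts_entry "$demo_hosts" 127.0.0.1 api.example.com
add_hosts_entry "$demo_hosts" 127.0.0.1 api.example.com
```

Writing to the real /etc/hosts requires root privileges, so on the node the append step would need to run through `sudo` (for example by piping the line into `sudo tee -a /etc/hosts`).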
Run the following command to create a self-signed SSL certificate that covers api.example.com and trace-api.example.com:
mkdir -p ~/certs && cd ~/certs && \
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes \
-subj "/CN=api.example.com" \
-addext "subjectAltName = DNS:api.example.com, DNS:trace-api.example.com"

Note: the -addext option requires OpenSSL >= 1.1.1.
Files produced:
- cert.pem — the self-signed certificate (contains SANs)
- key.pem — the private key
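Before pointing the deployment at these files, it can help to confirm that the certificate really carries both names and that the key belongs to it. The sketch below assumes the files live in `~/certs` as in the command above; `CERT_DIR`, `cert_has_san`, and `key_matches_cert` are hypothetical names introduced for illustration. Like `-addext`, the `-ext` option of `openssl x509` needs OpenSSL 1.1.1 or newer.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check the certificate and key produced above.
# CERT_DIR is an assumed variable; it defaults to the ~/certs directory used in this guide.
CERT_DIR="${CERT_DIR:-$HOME/certs}"

# Succeeds if the certificate's subjectAltName extension lists the given DNS name.
cert_has_san() {
  local cert="$1" name="$2"
  openssl x509 -in "$cert" -noout -ext subjectAltName 2>/dev/null \
    | grep -Eq "DNS:${name}[[:space:]]*(,|\$)"
}

# Succeeds if the private key and the certificate contain the same public key.
key_matches_cert() {
  local cert="$1" key="$2" cert_pub key_pub
  cert_pub="$(openssl x509 -in "$cert" -noout -pubkey 2>/dev/null)" || return 1
  key_pub="$(openssl pkey -in "$key" -pubout 2>/dev/null)" || return 1
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

if [ -f "$CERT_DIR/cert.pem" ] && [ -f "$CERT_DIR/key.pem" ]; then
  for name in api.example.com trace-api.example.com; do
    if cert_has_san "$CERT_DIR/cert.pem" "$name"; then
      echo "SAN present: $name"
    else
      echo "SAN missing: $name"
    fi
  done
  if key_matches_cert "$CERT_DIR/cert.pem" "$CERT_DIR/key.pem"; then
    echo "key.pem matches cert.pem"
  else
    echo "key.pem does not match cert.pem"
  fi
else
  echo "cert.pem or key.pem not found in $CERT_DIR"
fi
```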
Important DNS step: Add trace-api.example.com to DNS and point it to the node where the Ingress controller is deployed.
Clone the Enterprise Inference repo, then copy the single-node preset inference config file to the working directory:
cd ~
git clone https://github.com/opea-project/Enterprise-Inference.git
cd Enterprise-Inference
cp -f docs/examples/single-node/inference-config.cfg core/inventory/inference-config.cfg
Modify inference-config.cfg if needed. Ensure the cluster_url field is set to the DNS name used (api.example.com in this guide) and that the paths to the certificate and key files are valid. The keycloak fields and deployment options can be left unchanged. For systems behind a proxy, refer to the proxy guide.
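A quick way to confirm the cluster_url setting is to read it back from the file. The sketch below assumes the config uses one `key=value` pair per line, which is an assumption about the file format rather than something this guide states; `cfg_get` is a hypothetical helper, and the demonstration reads a scratch file instead of the real config.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value assigned to KEY in a key=value config file.
# Assumes one "key=value" pair per line; surrounding double quotes are stripped.
cfg_get() {
  local file="$1" key="$2"
  sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" \
    | head -n 1 | tr -d '"'
}

# Demonstration on a scratch file rather than the real config.
demo_cfg="$(mktemp)"
printf 'cluster_url=api.example.com\n' > "$demo_cfg"

url="$(cfg_get "$demo_cfg" cluster_url)"
if [ "$url" = "api.example.com" ]; then
  echo "cluster_url matches the certificate name: $url"
else
  echo "cluster_url is '$url'; expected api.example.com" >&2
fi
```

To check the real file, pass `core/inventory/inference-config.cfg` (relative to the repo root) instead of the scratch file.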
Copy the single node preset hosts config file to the working directory:
cp -f docs/examples/single-node/hosts.yaml core/inventory/hosts.yaml

Note: The ansible_user field is set to ubuntu by default. Change it to the actual username used.
Now run the automation using the configured files.
cd core
chmod +x inference-stack-deploy.sh

Export the Hugging Face token as an environment variable, replacing "Your_Hugging_Face_Token_ID" with the actual Hugging Face token. Alternatively, set hugging-face-token to the token value inside inference-config.cfg.

export HUGGINGFACE_TOKEN="Your_Hugging_Face_Token_ID"

Follow the steps below depending on the hardware platform. The models argument can be omitted, in which case the script prompts for a selection from a list of models.
Run the command below to deploy the Llama 3.1 8B parameter model on CPU.
./inference-stack-deploy.sh --models "21" --cpu-or-gpu "cpu" --hugging-face-token $HUGGINGFACE_TOKEN

📝 Note: If you're using Intel® Gaudi® AI Accelerators, ensure firmware and drivers are up to date using the automated setup scripts before deployment.
Run the command below to deploy the Llama 3.1 8B parameter model on Intel® Gaudi®. For Gaudi 3, set cpu-or-gpu to gaudi3 instead.
./inference-stack-deploy.sh --models "1" --cpu-or-gpu "gpu" --hugging-face-token $HUGGINGFACE_TOKEN

Select Option 1 and confirm the Yes/No prompt.
This will deploy the setup automatically. If any issues are encountered, double-check the prerequisites and configuration files.
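The platform-specific commands above differ only in the `--models` and `--cpu-or-gpu` values, so the choice can be captured in a small lookup. This is a sketch for illustration: `deploy_args`, `PLATFORM`, and the platform names `cpu`, `gaudi`, and `gaudi3` are hypothetical, while the model IDs and flag values are the ones given in this guide. It only prints the command (a dry run) rather than starting a deployment.

```bash
#!/usr/bin/env bash
# Sketch: derive the deploy arguments from a platform name.
# Model IDs and --cpu-or-gpu values come from this guide; the function and
# platform names are hypothetical.
deploy_args() {
  case "$1" in
    cpu)    echo '--models 21 --cpu-or-gpu cpu' ;;
    gaudi)  echo '--models 1 --cpu-or-gpu gpu' ;;
    gaudi3) echo '--models 1 --cpu-or-gpu gaudi3' ;;
    *)      echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

# Dry run: print the command instead of executing it.
platform="${PLATFORM:-cpu}"
if args="$(deploy_args "$platform")"; then
  echo "./inference-stack-deploy.sh $args --hugging-face-token \"\$HUGGINGFACE_TOKEN\""
fi
```

Printing the command first makes it easy to review the arguments before running the real script from the `core` directory.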
On the node, run the following commands to verify that Intel® AI for Enterprise Inference was deployed successfully:
If using Keycloak, generate a token using the script generate-token.sh. Ensure the values of the variables match what is set in inference-config.cfg. This will also set the environment variable TOKEN used in the next step.
source scripts/generate-token.sh

To test on CPU only (note that vllmcpu is appended to the model name in the URL):
curl -k ${BASE_URL}/Llama-3.1-8B-Instruct-vllmcpu/v1/completions -X POST -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "prompt": "What is Deep Learning?", "max_tokens": 25, "temperature": 0}' -H 'Content-Type: application/json' -H "Authorization: Bearer $TOKEN"
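For use in a script, the request above can be wrapped so that success and failure are distinguishable by exit status. The sketch below assumes the endpoint returns an OpenAI-style completions body containing a `choices` field on success; `looks_like_completion` and `smoke_test` are hypothetical helpers, and the live request is only sent when `BASE_URL` is set in the current shell.

```bash
#!/usr/bin/env bash
# Sketch: wrap the completion request so a script can tell success from failure.
# BASE_URL and TOKEN are the variables used by the commands in this guide.

# Succeeds when a response body contains a "choices" field, as OpenAI-style
# completion responses do; error responses are assumed to lack it.
looks_like_completion() {
  printf '%s' "$1" | grep -q '"choices"'
}

# Hypothetical wrapper: POST the same prompt as above to the given model route.
smoke_test() {
  local route="$1" body
  body="$(curl -ks --max-time 60 "${BASE_URL}/${route}/v1/completions" \
    -X POST -H 'Content-Type: application/json' \
    -H "Authorization: Bearer ${TOKEN}" \
    -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "prompt": "What is Deep Learning?", "max_tokens": 25, "temperature": 0}')" || return 1
  looks_like_completion "$body"
}

# Only contact the server when BASE_URL is set in the current shell.
if [ -n "${BASE_URL:-}" ]; then
  if smoke_test "Llama-3.1-8B-Instruct-vllmcpu"; then
    echo "deployment responded with a completion"
  else
    echo "no valid completion returned"
  fi
fi
```

On Intel® Gaudi®, the same wrapper would be called with the route `Llama-3.1-8B-Instruct`, matching the URL used in the Gaudi test.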
To test on Intel® Gaudi® AI Accelerators:
```bash
curl -k ${BASE_URL}/Llama-3.1-8B-Instruct/v1/completions -X POST -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "prompt": "What is Deep Learning?", "max_tokens": 25, "temperature": 0}' -H 'Content-Type: application/json' -H "Authorization: Bearer $TOKEN"
```
## Post-Deployment
With the deployed model on the server, refer to the [post-deployment instructions](./README.md#post-deployment) for options.