To set up prerequisites and quickly deploy Intel® AI for Enterprise Inference on a single node, follow the steps in the Single Node Deployment Guide. Otherwise, proceed to the section below for all deployment options.
🚀 New: Automated Intel® AI Accelerator firmware and driver management! See Intel® AI Accelerator Prerequisites for automated setup scripts.
Complete all prerequisites.
| Deployment Type | Description |
|---|---|
| Single Node (vLLM, non‑production) | For Quick Testing on Intel® Xeon® processors using vLLM Docker (Guide) |
| Single Node | Quick start for testing or lightweight workloads (Guide) |
| Single Master, Multiple Workers | For higher throughput workloads (Guide) |
| Multi-Master, Multiple Workers | Recommended for HA enterprise clusters (Guide) |
- View the Pre-validated Model List
- To deploy custom models from Hugging Face, follow the Hugging Face Deployment Guide
💡 Both validated and custom models are supported to meet diverse enterprise needs.
Two files are required before deployment:
- `inventory/hosts.yaml` – Cluster inventory and topology (single-node and multi-node)
- `inference-config.cfg` – Example component-level deployment configuration
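For orientation, a single-node inventory might look like the following sketch. The host address, user, and group names here are illustrative assumptions; match them to the sample `inventory/hosts.yaml` shipped with the repository:

```yaml
# Hypothetical single-node inventory sketch -- host, user, and group
# names are placeholders; align them with the repository's sample file.
all:
  hosts:
    node1:
      ansible_host: 192.168.1.10
      ansible_user: ubuntu
  children:
    kube_control_plane:
      hosts:
        node1:
    kube_node:
      hosts:
        node1:
```

For multi-node deployments, additional worker hosts would be listed under the worker group rather than reusing the single node.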
Run the following script to deploy the inference platform:
```bash
bash inference-stack-deploy.sh
```

Intel® AI for Enterprise Inference also supports brownfield deployment, allowing you to deploy the inference stack on an existing Kubernetes cluster without disrupting current workloads, preserving your existing infrastructure and configurations.
For detailed steps, refer to the Brownfield Deployment Guide.
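Before launching `inference-stack-deploy.sh`, it can help to confirm the two required files are in place. The helper below is a sketch (not part of the repository); it only checks for the file names named in this guide:

```shell
# check_prereqs: verify the two required deployment files exist in the
# current directory before running inference-stack-deploy.sh.
# This is an illustrative helper, not part of the deployment scripts.
check_prereqs() {
  missing=0
  for f in inventory/hosts.yaml inference-config.cfg; do
    if [ ! -f "$f" ]; then
      echo "missing required file: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run it from the repository root, e.g. `check_prereqs && bash inference-stack-deploy.sh`.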