
Release Notes: Intel® AI for Enterprise Inference – Version 1.5.1 Hotfix

Released by @AhmedSeemalK on 26 Mar 06:03 · commit 5df2a50

Overview

Intel® AI for Enterprise Inference streamlines the deployment and management of AI inference services on Intel hardware. Built around Kubernetes orchestration, it automates deploying large language models (LLMs), provisioning compute, and configuring hardware for fast, scalable, and secure inference, both on-premises and in cloud-native settings. It exposes OpenAI-compatible APIs, making it easy to integrate with enterprise applications.


System Requirements

  • Operating System: Ubuntu 22.04, Ubuntu 24.04
  • Hardware Platforms: 3rd, 4th, 5th, and 6th Gen Intel® Xeon® Scalable processors; Intel® Gaudi® 2 & 3 AI Accelerators
  • Gaudi Firmware: 1.22.0
  • Network: Internet access required for deployment; open ports for Kubernetes and the container registry
  • Storage: Allocate based on model size and observability tooling (at least 30 GB recommended for monitoring data)
  • Other: SSH key pair, SSL/TLS certificates, Hugging Face token

Hotfix Changes

  • Resolved deployment connectivity issues in proxy environments

Deployment Modes

  • Single Node: Quick start for testing or lightweight workloads.
  • Single Master, Multiple Workers: For higher throughput workloads.
  • Multi-Master, Multiple Workers: Enterprise-ready HA cluster.

Key Features

Agentic AI Workflow

  • Integrated Flowise deployment with PostgreSQL and Redis backends for production-ready agent workflows (a usage sketch follows this list)
  • Added configuration toggle (deploy_agenticai_plugin) to enable/disable plugin deployment
  • Provided comprehensive quick-start documentation with setup and usage examples
  • Included sample multi-agent template for software development team workflows
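Once deployed, Flowise exposes its standard prediction REST API. A minimal sketch of calling an agent flow from Python; the host name and chatflow ID are deployment-specific placeholders, not values shipped with this release:

```python
import requests

# Hypothetical values: substitute your cluster's Flowise endpoint and the
# ID of the chatflow created from the sample multi-agent template.
FLOWISE_URL = "https://flowise.example.com/api/v1/prediction/<chatflow-id>"

def ask_agent(question: str) -> str:
    """Send a question to a Flowise chatflow and return the agent's answer."""
    resp = requests.post(FLOWISE_URL, json={"question": question}, timeout=120)
    resp.raise_for_status()
    return resp.json().get("text", "")

if __name__ == "__main__":
    print(ask_agent("Draft a sprint plan for a three-person backend team."))
```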

Ubuntu 24.04 Base OS Support

  • Extended stack deployment support to run on Ubuntu 24.04 base OS

MCP Tools & Server

  • MCP server deployment support on Enterprise Inference Stack via Helm chart template (mcp-server-template)
  • OIDC authentication integration for MCP endpoints using Keycloak + APISIX (secure access to remote MCP endpoints)
  • Ingress support for the streamable HTTP transport used by MCP servers (a client sketch follows this list)
  • New documentation/guide for MCP server deployment (containerization + Helm configuration + deployment steps)
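As a rough illustration of consuming such an endpoint, here is a sketch using the official MCP Python SDK (`mcp` package) over the streamable HTTP transport, passing an OIDC bearer token obtained from Keycloak. The URL and token are assumptions; consult the new guide for the actual endpoint layout:

```python
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

# Hypothetical endpoint exposed through APISIX; the bearer token would come
# from a Keycloak OIDC flow (client credentials, device code, etc.).
MCP_URL = "https://inference.example.com/mcp"
ACCESS_TOKEN = "<keycloak-access-token>"

async def list_remote_tools() -> None:
    headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
    async with streamablehttp_client(MCP_URL, headers=headers) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()          # MCP handshake
            result = await session.list_tools() # enumerate server tools
            for tool in result.tools:
                print(tool.name, "-", tool.description)

asyncio.run(list_remote_tools())
```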

Brownfield Deployment

  • System pre-checks for brownfield deployment in EKS
  • Validation for existing clusters/workloads to ensure safe adoption in EKS
  • ALB cost optimization by enabling ALB grouping across all Ingress resources
  • Reduced duplicate ALB provisioning by consolidating Ingress resources where feasible (see the annotation sketch below)
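ALB grouping with the AWS Load Balancer Controller is driven by the `alb.ingress.kubernetes.io/group.name` annotation: Ingresses that share a group name share one ALB. A sketch of applying it with the Kubernetes Python client; the Ingress name, namespace, and group name are illustrative:

```python
from kubernetes import client, config

config.load_kube_config()
networking = client.NetworkingV1Api()

# Ingresses with the same group.name are served by a single ALB instead of
# each Ingress provisioning its own load balancer.
patch = {
    "metadata": {
        "annotations": {
            "alb.ingress.kubernetes.io/group.name": "enterprise-inference"
        }
    }
}

# Hypothetical Ingress name and namespace, for illustration only.
networking.patch_namespaced_ingress(
    name="keycloak-ingress", namespace="default", body=patch
)
```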

Balloon Policy (NUMA-Aware CPU Allocation Fixes)

  • Fixed imbalanced CPU allocation across NUMA nodes
  • vLLM pods use exclusive allocation of CPU cores
  • Other services (e.g., Keycloak) use reserved allocation of CPU cores
  • vLLM pods now request all remaining CPUs after the reserved set is excluded, keeping CPU allocation balanced across NUMA nodes per the expected topology (a worked example follows this list)
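The sizing rule is simple arithmetic. A short sketch, assuming a hypothetical two-socket topology, of how the vLLM CPU request is derived so that each NUMA node contributes equally:

```python
# Hypothetical topology: 2 NUMA nodes x 48 cores, with 8 cores per node
# reserved for shared services such as Keycloak.
NUMA_NODES = 2
CORES_PER_NODE = 48
RESERVED_PER_NODE = 8

# vLLM gets exclusive use of everything that is not reserved, drawn evenly
# from each NUMA node so neither node is oversubscribed.
vllm_per_node = CORES_PER_NODE - RESERVED_PER_NODE
vllm_total = vllm_per_node * NUMA_NODES

print(f"vLLM CPU request: {vllm_total} cores ({vllm_per_node} per NUMA node)")
print(f"Reserved for other services: {RESERVED_PER_NODE * NUMA_NODES} cores")
# -> vLLM CPU request: 80 cores (40 per NUMA node)
# -> Reserved for other services: 16 cores
```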

Node Label Strategy Update

  • Updated the node label strategy to use boolean labels for clearer scheduling and selection semantics (a sketch follows)
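For illustration, a sketch of applying and consuming a boolean label with the Kubernetes Python client; the label key shown is hypothetical, not the project's actual key:

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Hypothetical boolean label: explicit "true"/"false" string values make
# intent clear when used in nodeSelector or nodeAffinity terms.
core.patch_node(
    "worker-node-1",
    {"metadata": {"labels": {"inference.example.com/gaudi": "true"}}},
)

# Selecting only labeled nodes, e.g. in a pod spec:
#   nodeSelector:
#     inference.example.com/gaudi: "true"
```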

Docker-Based Single Node Deployment

  • Enabled Docker-based single-node vLLM deployment for local inference (a minimal sketch follows).
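A minimal sketch using the Docker SDK for Python (`docker` package); the image tag and model are placeholders and not necessarily what the deployment scripts select for a given Xeon or Gaudi target:

```python
import docker

client = docker.from_env()

# Hypothetical image and model; the actual deployment chooses the image
# appropriate for the target hardware.
container = client.containers.run(
    "vllm/vllm-openai:latest",
    command=["--model", "meta-llama/Llama-3.1-8B-Instruct"],
    ports={"8000/tcp": 8000},          # expose the OpenAI-compatible API
    environment={"HUGGING_FACE_HUB_TOKEN": "<hf-token>"},
    detach=True,
)
print("vLLM serving at http://localhost:8000/v1, container:", container.short_id)
```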

Getting Started

See the Quick Start Guide and Cluster Setup documentation for details.


Post-Deployment

  • Access deployed models via OpenAI-compatible API endpoints (example below).
  • Use built-in observability dashboards for monitoring and troubleshooting.
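For example, with the official `openai` Python client pointed at the cluster's endpoint; the base URL, access token, and model name are deployment-specific assumptions:

```python
from openai import OpenAI

# Hypothetical endpoint and model: use the base URL and access token issued
# by your deployment (Keycloak/APISIX) and a model you actually deployed.
client = OpenAI(
    base_url="https://inference.example.com/v1",
    api_key="<access-token>",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize NUMA-aware scheduling."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```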

Supported Models


License

  • Licensed under the Apache License 2.0.

Thank you for using Intel® AI for Enterprise Inference!