Intel AI for Enterprise Inference - CPU Optimization

Overview

The system automatically optimizes CPU allocation for AI models using balloon policy. This happens automatically in the background - no customer configuration required.

Automatic Features

CPU Allocation

System automatically detects available CPU cores
Reserves 18% of CPUs for system processes
Allocates remaining CPUs to AI models
Assigns dedicated CPU cores to each model

Memory Allocation

System automatically detects available memory
Reserves 18% of memory for system processes
Allocates remaining memory to AI models

Hardware Detection

Automatically detects NUMA topology
Configures optimal parallelism strategy
Adjusts resource allocation based on hardware

Configuration

Model Requirements

Models must include this label to receive CPU optimization:

labels:
  name: vllm

Resource Allocation

# Example for 48-core system
resources:
  requests:
    cpu: 40        # Automatically calculated
    memory: 4G

Recommendations for Single Node Clusters with Limited CPUs

For single node clusters (e.g., systems with 48 CPU cores), only Keycloak and APISIX are supported. GenAI Gateway is not supported on these configurations. To deploy GenAI Gateway, a minimum of 96 CPU cores is required.

Summary:

For clusters with limited CPU resources, deploy only Keycloak and APISIX.
GenAI Gateway deployment requires at least 96 CPU cores.

Status Verification

Check System Status

# Verify balloon policy is running
kubectl get pods -n kube-system | grep nri-resource-policy
 
# Check model CPU allocation
kubectl exec <model-pod> -- cat /proc/self/status | grep Cpus_allowed_list

Troubleshooting

If models aren't performing optimally:

Verify balloon policy pod is running
Check model pod has name: vllm label
Confirm CPU allocation in pod status

Summary

CPU optimization runs automatically and provides:

Dedicated CPU cores for each model
Consistent performance
Optimal resource utilization

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Intel AI for Enterprise Inference - CPU Optimization

Overview

Automatic Features

CPU Allocation

Memory Allocation

Hardware Detection

Configuration

Model Requirements

Resource Allocation

Recommendations for Single Node Clusters with Limited CPUs

Status Verification

Check System Status

Troubleshooting

Summary

FilesExpand file tree

cpu-optimization-guide.md

Latest commit

History

cpu-optimization-guide.md

File metadata and controls

Intel AI for Enterprise Inference - CPU Optimization

Overview

Automatic Features

CPU Allocation

Memory Allocation

Hardware Detection

Configuration

Model Requirements

Resource Allocation

Recommendations for Single Node Clusters with Limited CPUs

Status Verification

Check System Status

Troubleshooting

Summary