The system automatically optimizes CPU allocation for AI models using the balloon policy. This happens in the background; no customer configuration is required.
- System automatically detects available CPU cores
- Reserves 18% of CPUs for system processes
- Allocates remaining CPUs to AI models
- Assigns dedicated CPU cores to each model
- System automatically detects available memory
- Reserves 18% of memory for system processes
- Allocates remaining memory to AI models
- Automatically detects NUMA topology
- Configures optimal parallelism strategy
- Adjusts resource allocation based on hardware
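The reservation arithmetic above can be sketched as follows. This is an illustrative model only, not the balloon policy's actual implementation; in particular, the truncating rounding is an assumption chosen to match the 48-core example in this document.

```python
SYSTEM_RESERVE = 0.18  # 18% reserved for system processes, per the policy above

def allocatable_cpus(total_cpus: int) -> int:
    """CPUs left for AI models after the system reserve (rounding assumed)."""
    return total_cpus - int(total_cpus * SYSTEM_RESERVE)

def per_model_cpus(total_cpus: int, num_models: int) -> int:
    """Dedicated cores each model receives from the allocatable pool."""
    return allocatable_cpus(total_cpus) // num_models

print(allocatable_cpus(48))     # 40 cores remain for models on a 48-core node
print(per_model_cpus(48, 2))    # 20 dedicated cores per model with two models
```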
Models must include this label to receive CPU optimization:

```yaml
labels:
  name: vllm
```

Example for a 48-core system:

```yaml
resources:
  requests:
    cpu: 40      # Automatically calculated
    memory: 4G
```

For single-node clusters (e.g., systems with 48 CPU cores), only Keycloak and APISIX are supported. GenAI Gateway is not supported on these configurations. Deploying GenAI Gateway requires a minimum of 96 CPU cores.
Summary:
- For clusters with limited CPU resources, deploy only Keycloak and APISIX.
- GenAI Gateway deployment requires at least 96 CPU cores.
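The sizing rule above amounts to a simple threshold check. The helper below is an illustrative sketch (the function name and list representation are not part of the product):

```python
GENAI_GATEWAY_MIN_CORES = 96  # minimum cores required for GenAI Gateway

def supported_components(cpu_cores: int) -> list[str]:
    """Components deployable at a given core count, per the sizing rule."""
    components = ["Keycloak", "APISIX"]  # supported on all cluster sizes
    if cpu_cores >= GENAI_GATEWAY_MIN_CORES:
        components.append("GenAI Gateway")
    return components

print(supported_components(48))  # ['Keycloak', 'APISIX']
print(supported_components(96))  # ['Keycloak', 'APISIX', 'GenAI Gateway']
```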
```shell
# Verify balloon policy is running
kubectl get pods -n kube-system | grep nri-resource-policy

# Check model CPU allocation
kubectl exec <model-pod> -- cat /proc/self/status | grep Cpus_allowed_list
```

If models aren't performing optimally:
- Verify balloon policy pod is running
- Check the model pod has the `name: vllm` label
- Confirm CPU allocation in the pod status
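To confirm a pod actually received dedicated cores, the `Cpus_allowed_list` value from `/proc/self/status` (a comma-separated range list such as `0-3,8-11`) can be expanded and counted. This helper is an illustrative sketch, not part of the product:

```python
def parse_cpu_list(cpu_list: str) -> set[int]:
    """Expand a /proc Cpus_allowed_list value like '0-3,8-11' into CPU ids."""
    cpus: set[int] = set()
    for part in cpu_list.strip().split(","):
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus

allowed = parse_cpu_list("0-3,8-11")
print(sorted(allowed))  # [0, 1, 2, 3, 8, 9, 10, 11]
print(len(allowed))     # 8 dedicated cores
```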
CPU optimization runs automatically and provides:
- Dedicated CPU cores for each model
- Consistent performance
- Optimal resource utilization