
[AWS/EKS] Tune VPC CNI warm pool for kubernetes_node_scale in EksKarpenterCluster#6557

Open
kiryl-filatau wants to merge 19 commits into GoogleCloudPlatform:master from kiryl-filatau:aws-5k-fix

Conversation


@kiryl-filatau kiryl-filatau commented Mar 25, 2026

NOTE: merge only after PR #6512 is merged.

What

In EksKarpenterCluster._PostCreate, when the benchmark is
kubernetes_node_scale, tune the VPC CNI warm-pool settings on the
aws-node DaemonSet in kube-system and wait for the rollout to
complete before the benchmark run starts.
Settings applied:

  • WARM_ENI_TARGET=0
  • WARM_IP_TARGET=1
  • MINIMUM_IP_TARGET=1
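As a rough illustration of the step described above (not the actual PerfKitBenchmarker code; function and constant names here are made up for the sketch), the tuning amounts to a `kubectl set env` on the aws-node DaemonSet followed by a blocking `kubectl rollout status`:

```python
# Illustrative sketch only: build the kubectl invocations that apply the
# warm-pool settings to the aws-node DaemonSet in kube-system and wait
# for the rollout to finish before the benchmark starts.

WARM_POOL_ENV = {
    'WARM_ENI_TARGET': '0',
    'WARM_IP_TARGET': '1',
    'MINIMUM_IP_TARGET': '1',
}


def build_cni_tuning_commands(env=WARM_POOL_ENV):
    """Returns (set_env_cmd, rollout_wait_cmd) as argv lists."""
    set_env = [
        'kubectl', '--namespace', 'kube-system',
        'set', 'env', 'daemonset/aws-node',
    ] + [f'{key}={value}' for key, value in sorted(env.items())]
    # Block until every aws-node pod is running with the new settings, so
    # the scale-up never races a half-rolled-out CNI DaemonSet.
    rollout = [
        'kubectl', '--namespace', 'kube-system',
        'rollout', 'status', 'daemonset/aws-node', '--timeout=10m',
    ]
    return set_env, rollout
```

The `--timeout=10m` value is an assumption for the sketch; the PR's actual rollout wait may differ.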

Why

At large node counts (5k) the default CNI warm-pool behaviour
aggressively pre-allocates ENIs and secondary IPs, consuming EC2 IP
capacity before pods are scheduled. Reducing the warm targets lowers
per-node IP pre-allocation, which reduces InsufficientCapacityError
and FailedScheduling pressure during scale-up.

Scope

The tuning block is guarded by a check for 'kubernetes_node_scale' in FLAGS.benchmarks, so it is a no-op for every other benchmark. No existing behaviour changes outside that gate.
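The gate itself is simple; a minimal sketch (the helper name is hypothetical, and a plain list stands in for the real FLAGS.benchmarks value):

```python
# Hypothetical sketch of the Scope guard: the warm-pool tuning runs only
# when kubernetes_node_scale is among the requested benchmarks.
def should_tune_warm_pool(benchmarks):
    """True iff the kubernetes_node_scale benchmark was requested."""
    return 'kubernetes_node_scale' in benchmarks
```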

Testing

Validated with two back-to-back 5k-node runs on EKS + Karpenter in
us-east-1. Both runs completed with status
SUCCEEDED.
