Document the meaning of CPU requests & limits #50987

Delete your Pod:

```shell
kubectl delete pod cpu-demo --namespace=cpu-example
```

## How CPU requests and limits work under the hood

### On Linux systems

On Linux, Kubernetes uses cgroups (control groups) to enforce CPU requests and limits. See the [Red Hat documentation on cgroups](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/managing-cpu-resources-with-control-groups_managing-monitoring-and-updating-the-kernel) for more details.

#### CPU Requests
- For cgroups v1: CPU requests are translated to CPU shares. The share value is calculated as `cpu.shares = request * 1024`. The default value is 1024, which corresponds to 1 CPU request. See the [kernel documentation on scheduler-domains](https://docs.kernel.org/scheduler/sched-domains.html).
- For cgroups v2: CPU requests are translated to CPU weight. Weight values range from 1 to 10000, with a default of 100. See the [kernel documentation on cgroup-v2](https://docs.kernel.org/admin-guide/cgroup-v2.html).
- The Linux CPU scheduler (CFS - Completely Fair Scheduler) uses these values to determine the proportion of CPU time each container gets when there is CPU contention. For more details, see the [kernel CFS scheduler documentation](https://docs.kernel.org/scheduler/sched-design-CFS.html).
- When there is no contention, containers may use more CPU than requested, up to their limit.
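The two conversions above can be sketched as simple arithmetic. This is an illustrative sketch, not Kubernetes code: the function names are hypothetical, and the shares-to-weight mapping shown is the commonly used linear mapping from the cgroup v1 shares range onto the cgroup v2 weight range, which container runtimes apply when translating values.

```python
def request_to_cpu_shares(cpu_request: float) -> int:
    """cgroup v1: cpu.shares = request * 1024 (1 CPU -> 1024 shares)."""
    return int(cpu_request * 1024)

def shares_to_cpu_weight(shares: int) -> int:
    """cgroup v2: map the v1 shares range [2, 262144] linearly
    onto the v2 weight range [1, 10000]."""
    return int((shares - 2) * 9999 / 262142) + 1

print(request_to_cpu_shares(1.0))      # 1024
print(shares_to_cpu_weight(2))         # 1   (minimum weight)
print(shares_to_cpu_weight(262144))    # 10000 (maximum weight)
```

Note that under this mapping a 1-CPU request (1024 shares) lands well below the v2 default weight of 100; the absolute numbers matter less than the ratios between containers, since CFS divides CPU time proportionally.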

#### CPU Limits
- CPU limits are implemented using CPU quota and period.
- The period defaults to 100ms (100,000 microseconds).
- The quota is calculated as `cpu.cfs_quota_us = limit * cpu.cfs_period_us`.
- For example, if you set a limit of 0.5 CPU:
- Period = 100,000 microseconds (100ms)
- Quota = 50,000 microseconds (50ms)
- This means in every 100ms period, the container can use the CPU for up to 50ms.
- If a container tries to use more CPU than its limit, it will be throttled and must wait for the next period.
- Throttling can cause latency in applications, especially those that are CPU-intensive or require consistent CPU access.
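The quota calculation above is straightforward to work through. A minimal sketch (the function name is illustrative, not a Kubernetes API):

```python
def cfs_quota_us(cpu_limit: float, period_us: int = 100_000) -> int:
    """cpu.cfs_quota_us = limit * cpu.cfs_period_us."""
    return int(cpu_limit * period_us)

print(cfs_quota_us(0.5))  # 50000: up to 50ms of CPU time per 100ms period
print(cfs_quota_us(2.0))  # 200000: 200ms per 100ms period, i.e. 2 CPUs' worth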

### On Windows systems

Windows handles CPU requests and limits differently:

#### CPU Requests
- Windows doesn't have a direct equivalent to Linux CPU shares.
- CPU requests are used primarily for scheduling decisions but don't directly affect runtime CPU allocation.

#### CPU Limits
- Windows implements CPU limits using CPU caps.
- The limit is expressed as a percentage of total CPU cycles across all processors.
- For example, a limit of 0.5 CPU on a 2-core system means the container can use up to 25% of the total CPU cycles.
- Windows measures CPU usage over a longer time window compared to Linux, which can result in different throttling behavior.
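The percentage arithmetic in the example above can be sketched as follows (an illustrative helper, not part of any Windows or Kubernetes API):

```python
def windows_cpu_cap_percent(cpu_limit: float, logical_processors: int) -> float:
    """Express a CPU limit as a percentage of total CPU cycles
    across all logical processors."""
    return cpu_limit / logical_processors * 100

print(windows_cpu_cap_percent(0.5, 2))  # 25.0: half a CPU on a 2-core node
```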

{{< note >}}
Understanding how these mechanisms work is crucial for application performance tuning. For example, if you observe throttling in your application (which you can monitor through container runtime metrics), you might want to:
- Adjust your CPU limits to be closer to actual usage patterns
- Consider using horizontal pod autoscaling instead of relying on CPU bursting
- Profile your application to optimize CPU usage
{{< /note >}}

## Specify a CPU request that is too big for your Nodes

CPU requests and limits are associated with Containers, but it is useful to think