Document the meaning of CPU requests & limits #50987

Delete your Pod:

```shell
kubectl delete pod cpu-demo --namespace=cpu-example
```

## How CPU requests and limits work under the hood

### On Linux systems

On Linux, Kubernetes uses cgroups (control groups) to enforce CPU requests and limits. See the [Red Hat documentation on cgroups](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/managing-cpu-resources-with-control-groups_managing-monitoring-and-updating-the-kernel) for more details.

#### CPU Requests
- For cgroups v1: CPU requests are translated to CPU shares. The share value is calculated as `cpu.shares = request * 1024`. The default value is 1024, which corresponds to 1 CPU request. See the [kernel documentation on scheduler-domains](https://docs.kernel.org/scheduler/sched-domains.html).
- For cgroups v2: CPU requests are translated to CPU weight. Weight values range from 1 to 10000, with a default of 100. See the [kernel documentation on cgroup-v2](https://docs.kernel.org/admin-guide/cgroup-v2.html).
- The Linux CPU scheduler (CFS - Completely Fair Scheduler) uses these values to determine the proportion of CPU time each container gets when there is CPU contention. For more details, see the [kernel CFS scheduler documentation](https://docs.kernel.org/scheduler/sched-design-CFS.html).
- When there is no contention, containers may use more CPU than requested, up to their limit.
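The two conversions above can be sketched as simple arithmetic. This is an illustrative sketch, not Kubernetes code: the function names are hypothetical, and the shares-to-weight mapping shown is the commonly used linear mapping from the cgroup v1 shares range onto the cgroup v2 weight range, which container runtimes apply when translating values.

```python
def request_to_cpu_shares(cpu_request: float) -> int:
    """cgroup v1: cpu.shares = request * 1024 (1 CPU -> 1024 shares)."""
    return int(cpu_request * 1024)

def shares_to_cpu_weight(shares: int) -> int:
    """cgroup v2: map the v1 shares range [2, 262144] linearly
    onto the v2 weight range [1, 10000]."""
    return int((shares - 2) * 9999 / 262142) + 1

print(request_to_cpu_shares(1.0))      # 1024
print(shares_to_cpu_weight(2))         # 1   (minimum weight)
print(shares_to_cpu_weight(262144))    # 10000 (maximum weight)
```

Note that under this mapping a 1-CPU request (1024 shares) lands well below the v2 default weight of 100; the absolute numbers matter less than the ratios between containers, since CFS divides CPU time proportionally.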

#### CPU Limits
- CPU limits are implemented using CPU quota and period.
- The period defaults to 100ms (100,000 microseconds).
- The quota is calculated as `cpu.cfs_quota_us = limit * cpu.cfs_period_us`.
- For example, if you set a limit of 0.5 CPU:
- Period = 100,000 microseconds (100ms)
- Quota = 50,000 microseconds (50ms)
- This means in every 100ms period, the container can use the CPU for up to 50ms.
- If a container tries to use more CPU than its limit, it will be throttled and must wait for the next period.
- Throttling can cause latency in applications, especially those that are CPU-intensive or require consistent CPU access.
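The quota calculation above is straightforward to work through. A minimal sketch (the function name is illustrative, not a Kubernetes API):

```python
def cfs_quota_us(cpu_limit: float, period_us: int = 100_000) -> int:
    """cpu.cfs_quota_us = limit * cpu.cfs_period_us."""
    return int(cpu_limit * period_us)

print(cfs_quota_us(0.5))  # 50000: up to 50ms of CPU time per 100ms period
print(cfs_quota_us(2.0))  # 200000: 200ms per 100ms period, i.e. 2 CPUs' worth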

### On Windows systems

Windows handles CPU requests and limits differently:

#### CPU Requests
- Windows doesn't have a direct equivalent to Linux CPU shares.
- CPU requests are used primarily for scheduling decisions but don't directly affect runtime CPU allocation.

#### CPU Limits
- Windows implements CPU limits using CPU caps.
- The limit is expressed as a percentage of total CPU cycles across all processors.
- For example, a limit of 0.5 CPU on a 2-core system means the container can use up to 25% of the total CPU cycles.
- Windows measures CPU usage over a longer time window compared to Linux, which can result in different throttling behavior.
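The percentage arithmetic in the example above can be sketched as follows (an illustrative helper, not part of any Windows or Kubernetes API):

```python
def windows_cpu_cap_percent(cpu_limit: float, logical_processors: int) -> float:
    """Express a CPU limit as a percentage of total CPU cycles
    across all logical processors."""
    return cpu_limit / logical_processors * 100

print(windows_cpu_cap_percent(0.5, 2))  # 25.0: half a CPU on a 2-core node
```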

{{< note >}}
Understanding how these mechanisms work is crucial for application performance tuning. For example, if you observe throttling in your application (which you can monitor through container runtime metrics), you might want to:
- Adjust your CPU limits to be closer to actual usage patterns
- Consider using horizontal pod autoscaling instead of relying on CPU bursting
- Profile your application to optimize CPU usage
{{< /note >}}

## Specify a CPU request that is too big for your Nodes

CPU requests and limits are associated with Containers, but it is useful to think