- [Atomic Resizes](#atomic-resizes)
- [Actuating Resizes](#actuating-resizes)
- [Memory Limit Decreases](#memory-limit-decreases)
+ - [Swap](#swap)
- [Sidecars](#sidecars)
- [QOS Class](#qos-class)
- [Resource Quota](#resource-quota)
@@ -298,7 +299,7 @@ The `ResizePolicy` field is immutable.

#### Resize Status

- Resize status will be tracked via 2 new pod conditions: `PodResizePending` and `PodResizing`.
+ Resize status will be tracked via 2 new pod conditions: `PodResizePending` and `PodResizeInProgress`.

**PodResizePending** will track states where the spec has been resized, but the Kubelet has not yet
allocated the resources. There are two reasons associated with this condition:
@@ -313,8 +314,8 @@ admitted. `lastTransitionTime` will be populated with the time the condition was
will always be `True` when the condition is present - if there is no longer a pending resize
(either the resize was allocated or reverted), the condition will be removed.

- **PodResizing** will track in-progress resizes, and should be present whenever allocated resources
- != acknowledged resources (see [Resource States](#resource-states)). For successful synchronous
+ **PodResizeInProgress** will track in-progress resizes, and should be present whenever allocated resources
+ != actuated resources (see [Resource States](#resource-states)). For successful synchronous
resizes, this condition should be short-lived, and `reason` and `message` will be left blank. If an
error occurs while actuating the resize, the `reason` will be set to `Error`, and `message` will be
populated with the error message. In the future, this condition will also be used for long-running
@@ -364,11 +365,6 @@ message UpdatePodSandboxResourcesRequest {
    LinuxContainerResources overhead = 2;
    // Optional resources represents the sum of container resources for this sandbox
    LinuxContainerResources resources = 3;
-
-   // Unstructured key-value map holding arbitrary additional information for
-   // sandbox resources updating. This can be used for specifying experimental
-   // resources to update or other options to use when updating the sandbox.
-   map<string, string> annotations = 4;
}

message UpdatePodSandboxResourcesResponse {}
@@ -419,7 +415,7 @@ The Kubelet now tracks 4 sets of resources for each pod/container:
   - Reported in the API through the `.status.containerStatuses[i].allocatedResources` field
     (allocated requests only)
   - Persisted locally on the node (requests + limits) in a checkpoint file
-3. Acknowledged resources
+3. Actuated resources
   - The resource configuration that the Kubelet passed to the runtime to actuate
   - Not reported in the API
   - Persisted locally on the node in a checkpoint file
@@ -428,11 +424,12 @@ The Kubelet now tracks 4 sets of resources for each pod/container:
   - The actual resource configuration the containers are running with, reported by the runtime,
     typically read directly from the cgroup configuration
   - Reported in the API via the `.status.containerStatuses[i].resources` field
+   - _Note: for non-running containers, `.status.containerStatuses[i].resources` will be the Allocated resources._

Changes are always propagated through these 4 resource states in order:

```
- Desired --> Allocated --> Acknowledged --> Actual
+ Desired --> Allocated --> Actuated --> Actual
```
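The strict ordering above can be modeled as a tiny state machine. This is an illustrative sketch, not kubelet code — the `States` type and `Step` method are invented for this example, and values are CPU millicores:

```go
package main

import "fmt"

// States models the four per-container resource states the Kubelet tracks.
// The real kubelet keeps these in the pod spec/status, local checkpoint
// files, and the runtime's cgroup configuration, respectively.
type States struct {
	Desired   int64 // from the pod spec
	Allocated int64 // admitted by the kubelet, checkpointed
	Actuated  int64 // passed to the runtime, checkpointed
	Actual    int64 // reported back by the runtime
}

// Step advances one propagation stage per call, enforcing the
// Desired --> Allocated --> Actuated --> Actual ordering. It returns
// false once all four states agree (steady state).
func (s *States) Step() bool {
	switch {
	case s.Allocated != s.Desired:
		s.Allocated = s.Desired // admission
	case s.Actuated != s.Allocated:
		s.Actuated = s.Allocated // CRI UpdateContainerResources succeeds
	case s.Actual != s.Actuated:
		s.Actual = s.Actuated // runtime applies the cgroup change
	default:
		return false
	}
	return true
}

func main() {
	// A resize from 1 CPU to 1.5 CPUs flows through the states in order.
	s := States{Desired: 1500, Allocated: 1000, Actuated: 1000, Actual: 1000}
	for s.Step() {
		fmt.Printf("%+v\n", s)
	}
}
```

Note that each transition can fail or be interrupted independently (e.g. a deferred admission, or a kubelet restart between the CRI call and the checkpoint write), which is why each state is tracked separately rather than assumed to follow from the spec.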
@@ -512,7 +509,7 @@ This is intentionally hitting various edge-cases for demonstration.
1. kubelet runs the pod and updates the API
   - `spec.containers[0].resources.requests[cpu]` = 1
   - `status.containerStatuses[0].allocatedResources[cpu]` = 1
-   - `acknowledged[cpu]` = 1
+   - `actuated[cpu]` = 1
   - `status.containerStatuses[0].resources.requests[cpu]` = 1
   - actual CPU shares = 1024
@@ -521,100 +518,100 @@ This is intentionally hitting various edge-cases for demonstration.
   `requests`, ResourceQuota not exceeded, etc) and accepts the operation
   - `spec.containers[0].resources.requests[cpu]` = 1.5
   - `status.containerStatuses[0].allocatedResources[cpu]` = 1
-   - `acknowledged[cpu]` = 1
+   - `actuated[cpu]` = 1
   - `status.containerStatuses[0].resources.requests[cpu]` = 1
   - actual CPU shares = 1024

1. Kubelet Restarts!
-   - The allocated & acknowledged resources are read back from checkpoint
+   - The allocated & actuated resources are read back from checkpoint
   - Pods are resynced from the API server, but admitted based on the allocated resources
   - `spec.containers[0].resources.requests[cpu]` = 1.5
   - `status.containerStatuses[0].allocatedResources[cpu]` = 1
-   - `acknowledged[cpu]` = 1
+   - `actuated[cpu]` = 1
   - `status.containerStatuses[0].resources.requests[cpu]` = 1
   - actual CPU shares = 1024

1. Kubelet syncs the pod, sees resize #1 and admits it
   - `spec.containers[0].resources.requests[cpu]` = 1.5
   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
-   - `acknowledged[cpu]` = 1
+   - `actuated[cpu]` = 1
   - `status.containerStatuses[0].resources.requests[cpu]` = 1
-   - `status.conditions[type==PodResizing]` added
+   - `status.conditions[type==PodResizeInProgress]` added
   - actual CPU shares = 1024

1. Resize #2: cpu = 2
   - apiserver validates the request and accepts the operation
   - `spec.containers[0].resources.requests[cpu]` = 2
   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
   - `status.containerStatuses[0].resources.requests[cpu]` = 1
-   - `status.conditions[type==PodResizing]`
+   - `status.conditions[type==PodResizeInProgress]`
   - actual CPU shares = 1024

1. Container runtime applied cpu=1.5
   - `spec.containers[0].resources.requests[cpu]` = 2
   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
-   - `acknowledged[cpu]` = 1.5
+   - `actuated[cpu]` = 1.5
   - `status.containerStatuses[0].resources.requests[cpu]` = 1
-   - `status.conditions[type==PodResizing]`
+   - `status.conditions[type==PodResizeInProgress]`
   - actual CPU shares = 1536

1. kubelet syncs the pod, and sees resize #2 (cpu = 2)
   - kubelet decides this is feasible, but there are currently insufficient available resources
   - `spec.containers[0].resources.requests[cpu]` = 2
   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
-   - `acknowledged[cpu]` = 1.5
+   - `actuated[cpu]` = 1.5
   - `status.containerStatuses[0].resources.requests[cpu]` = 1.5
   - `status.conditions[type==PodResizePending].type` = `"Deferred"`
-   - `status.conditions[type==PodResizing]` removed
+   - `status.conditions[type==PodResizeInProgress]` removed
   - actual CPU shares = 1536

1. Resize #3: cpu = 1.6
   - apiserver validates the request and accepts the operation
   - `spec.containers[0].resources.requests[cpu]` = 1.6
   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
-   - `acknowledged[cpu]` = 1.5
+   - `actuated[cpu]` = 1.5
   - `status.containerStatuses[0].resources.requests[cpu]` = 1.5
   - `status.conditions[type==PodResizePending].type` = `"Deferred"`
   - actual CPU shares = 1536

1. Kubelet syncs the pod, and sees resize #3 and admits it
   - `spec.containers[0].resources.requests[cpu]` = 1.6
   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
-   - `acknowledged[cpu]` = 1.5
+   - `actuated[cpu]` = 1.5
   - `status.containerStatuses[0].resources.requests[cpu]` = 1.5
   - `status.conditions[type==PodResizePending]` removed
-   - `status.conditions[type==PodResizing]` added
+   - `status.conditions[type==PodResizeInProgress]` added
   - actual CPU shares = 1536

1. Container runtime applied cpu=1.6
   - `spec.containers[0].resources.requests[cpu]` = 1.6
   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
-   - `acknowledged[cpu]` = 1.6
+   - `actuated[cpu]` = 1.6
   - `status.containerStatuses[0].resources.requests[cpu]` = 1.5
-   - `status.conditions[type==PodResizing]`
+   - `status.conditions[type==PodResizeInProgress]`
   - actual CPU shares = 1638

1. Kubelet syncs the pod
   - `spec.containers[0].resources.requests[cpu]` = 1.6
   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
-   - `acknowledged[cpu]` = 1.6
+   - `actuated[cpu]` = 1.6
   - `status.containerStatuses[0].resources.requests[cpu]` = 1.6
-   - `status.conditions[type==PodResizing]` removed
+   - `status.conditions[type==PodResizeInProgress]` removed
   - actual CPU shares = 1638

1. Resize #4: cpu = 100
   - apiserver validates the request and accepts the operation
   - `spec.containers[0].resources.requests[cpu]` = 100
   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
-   - `acknowledged[cpu]` = 1.6
+   - `actuated[cpu]` = 1.6
   - `status.containerStatuses[0].resources.requests[cpu]` = 1.6
   - actual CPU shares = 1638

1. Kubelet syncs the pod, and sees resize #4
   - this node does not have 100 CPUs, so kubelet cannot admit it
   - `spec.containers[0].resources.requests[cpu]` = 100
   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
-   - `acknowledged[cpu]` = 1.6
+   - `actuated[cpu]` = 1.6
   - `status.containerStatuses[0].resources.requests[cpu]` = 1.6
   - `status.conditions[type==PodResizePending].type` = `"Infeasible"`
   - actual CPU shares = 1638
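The `actual CPU shares` values in the walkthrough follow the cgroup v1 convention of 1024 shares per CPU, with truncating integer math (so 1.6 CPUs yields 1638 shares, not 1638.4). A minimal sketch of that conversion — the function name is ours, and the kubelet's real helper also enforces a minimum of 2 shares, which is omitted here:

```go
package main

import "fmt"

// milliCPUToShares converts a CPU request in millicores to cgroup v1 CPU
// shares: 1 CPU == 1024 shares, using truncating integer division.
func milliCPUToShares(milliCPU int64) int64 {
	const sharesPerCPU, milliCPUToCPU = 1024, 1000
	return milliCPU * sharesPerCPU / milliCPUToCPU
}

func main() {
	// The requests that appear in the walkthrough above:
	for _, m := range []int64{1000, 1500, 1600} {
		fmt.Printf("%dm CPU -> %d shares\n", m, milliCPUToShares(m))
	}
	// 1000m CPU -> 1024 shares
	// 1500m CPU -> 1536 shares
	// 1600m CPU -> 1638 shares
}
```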
@@ -707,7 +704,7 @@ Impacts of a restart outside of resource configuration are out of scope.
   - Restart before checkpointing: pod goes through admission again as if new
   - Restart after checkpointing: pod goes through admission using the allocated resources
1. Kubelet creates a container
-   - Resources acknowledged after CreateContainer call succeeds
+   - Resources actuated after CreateContainer call succeeds
   - Restart before acknowledgement: Kubelet issues a superfluous UpdatePodResources request
   - Restart after acknowledgement: No resize needed
1. Container starts, triggering a pod sync event
@@ -721,19 +718,19 @@ Impacts of a restart outside of resource configuration are out of scope.
1. Updated pod is synced: Check if pod can be admitted
   - No: add `PodResizePending` condition with type `Deferred`, no change to allocated resources
     - Restart: redo admission check, still deferred.
-   - Yes: add `PodResizing` condition, update allocated checkpoint
+   - Yes: add `PodResizeInProgress` condition, update allocated checkpoint
     - Restart before update: readmit, then update allocated
-     - Restart after update: allocated != acknowledged --> proceed with resize
-1. Allocated != Acknowledged
-   - Trigger an `UpdateContainerResources` CRI call, then update Acknowledged resources on success
-   - Restart before CRI call: allocated != acknowledged, will still trigger the update call
-   - Restart after CRI call, before acknowledged update: will redo update call
-   - Restart after acknowledged update: allocated == acknowledged, condition removed
-   - In all restart cases, `LastTransitionTime` is propagated from the old pod status `PodResizing`
+     - Restart after update: allocated != actuated --> proceed with resize
+1. Allocated != Actuated
+   - Trigger an `UpdateContainerResources` CRI call, then update Actuated resources on success
+   - Restart before CRI call: allocated != actuated, will still trigger the update call
+   - Restart after CRI call, before actuated update: will redo update call
+   - Restart after actuated update: allocated == actuated, condition removed
+   - In all restart cases, `LastTransitionTime` is propagated from the old pod status `PodResizeInProgress`
   condition, and remains unchanged.
1. PLEG updates PodStatus cache, triggers pod sync
-   - Pod status updated with actual resources, `PodResizing` condition removed
-   - Desired == Allocated == Acknowledged, no resize changes needed.
+   - Pod status updated with actual resources, `PodResizeInProgress` condition removed
+   - Desired == Allocated == Actuated, no resize changes needed.
#### Notes
@@ -793,10 +790,10 @@ a pod or container. Examples include:
Therefore the Kubelet cannot reliably compare desired & actual resources to know whether to trigger
a resize (a level-triggered approach).

- To accommodate this, the Kubelet stores the set of "acknowledged" resources per container.
- Acknowledged resources represent the resource configuration that was passed to the runtime (either
+ To accommodate this, the Kubelet stores the set of "actuated" resources per container.
+ Actuated resources represent the resource configuration that was passed to the runtime (either
via a CreateContainer or UpdateContainerResources call) and received a successful response. The
- acknowledged resources are checkpointed alongside the allocated resources to persist across
+ actuated resources are checkpointed alongside the allocated resources to persist across
restarts. There is the possibility that a poorly timed restart could lead to a resize request being
repeated, so `UpdateContainerResources` must be idempotent.
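The invariant described here — compare allocated against actuated, call the runtime only on a mismatch, and advance the actuated checkpoint only after the call succeeds — can be sketched as follows. This is an illustrative model, not kubelet code; `Resources`, `reconcile`, and the `update` callback (standing in for the CRI `UpdateContainerResources` call) are invented names:

```go
package main

import "fmt"

// Resources is a minimal stand-in for a container's resource configuration
// (the real kubelet tracks full requests and limits per resource).
type Resources struct{ CPUMilli int64 }

// reconcile models the level-triggered loop: a resize is triggered only when
// the allocated (checkpointed) resources differ from the actuated resources.
// The actuated value advances only after the runtime call succeeds, so a
// restart between the call and the checkpoint write simply replays the call —
// which is why the runtime-side update must be idempotent.
func reconcile(allocated, actuated Resources, update func(Resources) error) Resources {
	if allocated == actuated {
		return actuated // level-triggered: nothing to do
	}
	if err := update(allocated); err != nil {
		return actuated // failure: checkpoint untouched, retried on next sync
	}
	return allocated // success: actuated now matches allocated
}

func main() {
	var applied []Resources
	update := func(r Resources) error { applied = append(applied, r); return nil }

	actuated := Resources{CPUMilli: 1000}
	allocated := Resources{CPUMilli: 1500}

	actuated = reconcile(allocated, actuated, update) // triggers the update
	actuated = reconcile(allocated, actuated, update) // no-op: already actuated
	fmt.Println(len(applied), actuated.CPUMilli)      // 1 1500
}
```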
@@ -819,6 +816,15 @@ future, but the design of how limit decreases will be approached is still undeci
Memory limit decreases with `RestartRequired` are still allowed.

+ ### Swap
+
+ Currently (v1.33), if swap is enabled & configured, burstable pods are allocated swap based on their
+ memory requests. Since resizing swap requires more thought and additional design, we will forbid
+ resizing memory requests of such containers for now. Since the API server is not privy to the node's
+ swap configuration, this will be surfaced as resizes being marked `Infeasible`.
+
+ We may try to relax this restriction in the future.

### Sidecars

Sidecars, a.k.a. restartable InitContainers, can be resized the same as regular containers. There are
@@ -900,6 +906,8 @@ This will be reconsidered post-beta as a future enhancement.
1. Handle pod-scoped resources (https://github.com/kubernetes/enhancements/pull/1592)
1. Explore periodic resyncing of resources. That is, periodically issue resize requests to the
   runtime even if the allocated resources haven't changed.
+ 1. Allow resizing containers with swap allocated.
+ 1. Prioritize resizes when resources are freed, or at least make ordering deterministic.

#### Mutable QOS Class "Shape"
@@ -1537,7 +1545,7 @@ _This section must be completed when targeting beta graduation to a release._
- Rename ResizeRestartPolicy `NotRequired` to `PreferNoRestart`,
  and update CRI `UpdateContainerResources` contract
- Add back `AllocatedResources` field to resolve a scheduler corner case
- - Introduce Acknowledged resources for actuation
+ - Introduce Actuated resources for actuation

## Drawbacks