Commit c965fb4
Merge pull request #2697 from pacoxu/ephemeral-storage-quotas-beta
promote ephemeral-storage-quotas to beta in 1.25
2 parents 9efd962 + 45a80c4 commit c965fb4

3 files changed: 242 additions & 10 deletions

Lines changed: 5 additions & 0 deletions

```diff
@@ -0,0 +1,5 @@
+kep-number: 1029
+alpha:
+  approver: "@deads2k"
+beta:
+  approver: "@johnbelamaric"
```

keps/sig-node/1029-ephemeral-storage-quotas/README.md

Lines changed: 231 additions & 7 deletions

```diff
@@ -3,6 +3,7 @@
 ## Table of Contents
 
 <!-- toc -->
+- [Release Signoff Checklist](#release-signoff-checklist)
 - [Summary](#summary)
 - [Project Quotas](#project-quotas)
 - [Motivation](#motivation)
```

```diff
@@ -23,18 +24,32 @@
 - [Future](#future)
 - [Notes on Implementation](#notes-on-implementation)
 - [Notes on Code Changes](#notes-on-code-changes)
+- [Test Plan](#test-plan)
 - [Testing Strategy](#testing-strategy)
+- [Prerequisite testing updates](#prerequisite-testing-updates)
+- [Unit tests](#unit-tests)
+- [Integration tests](#integration-tests)
+- [e2e tests](#e2e-tests)
 - [Risks and Mitigations](#risks-and-mitigations)
 - [Graduation Criteria](#graduation-criteria)
 - [Phase 1: Alpha (1.15)](#phase-1-alpha-115)
-- [Phase 2: Beta (target 1.16)](#phase-2-beta-target-116)
+- [Phase 2: Beta (target 1.25)](#phase-2-beta-target-125)
 - [Phase 3: GA](#phase-3-ga)
 - [Performance Benchmarks](#performance-benchmarks)
 - [Elapsed Time](#elapsed-time)
 - [User CPU Time](#user-cpu-time)
 - [System CPU Time](#system-cpu-time)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+- [Monitoring Requirements](#monitoring-requirements)
+- [Dependencies](#dependencies)
+- [Scalability](#scalability)
+- [Troubleshooting](#troubleshooting)
 - [Implementation History](#implementation-history)
 - [Version 1.15](#version-115)
+- [Version 1.24](#version-124)
+- [Version 1.25](#version-125)
 - [Drawbacks [optional]](#drawbacks-optional)
 - [Alternatives [optional]](#alternatives-optional)
 - [Alternative quota-based implementation](#alternative-quota-based-implementation)
```

```diff
@@ -49,6 +64,21 @@
 
 [Tools for generating]: https://github.com/ekalinin/github-markdown-toc
 
+## Release Signoff Checklist
+
+Items marked with (R) are required *prior to targeting to a milestone / release*.
+
+- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [X] (R) KEP approvers have approved the KEP status as `implementable`
+- [X] (R) Design details are appropriately documented
+- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
+- [X] (R) Graduation criteria is in place
+- [X] (R) Production readiness review completed
+- [X] (R) Production readiness review approved
+- [ ] "Implementation History" section is up-to-date for milestone
+- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+
 ## Summary
 
 This proposal applies to the use of quotas for ephemeral-storage
```

```diff
@@ -544,6 +574,9 @@ required elsewhere:
 future allow adding additional data without having to change code
 other than that which uses the new information.
 
+### Test Plan
+
+
 #### Testing Strategy
 
 The quota code is by and large not very amenable to unit tests. While
```

```diff
@@ -555,6 +588,40 @@ manager, particularly under stress). It also requires setup in the
 form of a prepared filesystem. It would be better served by
 appropriate end to end tests.
 
+[x] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
+
+##### Prerequisite testing updates
+
+<!--
+Based on reviewers feedback describe what additional tests need to be added prior
+implementing this enhancement to ensure the enhancements have also solid foundations.
+-->
+
+##### Unit tests
+
+The main unit tests are in the `pkg/volume/util/fsquota/` package.
+
+- `pkg/volume/util/fsquota/`: `2022-06-20` - `73%`
+  - `project.go`: 75.7%
+  - `quota.go`: 100%
+  - `quota_linux.go`: 70.6%
+
+See details at https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit&include-filter-by-regex=fsquota
+
+##### Integration tests
+
+N/A
+
+##### e2e tests
+
+The node e2e test (LocalStorageCapacityIsolationQuotaMonitoring [Slow] [Serial] [Disruptive] [Feature:LocalStorageCapacityIsolationQuota][NodeFeature:LSCIQuotaMonitoring]) can be found in [`test/e2e_node/quota_lsci_test.go`](https://github.com/kubernetes/kubernetes/blob/8cd689e40d253e520b1698d4bcf33992f0ae1d20/test/e2e_node/quota_lsci_test.go#L93-L103).
+
+Because the e2e tests are slow and serial, we do not plan to promote them to conformance tests.
+There is no failure or flake history for this test in https://storage.googleapis.com/k8s-triage/index.html?test=LocalStorageCapacityIsolationQuotaMonitoring
+
+
 ### Risks and Mitigations
 
 * The SIG raised the possibility of a container being unable to exit
```

```diff
@@ -610,7 +677,7 @@ The following criteria applies to
 - Unit test coverage
 - Node e2e test
 
-### Phase 2: Beta (target 1.16)
+### Phase 2: Beta (target 1.25)
 
 - User feedback
 - Benchmarks to determine latency and overhead of using quotas
```

```diff
@@ -629,7 +696,7 @@ files. The operations performed were as follows, in sequence:
 
 * *Create Files*: Create 4K directories each containing 2K files as
 described, in depth-first order.
-
+
 * *du*: run `du` immediately after creating the files.
 
 * *quota*: where applicable, run `xfs_quota` immediately after `du`.
```

```diff
@@ -640,10 +707,10 @@ files. The operations performed were as follows, in sequence:
 
 * *du (after remount)*: run `mount -o remount <filesystem>`
 immediately followed by `du`.
-
+
 * *quota (after remount)*: run `mount -o remount <filesystem>`
 immediately followed by `xfs_quota`.
-
+
 * *unmount*: `umount` the filesystem.
 
 * *mount*: `mount` the filesystem.
```

```diff
@@ -653,7 +720,7 @@ files. The operations performed were as follows, in sequence:
 
 * *du after umount/mount*: run `du` after unmounting and
 mounting the filesystem.
-
+
 * *Remove Files*: remove the test files.
 
 The test was performed on four separate filesystems:
```

```diff
@@ -709,11 +776,168 @@ and are not reported here.
 | du after umount/mount | 66.0 | 82.4 | 29.2 | 28.1 |
 | Remove Files | 188.6 | 156.6 | 90.4 | 81.8 |
 
+## Production Readiness Review Questionnaire
+
+### Feature Enablement and Rollback
+
+###### How can this feature be enabled / disabled in a live cluster?
+
+- [x] Feature gate (also fill in values in `kep.yaml`)
+  - Feature gate name: `LocalStorageCapacityIsolationFSQuotaMonitoring`
+  - Components depending on the feature gate: kubelet
+
+This feature uses project quotas, rather than a filesystem walk, to monitor emptyDir
+volume storage consumption, for better performance and accuracy.
+
+###### Does enabling the feature change any default behavior?
+
+No. Enabling the feature changes only how the kubelet measures the usage of
+ephemeral storage volumes such as emptyDirs. When `LocalStorageCapacityIsolation`
+is enabled for local ephemeral storage, and the backing filesystem for emptyDir
+volumes supports project quotas and has them enabled, the kubelet uses project
+quotas to monitor emptyDir volume storage consumption instead of walking the
+filesystem, for better performance and accuracy.
+
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+
+Yes. Disabling the feature gate affects both existing and newly created pods:
+- Existing pods: a pod that was created with a project quota no longer uses the
+  quota after the feature gate is disabled.
+- Newly created pods: after the feature gate is set to false, newly created pods
+  do not use project quotas.
+
+###### What happens if we reenable the feature if it was previously rolled back?
+
+As above, once the feature is re-enabled, newly created pods use it. Pods that were
+created before the rollback also benefit from it again.
+
+###### Are there any tests for feature enablement/disablement?
+
+Yes, in `test/e2e_node/quota_lsci_test.go`.
+
+### Rollout, Upgrade and Rollback Planning
+
+###### How can a rollout or rollback fail? Can it impact already running workloads?
+
+Rollout or rollback does not impact already running workloads.
+
+###### What specific metrics should inform a rollback?
+
+`kubelet_volume_metric_collection_duration_seconds`, added in v1.24, records the
+duration in seconds of volume stats calculation. It can be used to compare fsquota
+monitoring against `du`-based disk usage measurement.
+
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+
+Yes. I tested it locally and fixed [a bug that occurred after restarting the kubelet](https://github.com/kubernetes/kubernetes/pull/107302).
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+
+`LocalStorageCapacityIsolationFSQuotaMonitoring` should be turned on only if
+`LocalStorageCapacityIsolation` is enabled as well. If
+`LocalStorageCapacityIsolationFSQuotaMonitoring` is turned on but
+`LocalStorageCapacityIsolation` is disabled, the check is skipped.
```
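The enablement conditions described above can be summarized as a small decision function. This is a minimal Python sketch, not kubelet code: the `NodeConfig` type, its field names, and the `"fsquota"`/`"du"` return values are hypothetical labels for the conditions the KEP names (the two feature gates, filesystem support for project quotas, and quotas being enabled). It assumes that every case other than "all conditions hold" falls back to the filesystem walk.

```python
from dataclasses import dataclass


@dataclass
class NodeConfig:
    # Hypothetical model of the inputs named in the KEP; not a kubelet type.
    local_storage_capacity_isolation: bool  # LocalStorageCapacityIsolation gate
    fs_quota_monitoring: bool  # LocalStorageCapacityIsolationFSQuotaMonitoring gate
    fs_supports_project_quota: bool  # backing filesystem supports project quotas
    project_quota_enabled: bool  # project quotas are enabled on that filesystem


def emptydir_usage_method(cfg: NodeConfig) -> str:
    """Return "fsquota" when every condition in the KEP holds, else "du" (filesystem walk)."""
    if not cfg.local_storage_capacity_isolation:
        # The quota gate has no effect without LocalStorageCapacityIsolation.
        return "du"
    if (cfg.fs_quota_monitoring
            and cfg.fs_supports_project_quota
            and cfg.project_quota_enabled):
        return "fsquota"
    return "du"


if __name__ == "__main__":
    print(emptydir_usage_method(NodeConfig(True, True, True, True)))   # fsquota
    print(emptydir_usage_method(NodeConfig(False, True, True, True)))  # du
```

The first branch mirrors the gate dependency stated in the rollout answer; on a real node these inputs come from the kubelet configuration and the mounted filesystem.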

```diff
+
+### Monitoring Requirements
+
+###### How can an operator determine if the feature is in use by workloads?
+
+- In the kubelet metrics, an operator can check the histogram metric
+  `kubelet_volume_metric_collection_duration_seconds` for series with
+  `metric_source="fsquota"`. If there is no `metric_source="fsquota"` series,
+  the feature is most likely disabled on that node.
+- However, there is currently no direct way to tell whether a particular workload
+  uses this feature; see below for how to check fsquota settings on a node.
```
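As a sketch of the first check, the Python function below scans kubelet metrics in the Prometheus text exposition format and sums the `_count` series of `kubelet_volume_metric_collection_duration_seconds` whose `metric_source` label is `fsquota`. It assumes the metrics text has already been fetched (for example with `kubectl get --raw /api/v1/nodes/<node>/proxy/metrics`) and that label values contain no escaped quotes or `}` characters; the sample input is made up for illustration.

```python
import re

# The "_count" series of a Prometheus histogram counts all observations.
COUNT_SERIES = "kubelet_volume_metric_collection_duration_seconds_count"
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def fsquota_observations(metrics_text: str) -> float:
    """Sum the observation counts recorded with metric_source="fsquota"."""
    total = 0.0
    for line in metrics_text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        m = SAMPLE_LINE.match(line)
        if m is None or m.group("name") != COUNT_SERIES:
            continue
        labels = dict(LABEL.findall(m.group("labels") or ""))
        if labels.get("metric_source") == "fsquota":
            total += float(m.group("value"))
    return total


sample = """\
# TYPE kubelet_volume_metric_collection_duration_seconds histogram
kubelet_volume_metric_collection_duration_seconds_count{metric_source="du"} 12
kubelet_volume_metric_collection_duration_seconds_count{metric_source="fsquota"} 40
"""
print(fsquota_observations(sample))  # 40.0
```

A result of `0.0` on a node corresponds to the KEP's signal that the feature is probably disabled there.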

```diff
+
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+
+- [x] Metrics
+  - Metric name: `kubelet_volume_metric_collection_duration_seconds`
+  - Aggregation method: histogram
+  - Components exposing the metric: kubelet
+
+###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs?
+
+- 99.9% of volume stats calculations should take less than 1s, and ideally less
+  than 500ms, as measured by the `kubelet_volume_metric_collection_duration_seconds` metric.
```
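Because Prometheus histogram buckets are cumulative, the fraction of calculations that finished within a threshold is the bucket for that `le` value divided by the `le="+Inf"` bucket. The sketch below computes that fraction for `metric_source="fsquota"` so it can be compared with the 99.9% objective. It assumes the histogram has a bucket boundary at exactly the chosen threshold (written as it appears in the exposition text, here `"1"`), that samples carry no timestamps, and the bucket values in the sample are invented.

```python
import re

BUCKET = "kubelet_volume_metric_collection_duration_seconds_bucket"
LABEL = re.compile(r'(\w+)="([^"]*)"')


def fraction_within(metrics_text: str, le: str = "1", source: str = "fsquota") -> float:
    """Fraction of volume stats calculations for `source` that took <= `le` seconds."""
    buckets = {}
    for line in metrics_text.splitlines():
        if not line.startswith(BUCKET + "{"):
            continue
        # Assumes "<name>{<labels>} <value>" with no trailing timestamp.
        series, value = line.rsplit(" ", 1)
        labels = dict(LABEL.findall(series))
        if labels.get("metric_source") == source:
            buckets[labels["le"]] = float(value)
    total = buckets.get("+Inf", 0.0)
    if total == 0:
        raise ValueError(f"no observations for metric_source={source!r}")
    return buckets[le] / total  # KeyError if there is no bucket at `le`


sample = """\
kubelet_volume_metric_collection_duration_seconds_bucket{metric_source="fsquota",le="0.5"} 995
kubelet_volume_metric_collection_duration_seconds_bucket{metric_source="fsquota",le="1"} 999
kubelet_volume_metric_collection_duration_seconds_bucket{metric_source="fsquota",le="+Inf"} 1000
"""
print(fraction_within(sample) >= 0.999)  # True
```

These counters accumulate from kubelet start, so in practice the same ratio is usually computed over a time window in PromQL, for example with `rate()` on the bucket series.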

```diff
+
+###### Are there any missing metrics that would be useful to have to improve observability of this feature?
+
+- Yes. There are no per-volume histogram metrics: the metric above is grouped by
+  volume type, because per-volume metrics would be too expensive. As a result,
+  users cannot tell directly from the metrics whether a workload uses the feature.
+  A cluster admin can check the kubelet configuration on each node; if the feature
+  gate is disabled, workloads on that node do not use it.
+  For example, run `xfs_quota -x -c 'report -h' /dev/sdc` to check the quota settings
+  on a device, and compare them with `spec.containers[].resources.limits.ephemeral-storage`
+  of each container.
```
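To compare an `xfs_quota` report with the pod spec, the ephemeral-storage limits first have to be converted to bytes. The sketch below does that for a pod object already decoded from JSON (for example from `kubectl get pod <name> -o json`). `quantity_to_bytes` and `ephemeral_storage_limits` are hypothetical helpers: they only understand plain integers and the `k`/`M`/`G`/`T` and `Ki`/`Mi`/`Gi`/`Ti` suffixes, not the full Kubernetes quantity grammar (exponents, milli-units, and so on).

```python
import json

# Subset of Kubernetes quantity suffixes; two-letter (binary) suffixes are checked first.
SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def quantity_to_bytes(q: str) -> int:
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return int(float(q[: -len(suffix)]) * SUFFIXES[suffix])
    return int(q)  # plain integer number of bytes


def ephemeral_storage_limits(pod: dict) -> dict:
    """Map container name -> ephemeral-storage limit in bytes (containers without a limit are omitted)."""
    limits = {}
    for c in pod.get("spec", {}).get("containers", []):
        value = c.get("resources", {}).get("limits", {}).get("ephemeral-storage")
        if value is not None:
            limits[c["name"]] = quantity_to_bytes(value)
    return limits


pod = json.loads("""{"spec": {"containers": [
  {"name": "app", "resources": {"limits": {"ephemeral-storage": "2Gi"}}},
  {"name": "sidecar", "resources": {"limits": {"ephemeral-storage": "500M"}}},
  {"name": "no-limit", "resources": {}}
]}}""")
print(ephemeral_storage_limits(pod))  # {'app': 2147483648, 'sidecar': 500000000}
```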

```diff
+
+### Dependencies
+
+###### Does this feature depend on any specific services running in the cluster?
+
+Yes, the feature depends on project quotas being supported and enabled on the
+backing filesystem. Once quotas are enabled, the `xfs_quota` tool can be used to
+set limits and report on disk usage.
+
+### Scalability
+
+###### Will enabling / using this feature result in any new API calls?
+
+No.
+
+###### Will enabling / using this feature result in introducing new API types?
+
+No.
+
+###### Will enabling / using this feature result in any new calls to the cloud provider?
+
+No.
+
+###### Will enabling / using this feature result in increasing size or count of the existing API objects?
+
+No.
+
+###### Will enabling / using this feature result in increasing time taken by any operations covered by [existing SLIs/SLOs]?
+
+No.
+
+###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
+
+No. It uses less CPU time and IO during ephemeral storage monitoring. The kubelet can
+use project quotas (on XFS and suitably configured ext4 filesystems) to monitor
+ephemeral storage consumption (currently for emptyDir volumes only). This is faster
+and more accurate than the old method of walking the filesystem tree. It does not
+enforce limits; it only monitors consumption.
+
+### Troubleshooting
+
+<!--
+This section must be completed when targeting beta to a release.
+The Troubleshooting section currently serves the `Playbook` role. We may consider
+splitting it into a dedicated `Playbook` document (potentially with some monitoring
+details). For now, we leave it here.
+-->
+
+###### How does this feature react if the API server and/or etcd is unavailable?
+
+Quota-based monitoring runs locally in the kubelet and makes no API calls, so it
+should not be affected by API server or etcd unavailability.
+
+###### What are other known failure modes?
+
+1. If a pod reaches its ephemeral storage limit, the kubelet evicts it.
+
+2. If the node image is not configured correctly (unsupported filesystem, or quotas
+   not enabled), quota-based monitoring should be skipped.
+
+3. On an "out of space" failure, kubelet eviction should be triggered.
+
+###### What steps should be taken if SLOs are not being met to determine the problem?
+
+If the metrics show problems, check the kubelet logs and the quota directory:
+- The kubelet logs a warning when a volume calculation takes longer than 1 second
+  (once [kubernetes/kubernetes#107490](https://github.com/kubernetes/kubernetes/pull/107490) is merged).
+- If quotas are enabled, the volume information and processing time can be found
+  with `time repquota -P /var/lib/kubelet -s -v`.
+
 ## Implementation History
 
 ### Version 1.15
 
-` LocalStorageCapacityIsolationFSMonitoring` implemented at Alpha
+- `LocalStorageCapacityIsolationFSQuotaMonitoring` implemented at Alpha
+
+### Version 1.24
+
+- `kubelet_volume_metric_collection_duration_seconds` metric was added
+- A bug where quotas stopped working after a kubelet restart was fixed
+
+### Version 1.25
+
+- Plan to promote `LocalStorageCapacityIsolationFSQuotaMonitoring` to Beta
 
 ## Drawbacks [optional]
 
```
keps/sig-node/1029-ephemeral-storage-quotas/kep.yaml

Lines changed: 6 additions & 3 deletions

```diff
@@ -2,6 +2,7 @@ title: Quotas for Ephemeral Storage
 kep-number: 1029
 authors:
   - "@RobertKrawitz"
+  - "@pacoxu"
 owning-sig: sig-node
 participating-sigs:
   - sig-node
```

```diff
@@ -13,8 +14,10 @@ approvers:
   - "@derekwaynecarr"
 editor: TBD
 creation-date: 2018-09-06
-last-updated: 2019-06-04
+last-updated: 2022-06-20
 status: implementable
-
-latest-milestone: "0.0"
+latest-milestone: "1.25"
 stage: "alpha"
+milestone:
+  alpha: "1.15"
+  beta: "1.25"
```
