From 8f6bd3f620d2b38771c32cdd5314990e7acf63c6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Natalie=20Klestrup=20R=C3=B6ijezon?= Date: Thu, 21 Nov 2024 14:17:12 +0100 Subject: [PATCH 1/7] Initial draft for KEP-4969 --- .../README.md | 934 ++++++++++++++++++ .../4969-cluster-domain-downward-api/kep.yaml | 45 + 2 files changed, 979 insertions(+) create mode 100644 keps/sig-network/4969-cluster-domain-downward-api/README.md create mode 100644 keps/sig-network/4969-cluster-domain-downward-api/kep.yaml diff --git a/keps/sig-network/4969-cluster-domain-downward-api/README.md b/keps/sig-network/4969-cluster-domain-downward-api/README.md new file mode 100644 index 00000000000..7a6394b777e --- /dev/null +++ b/keps/sig-network/4969-cluster-domain-downward-api/README.md @@ -0,0 +1,934 @@ + +# KEP-4969: Cluster Domain Downward API + + + + + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [User Stories (Optional)](#user-stories-optional) + - [Story 1](#story-1) + - [Story 2](#story-2) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [Risks and Mitigations](#risks-and-mitigations) + - [Split Domains](#split-domains) +- [Design Details](#design-details) + - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) + - [Graduation Criteria](#graduation-criteria) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - 
[Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) +- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) + + +## Release Signoff Checklist + + + +Items marked with (R) are required *prior to targeting to a milestone / release*. + +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: 
https://git.k8s.io/website + +## Summary + +Currently, all Services (and many Pods) have Fully Qualified Domain Names (FQDNs) +that are constructed using the format `{service}.{namespace}.svc.{clusterDomain}`, +where `{clusterDomain}` is _typically_ `cluster.local`, but can be reconfigured +by the cluster administrator. + +Currently there is no way for cluster workloads to query for this domain name, +leaving them either use relative domain names or take it as manual configuration. + +This KEP proposes adding a new Downward API for that workloads can use to request it. + + + +## Motivation + +Relative domain names are a source of ambiguity: does `get.app` refer to +[the domain registry](https://get.app./) or the Service `get` in `app`? +This also becomes problematic for TLS, since there is no way to distinguish +which of these two cases a certificate applies to. + +Fully Qualified Domain Names (FQDNs) can be used to resolve this, by always +specifying the full domain name. However, requiring each workload to configure +the cluster domain is tedious and error-prone, discouraging application +developers from using them. + +Many distributions already provide ways to query for the cluster domain (such as +kubeadm[^prior-art-kubeadm], k3s[^prior-art-k3s], and OpenShift[^prior-art-openshift]). +However, these are all inconsistent with each other, requiring applications to +provide special cases for each. + +[^prior-art-kubeadm]: kubeadm (including kind) creates a ConfigMap `kube-system/kubeadm-config` that contains the full kubeadm config, including `networking.dnsDomain`. +[^prior-art-k3s]: k3s creates a ConfigMap `kube-system/clusterdns` that contains it as the `.data.clusterDomain` field. See . +[^prior-art-openshift]: OpenShift defines a custom DNS CRD that contains it as the `.status.clusterDomain` field. See . 
+ +It can also be retrieved from the kubelet's `/configz` endpoint; however, this is +[considered unstable](https://github.com/kubernetes/kubernetes/blob/9d967ff97332a024b8ae5ba89c83c239474f42fd/staging/src/k8s.io/component-base/configz/OWNERS#L3-L5). + + + +### Goals + +- Making it easier to use and generate FQDNs. +- Reducing the difference between Kubernetes distributions. + + + +### Non-Goals + +- Disallowing relative domain names. +- Modifying DNS resolution. +- Centralizing management of the cluster domain name setting. +- Exposing all kubelet configuration. + + + +## Proposal + +Add a Downward API for the cluster domain that containers can request. + + + +### User Stories (Optional) + + + +#### Story 1 + +The Pod `foo` needs to access its sibling Service `bar` in the same namespace. +It adds two `env` bindings: + +``` yaml +apiVersion: v1 +kind: Pod +metadata: + name: foo +spec: + containers: + - name: foo + env: + - name: NAMESPACE + valueFrom: + fieldRef: + fieldPath: metadata.namespace + - name: CLUSTER_DOMAIN + valueFrom: + clusterPropertyRef: clusterDomain +``` + +`foo` can now perform the query by running `curl http://bar.$NAMESPACE.svc.$CLUSTER_DOMAIN/`. + +(Of course, in practice this would likely be integrated into the app itself, not by shelling into bash, but the principle still applies.) + +#### Story 2 + +### Notes/Constraints/Caveats (Optional) + + + +### Risks and Mitigations + + + +#### Split Domains + +Kubernetes currently does not prohibit different kubelets from specifying +different cluster domains (`node-a` could set `cluster.local` while `node-b` +specifies `cluster.remote`). +Exposing FQDNs generated using this API could cause issues in these mixed +environments, since `node-b` might not be able to resolve `cluster.local` +FQDNs correctly. + +For this KEP to make sense, this would have to be explicitly prohibited. 
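The split-domain concern can be sketched as follows. This is a minimal illustration, not part of the proposal: `service_fqdn` is a hypothetical helper, and it assumes a workload simply substitutes the projected value into the `{service}.{namespace}.svc.{clusterDomain}` format described in the Summary.

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str) -> str:
    """Build a Service FQDN in the {service}.{namespace}.svc.{clusterDomain} format."""
    # The trailing dot marks the name as fully qualified (as in `get.app.`).
    return f"{service}.{namespace}.svc.{cluster_domain.strip('.')}."


# The same Service, named by workloads on two kubelets that disagree
# about the cluster domain (values taken from the example above).
on_node_a = service_fqdn("bar", "default", "cluster.local")
on_node_b = service_fqdn("bar", "default", "cluster.remote")
```

Under these assumptions the two nodes produce different names for the same Service, so an FQDN generated on `node-a` and handed to a workload on `node-b` may not resolve there.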
+ +## Design Details + +A new Downward API `clusterPropertyRef: clusterDomain` would be introduced, which can be projected into an environment variable or a volume file. + +<<[UNRESOLVED @nightkr @aojea @thockin]>> +The name is undecided. Other candidates: + +- `nodePropertyRef` (@aojea) +- `runtimeConfigs` (@thockin) + +This also implies a decision about who "owns" the setting, the cluster as a whole or the individual kubelet. +<<[/UNRESOLVED]>> + + + +### Test Plan + + + +[ ] I/we understand the owners of the involved components may require updates to +existing tests to make this code solid enough prior to committing the changes necessary +to implement this enhancement. + +##### Prerequisite testing updates + + + +##### Unit tests + + + + + +- ``: `` - `` + +##### Integration tests + + + + + +- : + +##### e2e tests + + + +- : + +### Graduation Criteria + + + +### Upgrade / Downgrade Strategy + + + +### Version Skew Strategy + + + +## Production Readiness Review Questionnaire + + + +### Feature Enablement and Rollback + + + +###### How can this feature be enabled / disabled in a live cluster? + + + +- [ ] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: + - Components depending on the feature gate: +- [ ] Other + - Describe the mechanism: + - Will enabling / disabling the feature require downtime of the control + plane? + - Will enabling / disabling the feature require downtime or reprovisioning + of a node? + +###### Does enabling the feature change any default behavior? + + + +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? + + + +###### What happens if we reenable the feature if it was previously rolled back? + +###### Are there any tests for feature enablement/disablement? + + + +### Rollout, Upgrade and Rollback Planning + + + +###### How can a rollout or rollback fail? Can it impact already running workloads? + + + +###### What specific metrics should inform a rollback? 
+ + + +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + + + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + + +### Monitoring Requirements + + + +###### How can an operator determine if the feature is in use by workloads? + + + +###### How can someone using this feature know that it is working for their instance? + + + +- [ ] Events + - Event Reason: +- [ ] API .status + - Condition name: + - Other field: +- [ ] Other (treat as last resort) + - Details: + +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + + + +- [ ] Metrics + - Metric name: + - [Optional] Aggregation method: + - Components exposing the metric: +- [ ] Other (treat as last resort) + - Details: + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + + +### Dependencies + + + +###### Does this feature depend on any specific services running in the cluster? + + + +### Scalability + + + +###### Will enabling / using this feature result in any new API calls? + + + +###### Will enabling / using this feature result in introducing new API types? + + + +###### Will enabling / using this feature result in any new calls to the cloud provider? + + + +###### Will enabling / using this feature result in increasing size or count of the existing API objects? + + + +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + + + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + + + +###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)? 
+ + + +### Troubleshooting + + + +###### How does this feature react if the API server and/or etcd is unavailable? + +###### What are other known failure modes? + + + +###### What steps should be taken if SLOs are not being met to determine the problem? + +## Implementation History + + + +## Drawbacks + +This KEP assumes that all kubelets in the +cluster share the same cluster domain. + + + +## Alternatives + + + +### ConfigMap à la k3s {#alternative-configmap} + +The ConfigMap written by k3s[^prior-art-k3s] could be blessed, requiring that +all other distributions also provide it. However, this would require additional +migration effort from each distribution. + +Additionally, this would be problematic to query for: users would have to query +it manually using the Kubernetes API (since ConfigMaps cannot be mounted across +Namespaces), and users would require RBAC permission to query wherever it is stored. + +### Dedicated API Resource + +This shares roughly the same arguments for and against as [the ConfigMap alternative](#alternative-configmap), +although it would allow more precise RBAC policy targeting. + +### kubelet `/configz` + +The kubelet exposes a `/configz` endpoint which can be used to query its internal configuration. +This currently contains the cluster domain name. + +However, this is a diagnostic utility, not a stable, documented, and versioned +API. Encouraging users to rely on it also ossifies the idea that it is a part +of the kubelet's static configuration. + +### Parsing `resolv.conf` + +The kubelet adds the cluster domain to the containers' `/etc/resolv.conf` file. +This could be parsed by users in order to guess the domain name. + +However, this is an implementation detail that could be replaced by other +mechanisms in the future, or disabled entirely. + +It also requires clients to guess which domain is the correct one, which could +have false positives. 
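To illustrate why this is a guess, here is a minimal sketch of such a parser. It assumes the search list follows the layout the kubelet commonly writes today (`<namespace>.svc.<domain> svc.<domain> <domain>`, followed by any search domains inherited from the node); that layout is an observation, not a documented contract. The function name `guess_cluster_domain` and the sample file contents are hypothetical.

```python
def guess_cluster_domain(resolv_conf: str) -> list[str]:
    """Return every candidate cluster domain found in resolv.conf search lists.

    Heuristic (an assumption, not a contract): a search entry of the form
    "svc.<domain>" suggests that <domain> is the cluster domain.
    """
    candidates: list[str] = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        # Only "search" lines carry the domain search list.
        if not fields or fields[0] != "search":
            continue
        for entry in fields[1:]:
            entry = entry.rstrip(".")
            if entry.startswith("svc."):
                domain = entry[len("svc."):]
                if domain not in candidates:
                    candidates.append(domain)
    return candidates


typical = (
    "search default.svc.cluster.local svc.cluster.local cluster.local\n"
    "nameserver 10.96.0.10\n"
    "options ndots:5\n"
)
# A node-level search domain that happens to start with "svc." is
# indistinguishable from the cluster's own entry.
ambiguous = (
    "search default.svc.cluster.local svc.cluster.local cluster.local "
    "svc.example.com\n"
)
```

With the `typical` input the heuristic finds a single candidate, but the `ambiguous` input yields two, and the client has no reliable way to tell which one is the cluster domain.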
+ +## Infrastructure Needed (Optional) + + diff --git a/keps/sig-network/4969-cluster-domain-downward-api/kep.yaml b/keps/sig-network/4969-cluster-domain-downward-api/kep.yaml new file mode 100644 index 00000000000..6de6d3702b7 --- /dev/null +++ b/keps/sig-network/4969-cluster-domain-downward-api/kep.yaml @@ -0,0 +1,45 @@ +title: Cluster Domain Downward API +kep-number: 4969 +authors: + - "@nightkr" +owning-sig: sig-network +participating-sigs: + - sig-network + - sig-node +status: provisional +creation-date: 2024-11-19 +reviewers: + - "@thockin" + - "@aojea" +approvers: + - TBD + +see-also: [] +replaces: [] + +# The target maturity stage in the current dev cycle for this KEP. +stage: alpha + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. +latest-milestone: "v1.32" + +# The milestone at which this feature was, or is targeted to be, at each stage. +milestone: + alpha: TBD + beta: TBD + stable: TBD + +# The following PRR answers are required at alpha release +# List the feature gate name and the components for which it must be enabled +feature-gates: + - name: MyFeature + components: + - kube-apiserver + - kube-controller-manager +disable-supported: true + +# The following PRR answers are required at beta release +metrics: + - my_feature_metric From a878d62e65a7a240735eb85567f5586e5b22b32c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Natalie=20Klestrup=20R=C3=B6ijezon?= Date: Thu, 21 Nov 2024 15:38:51 +0100 Subject: [PATCH 2/7] Grammar fixes (thanks lfrancke) --- .../4969-cluster-domain-downward-api/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/keps/sig-network/4969-cluster-domain-downward-api/README.md b/keps/sig-network/4969-cluster-domain-downward-api/README.md index 7a6394b777e..b9cc1664e23 100644 --- a/keps/sig-network/4969-cluster-domain-downward-api/README.md +++ 
b/keps/sig-network/4969-cluster-domain-downward-api/README.md @@ -155,15 +155,15 @@ Items marked with (R) are required *prior to targeting to a milestone / release* ## Summary -Currently, all Services (and many Pods) have Fully Qualified Domain Names (FQDNs) +All Kubernetes Services (and many Pods) have Fully Qualified Domain Names (FQDNs) that are constructed using the format `{service}.{namespace}.svc.{clusterDomain}`, where `{clusterDomain}` is _typically_ `cluster.local`, but can be reconfigured by the cluster administrator. -Currently there is no way for cluster workloads to query for this domain name, -leaving them either use relative domain names or take it as manual configuration. +Currently, there is no way for cluster workloads to query for this domain name, +leaving them to either use relative domain names or configure it manually. -This KEP proposes adding a new Downward API for that workloads can use to request it. +This KEP proposes adding a new Downward API that workloads can use to request it. From cb664d26099822e5027da639381f99505ed7120a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Natalie=20Klestrup=20R=C3=B6ijezon?= Date: Fri, 22 Nov 2024 15:54:05 +0100 Subject: [PATCH 4/7] Update keps/sig-network/4969-cluster-domain-downward-api/README.md Co-authored-by: Tim Bannister --- keps/sig-network/4969-cluster-domain-downward-api/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/keps/sig-network/4969-cluster-domain-downward-api/README.md b/keps/sig-network/4969-cluster-domain-downward-api/README.md index 856f1d6af9f..5d83cd6cb94 100644 --- a/keps/sig-network/4969-cluster-domain-downward-api/README.md +++ b/keps/sig-network/4969-cluster-domain-downward-api/README.md @@ -245,7 +245,7 @@ and make progress. ## Proposal -Add a Downward API for the cluster domain that containers can request. +Add a new field mapping into the Downward API mechanism, that lets containers retrieve the cluster domain name.