Skip to content

KEP-5311: Relaxed validation for Services names #5315

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

adrianmoisey
Copy link
Member

@adrianmoisey adrianmoisey commented May 18, 2025

  • One-line PR description: Relaxed validation for Services names
  • Other comments:

Warning

Currently a work in progress, but review is very welcome!

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 18, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: adrianmoisey
Once this PR has been reviewed and has the lgtm label, please assign danwinship for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from aojea May 18, 2025 19:08
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label May 18, 2025
@k8s-ci-robot k8s-ci-robot requested a review from danwinship May 18, 2025 19:08
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/network Categorizes an issue or PR as relevant to SIG Network. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 18, 2025
@adrianmoisey adrianmoisey force-pushed the relaxed-name-validation-service branch from b4d56a5 to cd35944 Compare May 19, 2025 10:57
- Creation of Services will use the previous RFC-1035 validation for `.metadata.name`
- Creation of Ingress will use the previous RFC-1035 validation for `.spec.[]rules.http.[]paths.backend.service.name`
- Updates of Services will use the new RFC-1123 validation for `.metadata.name`
- Updates of Ingress will use the new RFC-1123 validation for `.spec.[]rules.http.[]paths.backend.service.name`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the key difference here is that .metadata.Name is immutable, if we already allowed the name on creation we can not break that Service when the feature gate is off.

In Ingress is different since the rules can be updated and mutated, I think the behavior we want is, "if previous value was RFC-1123 compatible but not RFC-1035 we allow it BUT we do not allow to add a new or update an RFC-1035 to a RFC-1123 only" ... I think the term used for this in Kubernetes is "ratcheting validation" , @thockin keep me honest here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes! That behaviour sounds like what we want! Thanks.

Copy link
Contributor

@danwinship danwinship May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way we implemented this for KEP-4858 is that for updates, you only validate the value if it changed, and you just skip validating it if it's still the same value it was before. (You know it had to have been validated when it was originally set, so you don't need to validate it again. Most of the validation code ends up revalidating everything because it's just easier to do that than to check which things changed and only validate the changed parts. But it doesn't need to revalidate the unchanged fields, so we can make it not do that.)

So the rules should be:

  • If the gate is disabled
    • service creation uses 1035 to validate the name
    • (service update doesn't need to worry about this because name is immutable)
    • ingress creation uses 1035 to validate backend service name
    • ingress update does not validate the backend service name if it didn't change in the update, but requires 1035 if it did change
  • If the gate is enabled
    • service creation uses 1123 to validate the name
    • ingress creation/update uses 1123 to validate backend service name

## Summary

This KEP proposes a relaxation of the Service name in order bring it in line
with the validation requirements of other resources in Kubernetes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with the validation requirements of other resources in Kubernetes.
with the validation requirements of names of other resources in Kubernetes.


This KEP proposes a relaxation of the Service name in order bring it in line
with the validation requirements of other resources in Kubernetes.
It aims to change the validation from RFC-1035 to RFC-1123.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're trying to get away from those very non-obvious non-descriptive names (eg, see discussion in kubernetes/kube-openapi#384). In particular, those RFCs cover a lot more than just the syntax of DNS labels, and also apparently we don't implement them quite exactly anyway.

You can probably just remove this sentence from the summary. When you talk about it later on, don't talk about it in a way that suggests that "RFC-1035" is an obviously-correct name for the format that Service names use. (You can refer to the IsDNS1035Label function, or talk about "the format that Kubernetes calls 'RFC-1035'" maybe.)

It's probably more important to explain what the distinction between "RFC-1035" and "RFC-1123" (in the Kubernetes context) actually is. ("RFC-1123" names can start with a digit, while "RFC-1035" names can't, which means that the net effect of this KEP is to start allowing service names to start with a digit.)

## Proposal

At time of writing Service names are validated with `apimachineryvalidation.NameIsDNS1035Label`.
The proposal is to change this validation to `apimachineryvalidation.NameIsDNSSubdomain`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NameIsDNSLabel

NameIsDNSSubdomain is the version that allows "fully-qualified domain name"-style names ("name: my.service.has.a.subdomain"), which wouldn't work right for service DNS lookup...


#### Story 1

As a user, I want to name my Deployment "12345", and I'd like a Service and Ingress object to match that name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very weak user story, and I don't think the feature was inspired by user requests anyway, it was inspired by Tim wanting to simplify the architecture. You can just drop the User Stories section.

- Creation of Services will use the previous RFC-1035 validation for `.metadata.name`
- Creation of Ingress will use the previous RFC-1035 validation for `.spec.[]rules.http.[]paths.backend.service.name`
- Updates of Services will use the new RFC-1123 validation for `.metadata.name`
- Updates of Ingress will use the new RFC-1123 validation for `.spec.[]rules.http.[]paths.backend.service.name`
Copy link
Contributor

@danwinship danwinship May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way we implemented this for KEP-4858 is that for updates, you only validate the value if it changed, and you just skip validating it if it's still the same value it was before. (You know it had to have been validated when it was originally set, so you don't need to validate it again. Most of the validation code ends up revalidating everything because it's just easier to do that than to check which things changed and only validate the changed parts. But it doesn't need to revalidate the unchanged fields, so we can make it not do that.)

So the rules should be:

  • If the gate is disabled
    • service creation uses 1035 to validate the name
    • (service update doesn't need to worry about this because name is immutable)
    • ingress creation uses 1035 to validate backend service name
    • ingress update does not validate the backend service name if it didn't change in the update, but requires 1035 if it did change
  • If the gate is enabled
    • service creation uses 1123 to validate the name
    • ingress creation/update uses 1123 to validate backend service name

extending the production code to implement this enhancement.
-->

- `<package>`: `<date>` - `<test coverage>`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fill this in for pkg/apis/core/validation and pkg/apis/networking/validation


- [test name](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/e2e/...): [SIG ...](https://testgrid.k8s.io/sig-...?include-filter-by-regex=MyCoolFeature), [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=MyCoolFeature)

### Graduation Criteria
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should talk more here about the validation you talked about under "Risks and Mitigations". At what point will that happen?

cluster required to make on upgrade, in order to make use of the enhancement?
-->

### Version Skew Strategy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're only changing a single component, version skew isn't a problem.

CRI or CNI may require updating that component before the kubelet.
-->

## Production Readiness Review Questionnaire
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to look over the PRR questionnaires for some other KEPs you're familiar with. The questions are designed to force people to think about edge cases when implementing larger features, but for a very small KEP like this one, you're going to have a lot of 1-sentence or even 1-word answers.

- Verifying that DNS providers used in Kubernetes clusters can handle the new service name format.
- Running integration tests to ensure that dependent components such as Ingress controllers and service discovery mechanisms function correctly with the updated validation.
2. The Ingress resource references Service and will also need a relaxed validation on its reference to the Service
3. Downstream applications may perform validation on fields relating to Service names (ie: [ingress-ngnix](https://github.com/kubernetes/ingress-nginx/blob/d3ab5efd54f38f2b7c961024553b0ad060e2e916/internal/ingress/annotations/parser/validators.go#L190-L197)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should this risk be mitigated?
For ingress-nginx I'm happy to make a PR to change its validation, but that assumes it will one day bump to using k8s.io/apimachinery 0.34.0 (or higher).

What about the unknowns? There could be other service meshes, Ingress controllers, etc that are doing their own validation. Is it in scope of this KEP to identify those and get them fixed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding the code, it says that the value of the default-backend annotation has to be a DNS1035Label. So that just means that if you create a Service with a DNS1123Label name, you won't be able to refer to that Service from an ingress-nginx default-backend annotation (until they change the ingress-nginx validation). So this isn't really a big deal; you can just not use a 1123 name in that case.

doing their own validation. Is it in scope of this KEP to identify those and get them fixed?

If there are problems, they're probably not going to be because of people doing conflicting validation, they're going to be because of people putting Service names into contexts where an initial digit would screw things up.

Like, for instance, if you write out a YAML file by hand, and put the service name into a field, currently it is guaranteed that that field would get parsed as a string. But with this KEP, it would be possible to have a service whose name was entirely numeric, so if you wrote that out, the consumer of the YAML would find an integer value there rather than a string value and probably break.

Yeah, yeah, you could break it before with a service named "true" too. Yay YAML. Anyway, you get the idea though. People might use service names in contexts where the newly-allowed characters could cause problems.

And there's no way you could find all such places, since this could just be in a script that some random k8s admin somewhere wrote.

So, this is definitely something to worry about, but in the end, we have to just be like "eh... probably not going to happen".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/network Categorizes an issue or PR as relevant to SIG Network. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants