```diff
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [x] (R) KEP approvers have approved the KEP status as `implementable`
 - [x] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
-  - [ ] e2e Tests for all Beta API Operations (endpoints)
-  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
-- [ ] (R) Graduation criteria is in place
-  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-- [ ] (R) Production readiness review completed
-- [ ] (R) Production readiness review approved
-- [ ] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+  - [ ] ~~e2e Tests for all Beta API Operations (endpoints)~~ no API endpoints
+  - [ ] ~~(R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)~~ no API endpoints
+  - [ ] ~~(R) Minimum Two Week Window for GA e2e tests to prove flake free~~ no API endpoints
+- [x] (R) Graduation criteria is in place
+  - [ ] ~~(R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)~~ no API endpoints
+- [x] (R) Production readiness review completed
+- [x] (R) Production readiness review approved
+- [x] "Implementation History" section is up-to-date for milestone
+- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
```
```diff
@@ -142,9 +141,9 @@ A new versioned grpc API (ExternalJWTSigner) will be created under `k8s.io/kuber
 #### Support for Legacy Tokens
 
 Implementers will have the following options for legacy token support:
-1. Let the Controller loop run as it is with static signing keys. Stitch the public keys in external signer's JWKs.
-2. Turn off the loop (don't support legacy tokens) if external signing is enabled.
-3. Create a custom external signer for legacy tokens using Controller loop from staging repo (This option will only be available if demanded by Community as part of feedback for Beta graduation).
+1. Turn off the loop (don't support legacy tokens) if external signing is enabled (recommended, to avoid non-expiring tokens).
+2. Let the Controller loop run as it is with static signing keys. Stitch the public keys into the external signer's JWKs.
+3. Turn off the loop in kube-controller-manager and create a custom external signer for legacy tokens that obtains them via the external signer.
 
 ### Risks and Mitigations
 
```
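Option 2 amounts to publishing the legacy static public keys alongside the external signer's own keys, so tokens minted by either path still verify. The Go sketch below shows only that merge step. The `verificationKey` type, the `stitchKeys` and `legacyKeyID` helpers, and the SHA-256/base64url key-ID scheme are assumptions for illustration, not the externaljwt API; a real integration must use the key IDs its legacy tokens were actually issued with.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
)

// verificationKey is a hypothetical stand-in for one public key the external
// signer publishes: a key ID plus the PKIX (DER) encoded public key bytes.
type verificationKey struct {
	KeyID string
	PKIX  []byte
}

// legacyKeyID derives a key ID for a static legacy public key.
// Assumption: unpadded base64url of the SHA-256 digest of the DER bytes.
func legacyKeyID(pkix []byte) string {
	sum := sha256.Sum256(pkix)
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// stitchKeys returns the external signer's keys followed by the legacy static
// keys, skipping any legacy key whose ID is already present.
func stitchKeys(external []verificationKey, legacyPKIX [][]byte) []verificationKey {
	out := make([]verificationKey, 0, len(external)+len(legacyPKIX))
	seen := make(map[string]bool, len(external)+len(legacyPKIX))
	for _, k := range external {
		out = append(out, k)
		seen[k.KeyID] = true
	}
	for _, der := range legacyPKIX {
		id := legacyKeyID(der)
		if seen[id] {
			continue // same key already published; avoid duplicate IDs
		}
		seen[id] = true
		out = append(out, verificationKey{KeyID: id, PKIX: der})
	}
	return out
}
```

Keeping the external signer's entries first means a colliding key ID never replaces the signer's current key.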
```diff
@@ -280,13 +279,12 @@ to implement this enhancement.
 ##### Integration tests
 
 - Create a cluster with ExternalJWTSigner to configure an external signer and verify that the TokenRequest and TokenReview APIs work properly.
-
-##### e2e tests
-
-- Create a cluster with ExternalJWTSigner configured.
 - Request a token for a service account principal.
 - Use the token as a bearer token for requests to kube-apiserver and ensure they succeed.
```
```diff
@@ -296,13 +294,15 @@ to implement this enhancement.
 
 #### Beta
 
-- E2E tests are completed.
-- We have at least one ExternalSigner implementation working with this change.
+- All tests are completed.
+- We have at least one ExternalSigner integration working with this change.
+- GKE integration is complete.
 - Decide whether to externalize legacy token controller code in a staging repo. Check [Support for Legacy Tokens](#support-for-legacy-tokens) for details.
+- Decided not to externalize legacy token controller code.
 
 #### GA
 
-- More than one ExternalSigner implementations are completed.
+- More than one ExternalSigner integration is completed.
 - Feature is tuned with feedback from distributions.
 
 ### Upgrade/Downgrade Strategy
```
```diff
@@ -425,10 +425,13 @@ No.
 
 The feature is not used directly by workloads, only by kube-apiserver.
 
-The usage should be visible to the operator using Audit logs.
-<!-- TODO
-Add details on increasing audit log surface area for External signers
--->
+The usage should be visible to the operator via these metrics:
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
 
```
```diff
@@ -468,7 +480,6 @@ Needs Benchmarking on SLIs.
 - It is the integrator's responsibility to ensure that their ExternalJWTSigner implementation supports signing tokens with 1-year validity if their clusters rely on extended token lifetimes.
 - Integrators can observe the `serviceaccount_stale_tokens_total` metric to confirm their cluster's reliance on `--service-account-extend-token-expiration`.
 
-
 ### Dependencies
 
 One new dependency will be introduced, and it will only be required for clusters that opt in via the `--service-account-signing-endpoint` flag.
```
```diff
@@ -550,35 +561,46 @@ not likely.
 
 ### Troubleshooting
 
-<!-- TODO
-This section must be completed when targeting beta to a release.
-
-For GA, this section is required: approvers should be able to confirm the
-previous answers based on experience in the field.
-
-The Troubleshooting section currently serves the `Playbook` role. We may consider
-splitting it into a dedicated `Playbook` document (potentially with some monitoring
-details). For now, we leave it here.
--->
+Symptom: kube-apiserver will not start with `--service-account-signing-endpoint` set
+
+- check the kube-apiserver log for details about why startup failed
+- ensure the socket `--service-account-signing-endpoint` points to is valid,
+  the kube-apiserver user has permissions to access it, and the external signer is running
+- ensure `--service-account-signing-key-file` and `--service-account-key-file` are not also set
+- ensure the external signer supports the version of the externaljwt gRPC API kube-apiserver is using
+- ensure the maximum supported token lifetime returned by the external signer does not conflict with any
+  `--service-account-max-token-expiration` flag (the flag may not be longer than the max expiration supported by the external signer)
+
```
```diff
+Symptom: token creation fails with `500` errors
+
+- check `apiserver_externaljwt_sign_request_total` metrics for codes other than `OK` to determine if signing failures are the cause
+- if signing requests are failing with `CANCELLED` or `DEADLINE_EXCEEDED` codes,
+  check `apiserver_externaljwt_request_duration_seconds` metrics for the timing distribution
+  of external signing requests with `method=Sign`. If external signing is causing request timeouts,
+  investigate improving the performance of your external signer integration.
+- check the kube-apiserver log for details about other signing failures
+
```
```diff
+Symptom: token use fails with authentication errors
+
+- check the `apiserver_externaljwt_fetch_keys_request_total` metrics for codes other than `OK`
+  to determine whether fetching the verification keys is failing
+- check the `apiserver_externaljwt_fetch_keys_success_timestamp` metric to determine the
+  last time public keys were successfully refreshed. If the time elapsed since then exceeds the expected `refresh_hint_seconds`
+  value for your particular external signer integration, check `kube-apiserver` logs for details on why
+  the public key fetch is failing.
+- check the `apiserver_externaljwt_fetch_keys_data_timestamp` metric to determine the `data_timestamp`
+  reported by the external signer in the last successful fetch of public keys. Compare it to the expected
+  value for your particular external signer integration to determine if `kube-apiserver` is using current
+  public keys. If this does not match, check your external signer for details on why it is not returning
+  the expected public keys from the `FetchKeys` method.
```
```diff
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
 
 The feature is only accessible via kube-apiserver. JWT signing and authentication do not work without kube-apiserver anyway.
 
 ###### What are other known failure modes?
 
-<!-- TODO
-For each of them, fill in the following information by copying the below template:
-  - [Failure mode brief description]
-    - Detection: How can it be detected via metrics? Stated another way:
-      How can an operator troubleshoot without logging into a control plane or worker node?
-    - Mitigations: What can be done to stop the bleeding, especially for already
-      running user workloads?
-    - Diagnostics: What are the useful log messages and their required logging
-      levels that could help debug the issue?
-      Not required until the feature graduated to beta.
-    - Testing: Are there any tests for failure mode? If not, describe why.
--->
+Covered above in the troubleshooting section.
 
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
```
```diff
@@ -590,6 +612,10 @@ Initial PRs:
 - kubernetes/kubernetes#73110
 - kubernetes/kubernetes#125177
 
+1.32: Alpha release
+
+1.34: Beta release
+
 ## Drawbacks
 
 Enabling the feature puts a remote service in the critical path of kube-apiserver, so it can easily cause an outage. Some relief comes from the feature being opt-in and configurable.
```