 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [x] (R) KEP approvers have approved the KEP status as `implementable`
 - [x] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
-- [ ] e2e Tests for all Beta API Operations (endpoints)
-- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
-- [ ] (R) Graduation criteria is in place
-- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-- [ ] (R) Production readiness review completed
-- [ ] (R) Production readiness review approved
+- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [x] (R) Graduation criteria is in place
+- [x] (R) Production readiness review completed
+- [x] (R) Production readiness review approved
 - [ ] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
@@ -142,9 +137,9 @@ A new versioned grpc API (ExternalJWTSigner) will be created under `k8s.io/kuber
 #### Support for Legacy Tokens
 
 Implementers will have following options for legacy token support:
-1. Let the Controller loop run as it is with static signing keys. Stitch the public keys in external signer's JWKs.
-2. Turn off the loop (don't support legacy tokens) if external signing is enabled.
-3. Create a custom external signer for legacy tokens using Controller loop from staging repo (This option will only be available if demanded by Community as part of feedback for Beta graduation).
+1. Turn off the loop (don't support legacy tokens) if external signing is enabled. (recommended to avoid non-expiring tokens)
+2. Let the Controller loop run as it is with static signing keys. Stitch the public keys in external signer's JWKs.
+3. Turn off the loop in kube-controller-manager and create a custom external signer for legacy tokens that obtains them via the external signer.
 
 ### Risks and Mitigations
 
@@ -280,13 +275,12 @@ to implement this enhancement.
 ##### Integration tests
 
 - Create a cluster with ExternalJWTSigner to configure an external signer and verify TokenRequest and TokenReview APIs work properly.
-
-##### e2e tests
-
-- Create a cluster with ExternalJWTSigner configured.
 - Request a token for a service account principal.
 - Use a token as bearer for making requests to kube-apiserver and ensure it succeeds.
@@ -296,13 +290,15 @@ to implement this enhancement.
 
 #### Beta
 
-- E2E tests are completed.
-- We have at least one ExternalSigner implementation working with this change.
+- All tests are completed.
+- We have at least one ExternalSigner integration working with this change.
+- GKE integration is complete
 - Decide whether to externalize legacy token controller code in a staging repo. Check [Support for Legacy Tokens](#support-for-legacy-tokens) for details.
+- Decided not to externalize legacy token controller code
 
 #### GA
 
-- More than one ExternalSigner implementations are completed.
+- More than one ExternalSigner integration are completed.
 - Feature is tuned with feedback from distributions.
 
 ### Upgrade/Downgrade Strategy
@@ -425,10 +421,13 @@ No.
 
 The Feature would not be used by workload directly but will be used by kube-apiserver.
 
-The usage should be visible to the operator using Audit logs.
-<!-- TODO
-Add details on increasing audit log surface area for External signers
--->
+The usage should be visible to the operator via these metrics:
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
 
@@ -468,7 +476,6 @@ Needs Benchmarking on SLIs.
 - It is the integrator's responsibility to ensure that their ExternalJWTSigner implementation support signing tokens with 1 year validity i.e. if their clusters are relying on extended token lifetimes.
 - integrators can observe the `serviceaccount_stale_tokens_total` metric to confirm their cluster's reliance on `--service-account-extend-token-expiration`.
 
-
 ### Dependencies
 
 One new dependency will be introduced and it will only be required for clusters configured/opted-in via the `--service-account-signing-endpoint` flag.
@@ -550,35 +557,46 @@ not likely.
 
 ### Troubleshooting
 
-<!-- TODO
-This section must be completed when targeting beta to a release.
-
-For GA, this section is required: approvers should be able to confirm the
-previous answers based on experience in the field.
-
-The Troubleshooting section currently serves the `Playbook` role. We may consider
-splitting it into a dedicated `Playbook` document (potentially with some monitoring
-details). For now, we leave it here.
--->
+Symptom: kube-apiserver will not start with `--service-account-signing-endpoint` set
+
+- check the kube-apiserver log for details about why startup failed
+- ensure the socket `--service-account-signing-endpoint` points to is valid,
+  the kube-apiserver user has permissions to access it, and the external signer is running
+- ensure `--service-account-signing-key-file` and `--service-account-key-file` are not also set
+- ensure the external signer supports the version of the externaljwt gRPC API kube-apiserver is using
+- ensure the maximum supported token lifetime returned by the external signer does not conflict with any
+  `--service-account-max-token-expiration` flag (the flag may not be longer than the max expiration supported by the external signer)
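
Reviewer note: the flag checks in the startup list above could be run as a quick pre-flight before restarting kube-apiserver. What follows is a minimal sketch and not part of the KEP or of kube-apiserver. The function name `startup_flag_problems` and both `*_seconds` parameters are hypothetical, and the sketch assumes the operator has already converted the flag value and the signer's advertised maximum lifetime to seconds.

```python
from typing import List, Mapping, Optional

# Flags that the troubleshooting list says must not accompany the endpoint flag.
CONFLICTING_FLAGS = ("service-account-signing-key-file", "service-account-key-file")


def startup_flag_problems(
    flags: Mapping[str, str],
    max_token_expiration_seconds: Optional[int] = None,  # hypothetical: flag value in seconds
    signer_max_lifetime_seconds: Optional[int] = None,   # hypothetical: signer's max, in seconds
) -> List[str]:
    """Return human-readable problems for an external-signer flag set."""
    problems: List[str] = []
    if not flags.get("service-account-signing-endpoint"):
        # External signing is not enabled, so none of these checks apply.
        return problems
    for name in CONFLICTING_FLAGS:
        if flags.get(name):
            problems.append(
                f"--{name} must not be set with --service-account-signing-endpoint"
            )
    if (
        max_token_expiration_seconds is not None
        and signer_max_lifetime_seconds is not None
        and max_token_expiration_seconds > signer_max_lifetime_seconds
    ):
        problems.append(
            "--service-account-max-token-expiration exceeds the maximum "
            "lifetime supported by the external signer"
        )
    return problems
```

An empty result only means these particular checks passed. Socket permissions and gRPC API version support still need to be confirmed from the kube-apiserver log.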
+
+Symptom: token creation fails with `500` errors
+
+- check `apiserver_externaljwt_sign_request_total` metrics for codes other than `OK` to determine if signing failures are the cause
+- if signing requests are failing with `CANCELLED` or `DEADLINE_EXCEEDED` codes,
+  check `apiserver_externaljwt_request_duration_seconds` metrics for timing distribution
+  of external signing requests with `method=Sign`. If external signing is causing request timeouts,
+  investigate improving the performance of your external signer integration.
+- check the kube-apiserver log for details about other signing failures
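
Reviewer note: the "codes other than `OK`" check above can be scripted against a scrape of the kube-apiserver metrics endpoint. The sketch below is illustrative only. It assumes the Prometheus text exposition format and assumes the gRPC status is carried in a label named `code`, which the KEP text implies but does not spell out. The helper name `non_ok_counts` is hypothetical.

```python
import re
from typing import Dict

# One sample line: metric_name{label="value",...} <number>
# Simplification: label values containing '}' are not handled.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def non_ok_counts(
    metrics_text: str,
    metric: str = "apiserver_externaljwt_sign_request_total",
) -> Dict[str, float]:
    """Sum samples of `metric` per status code, skipping OK and zero counts."""
    counts: Dict[str, float] = {}
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(_LABEL.findall(match.group("labels")))
        code = labels.get("code", "")  # assumed label name
        value = float(match.group("value"))
        if code != "OK" and value > 0:
            counts[code] = counts.get(code, 0.0) + value
    return counts
```

Passing `metric="apiserver_externaljwt_fetch_keys_request_total"` applies the same check to key fetches. A non-empty result for `CANCELLED` or `DEADLINE_EXCEEDED` is the cue to look at the duration histogram next.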
+
+Symptom: token use fails with authentication errors
+
+- check the `apiserver_externaljwt_fetch_keys_request_total` metrics for codes other than `OK`
+  to determine if verifying keys are failing to be fetched
+- check the `apiserver_externaljwt_fetch_keys_success_timestamp` metric to determine the
+  last time public keys were successfully refreshed. If this exceeds the expected `refresh_hint_seconds`
+  value for your particular external signer integration, check `kube-apiserver` logs for details on why
+  the public key fetch is failing.
+- check the `apiserver_externaljwt_fetch_keys_data_timestamp` metric to determine the `data_timestamp`
+  reported by the external signer in the last successful fetch of public keys. Compare to the expected
+  value for your particular external signer integration to determine if `kube-apiserver` is using current
+  public keys. If this does not match, check your external signer for details on why it is not returning
+  the expected public keys to the `FetchKeys` method.
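
Reviewer note: the staleness comparison described for `apiserver_externaljwt_fetch_keys_success_timestamp` is simple arithmetic, sketched below. It assumes the metric value is a Unix timestamp in seconds, which the KEP text does not state. The function name `keys_look_stale` and the `slack_seconds` allowance are hypothetical.

```python
import time
from typing import Optional


def keys_look_stale(
    success_timestamp: float,      # assumed: Unix seconds of the last successful fetch
    refresh_hint_seconds: float,   # expected refresh interval for this signer integration
    now: Optional[float] = None,
    slack_seconds: float = 0.0,    # hypothetical allowance for scrape and refresh jitter
) -> bool:
    """Return True when the last successful key fetch is older than expected."""
    current = time.time() if now is None else now
    age = current - success_timestamp
    return age > refresh_hint_seconds + slack_seconds
```

For example, with a last success at 1000 and a 300 second hint, a reading taken at 1200 is within the window while one taken at 1400 is not. A stale result points at the kube-apiserver logs, as the list above says.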
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
 
 feature is only accessible via kube-apiserver. JWT signing and authentication will anyways not work without kube-apiserver.
 
 ###### What are other known failure modes?
 
-<!-- TODO
-For each of them, fill in the following information by copying the below template:
-- [Failure mode brief description]
-- Detection: How can it be detected via metrics? Stated another way:
-  How can an operator troubleshoot without logging into a control plane or worker node?
-- Mitigations: What can be done to stop the bleeding, especially for already
-  running user workloads?
-- Diagnostics: What are the useful log messages and their required logging
-  levels that could help debug the issue?
-Not required until the feature graduated to beta.
-- Testing: Are there any tests for failure mode? If not, describe why.
--->
+Covered above in the troubleshooting section.
 
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
@@ -590,6 +608,10 @@ Initial PRs:
 - kubernetes/kubernetes#73110
 - kubernetes/kubernetes#125177
 
+1.32: Alpha release
+
+1.34: Beta release
+
 ## Drawbacks
 
 Enabling the feature puts a remote service in the critical path of kube-apiserver. Thus, it can easily cause an outage. However, we have some relief in that it is an opt-in/configurable feature.