CNTRLPLANE-3363: Add KMS plugin health reporter design #2005
@ibihim: This pull request references CNTRLPLANE-3363 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.
[APPROVALNOTIFIER] This PR is NOT APPROVED
- Per-node health reporter sidecar publishes one advisory KMSHealthReporter_<nodeName> condition on the apiserver operator CR.
- Aggregator controller reads those conditions and emits a single KMSPluginsDegraded rollup; library-go's StatusSyncer routes the _Degraded suffix into the ClusterOperator's Degraded condition.
- Message format: one key=value line per probed plugin (keyID, status, lastChecked, optional trailing detail).
- Risks: stale reporter conditions, orphaned conditions on KMS disable, cold-start window.
bb85f9a to b719627
|
|
||
| #### Health Reporter Sidecar | ||
|
|
||
| When KMS encryption is enabled, a health reporter sidecar runs alongside every API server pod replica. The sidecar probes the colocated KMS plugin(s) and publishes the outcome to the owning operator's CR as a per-node condition. A separate aggregator controller picks up these conditions and emits a single `KMSPluginsDegraded` rollup, which propagates to the `ClusterOperator`'s `Degraded` condition. |
There was a problem hiding this comment.
I was thinking there would be one health reporter sidecar per KMS plugin (i.e. health reporter A for KMS plugin A, health reporter B for KMS plugin B, etc.), not a single health reporter sidecar for all KMS plugins. Is that correct?
There was a problem hiding this comment.
Plugin lifecycle is supposed to inject the KMS plugins. Does that mean it will additionally inject one health monitor sidecar?
There was a problem hiding this comment.
Yeah, it's a bit contradictory with the sections below. I was also under the assumption that we had a 1:1 monitor:plugin relation.
There was a problem hiding this comment.
What
One reporter sidecar per API server pod, not one per plugin. It probes every colocated KMS socket and emits a single per-node condition whose Message carries one line per plugin (see Message format below).
Why
- One monitor per node keeps the condition cardinality constant. With one monitor per plugin it would not stay constant (because of SSA / field ownership).
- I'm assuming the injector can run one reporter per pod with all plugin sockets exposed to it, by passing socket paths as flags at startup.
If the above turns out to be harder than 1:1 injection in practice, I will change it.
| - One per `openshift-oauth-apiserver` Deployment replica | ||
| - One per `openshift-apiserver` Deployment replica | ||
|
|
||
| During KMS-to-KMS migration, the same sidecar probes every active KMS plugin in its pod (see [Multiple Concurrent Sidecars](#multiple-concurrent-sidecars)) and reports their combined state in the Message field of its single per-node condition. |
There was a problem hiding this comment.
by checking the current EncryptionConfiguration?
There was a problem hiding this comment.
The health reporter is given the UDS paths as a flag argument. Technically it could also scan for mounted UDS sockets.
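To make the flag-based option concrete, here is a minimal sketch of how a reporter could accept several UDS paths at startup. The flag name `--kms-socket` and the `.sock` suffix check are assumptions for illustration only; the enhancement does not fix either.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// socketPaths collects every occurrence of a repeatable flag.
type socketPaths []string

func (s *socketPaths) String() string { return strings.Join(*s, ",") }

// Set is called once per flag occurrence. The suffix check is a
// hypothetical sanity check, not something the design requires.
func (s *socketPaths) Set(v string) error {
	if !strings.HasSuffix(v, ".sock") {
		return fmt.Errorf("%q does not look like a UDS path", v)
	}
	*s = append(*s, v)
	return nil
}

// parseSockets parses args using the hypothetical --kms-socket flag.
func parseSockets(args []string) ([]string, error) {
	var paths socketPaths
	fs := flag.NewFlagSet("kms-health-reporter", flag.ContinueOnError)
	fs.Var(&paths, "kms-socket", "path to a KMS plugin UDS (repeatable)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return paths, nil
}

func main() {
	paths, err := parseSockets([]string{
		"--kms-socket=/var/run/kmsplugin/kms-1.sock",
		"--kms-socket=/var/run/kmsplugin/kms-2.sock",
	})
	fmt.Println(paths, err)
}
```

With this shape the injector only has to append one flag per plugin socket; scanning the mount directory for sockets would be the alternative mentioned above.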
| │ SSA (per-node fieldManager) | ||
| ▼ | ||
| operator CR (kubeapiservers.operator.openshift.io/cluster): | ||
| ├─ KMSHealthReporter_<nodeName> ◄─ written by each per-pod reporter |
There was a problem hiding this comment.
Can you give an example of how kms-2.sock and kms-1.sock will be represented in the operator CR?
| |---|---|---| | ||
| | All plugins healthy | True | `AsExpected` | | ||
| | Any plugin returns not-ok | False | `Unhealthy` | | ||
| | Any plugin RPC error, no explicit not-ok | Unknown | `Unreachable` | |
There was a problem hiding this comment.
Where will the details of the RPC error be visible?
There was a problem hiding this comment.
Every call to the KMS plugin can return an error, as per the interface:
Status(ctx context.Context) (*StatusResponse, error) // error = RPC error
There was a problem hiding this comment.
How can a cluster admin see the RPC error details from the operator conditions reported by the health monitor? A cluster admin cannot send a request to the Status endpoint of the KMS plugin.
There was a problem hiding this comment.
The cluster admin will see the aggregated status created by the condition controller.
Is this a CTA, to add this notion to the EP?
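The status mapping in the table under discussion can be sketched as a small function. This is only an illustration of the three rows (all healthy, any not-ok, any RPC error without an explicit not-ok); the type and function names are hypothetical, and the behavior for an empty plugin list is an assumption the table does not cover.

```go
package main

import "fmt"

// probeState is a hypothetical per-plugin probe outcome.
type probeState int

const (
	stateHealthy probeState = iota
	stateUnhealthy
	stateRPCError
)

// perNodeCondition maps the per-plugin outcomes on one node to the
// Status and Reason of its KMSHealthReporter_<nodeName> condition.
func perNodeCondition(states []probeState) (status, reason string) {
	sawError := false
	for _, s := range states {
		switch s {
		case stateUnhealthy:
			// An explicit not-ok wins over any RPC error.
			return "False", "Unhealthy"
		case stateRPCError:
			sawError = true
		}
	}
	if sawError {
		return "Unknown", "Unreachable"
	}
	// Assumption: no probed plugins is treated like "all healthy".
	return "True", "AsExpected"
}

func main() {
	status, reason := perNodeCondition([]probeState{stateHealthy, stateRPCError})
	fmt.Println(status, reason)
}
```

The RPC error text itself would still have to travel in the per-plugin part of the Message for an admin to see it, since Status and Reason only carry the category.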
| The Message is the structured input the aggregator controller consumes. One line per probed plugin: | ||
|
|
||
| ``` | ||
| keyID=<id> status=healthy lastChecked=<RFC 3339 timestamp> |
There was a problem hiding this comment.
How can this format be parsed by other controllers (e.g. the rotation controller)?
There was a problem hiding this comment.
I tried to replicate:
message: |-
  Available: The registry is ready
  NodeCADaemonAvailable: The daemon set node-ca has available replicas
  ImagePrunerAvailable: Pruner CronJob has been created
If you prefer, I can dump JSON-encoded data into it.
There was a problem hiding this comment.
Agreed to use minified JSON
| status: "True" | ||
| reason: AsExpected | ||
| message: | | ||
| keyID=2 status=healthy lastChecked=2026-05-08T12:34:56Z |
There was a problem hiding this comment.
This data should probably be in a decodable format instead of a one-liner, so that it is usable by other consumers.
There was a problem hiding this comment.
Agreed to use minified JSON
|
|
||
| ##### SSA mechanism | ||
|
|
||
| Two classes of writers share the conditions array: the operator (writing its own conditions like `NodeControllerDegraded` plus the aggregator's `KMSPluginsDegraded` rollup) and N reporter sidecars (each writing its `KMSHealthReporter_<nodeName>`). Per-entry ownership is enabled by `+listType=map +listMapKey=type` on `OperatorStatus.Conditions`. Each reporter uses a per-node fieldManager (`kms-health-reporter-<nodeName>`); all writers apply with `Force: true`. |
There was a problem hiding this comment.
TBH, I didn't understand anything from this paragraph :). Can we update it to make it clearer?
There was a problem hiding this comment.
I tried to explain the SSA dynamic. I can drop the section completely.
operator.openshift.io/v1.KubeAPIServer has a status to which the health reporter would write conditions. So there is a potential conflict, which SSA with +listType=map can resolve, but it is something we should be cautious of.
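To illustrate the SSA dynamic, the following sketch builds the request a reporter might send: an apply patch whose body lists only its own condition entry, sent with a per-node field manager and `force=true`. It uses only the standard library and does not contact a cluster. The `fieldManager` and `force` query parameters are the standard Kubernetes apply parameters; the helper name and the choice of the `status` subresource path are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// condition holds the subset of condition fields the reporter sets.
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// applyRequest returns the request path and body for a server-side apply
// (Content-Type: application/apply-patch+yaml; JSON is valid YAML).
// Only the reporter's own list entry is present, so with a map-typed list
// keyed by "type" the reporter owns just that entry.
func applyRequest(nodeName string, c condition) (string, []byte, error) {
	q := url.Values{}
	q.Set("fieldManager", "kms-health-reporter-"+nodeName)
	q.Set("force", "true")
	path := "/apis/operator.openshift.io/v1/kubeapiservers/cluster/status?" + q.Encode()
	body, err := json.Marshal(map[string]any{
		"apiVersion": "operator.openshift.io/v1",
		"kind":       "KubeAPIServer",
		"metadata":   map[string]any{"name": "cluster"},
		"status":     map[string]any{"conditions": []condition{c}},
	})
	return path, body, err
}

func main() {
	path, body, err := applyRequest("master-1", condition{
		Type: "KMSHealthReporter_master-1", Status: "True", Reason: "AsExpected",
	})
	fmt.Println(path, string(body), err)
}
```

Because each reporter applies under its own manager name, entries written by the operator or by other reporters are not part of its applied configuration and are left untouched.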
|
|
||
| ##### Naming convention | ||
|
|
||
| Each reporter sidecar writes one condition per pod replica to the owning operator's CR (`kubeapiservers.operator.openshift.io/cluster`, etc.), keyed by the node: |
There was a problem hiding this comment.
Could you please mention who is going to inject this sidecar?
There was a problem hiding this comment.
https://github.com/ibihim/enhancements/blob/2bb10c537d05893622c0340a2b2322d6d0765abd/enhancements/kube-apiserver/kms-encryption-foundations.md#health-reporter-sidecar now mentions the pluginlifecycle package.
|
|
||
| ##### Destination | ||
|
|
||
| The sidecar writes one advisory condition per pod replica to the owning operator's `*.operator.openshift.io/cluster` CR via Server-Side Apply. The aggregator controller reads these advisory conditions and emits the `KMSPluginsDegraded` rollup. See [KMS Plugin Health Conditions](#kms-plugin-health-conditions) for the exact naming, status mapping, and rollup behavior. |
There was a problem hiding this comment.
Is this "aggregator controller" referring to existing controller or will be implemented?
There was a problem hiding this comment.
Honestly, I was not sure. I would have hoped to let the conditionController handle it.
There was a problem hiding this comment.
Will there be a change in the condition controller? That would mean a change in the encryption controllers. I think we should document which changes are expected.
There was a problem hiding this comment.
As we discussed, we will see if it works to put it into the condition controller and if not, we will create a new one.
|
|
||
| - `State`: healthy / unhealthy / RPC error | ||
| - `KeyID`: keyID currently active in the probed plugin | ||
| - `Timestamp`: time of this check (last-check time; not the condition's `LastTransitionTime`, which records state flips only) |
There was a problem hiding this comment.
LastTransitionTime, which records state flips only
What does that mean? Does changing the message count as changing the state?
There was a problem hiding this comment.
Timestamp is the time at which the response from the KMS plugin arrived at the health reporter.
It is explicitly not a LastTransitionTime, based on the suggestion you made earlier.
This means you can derive staleness from now > timestamp + probe interval + skew.
|
|
||
| During KMS-to-KMS migration, the encryption-configuration secret contains provider configs for all active keys. The operator creates a separate sidecar for each key, listening on its own unix domain socket (e.g., `kms-1.sock`, `kms-2.sock`). | ||
|
|
||
| #### Health Reporter Sidecar |
There was a problem hiding this comment.
I think that in general we miss:
1. Specifying the probe interval
2. Specifying the creds used to publish the condition
3. Explaining what happens with orphaned conditions
4. A plugin identity, when multiple plugins are specified; otherwise we cannot map status to a specific plugin.
5. Listing the alternatives we discussed
6. Specifying how the reporter connects to the apiserver. Perhaps for `KAS` we could go via `localhost`
There was a problem hiding this comment.
1. Yes, this is worth elaborating on, to balance detection latency against hammering KAS with SSA requests.
2. I assumed we would have a SA with RBAC.
3. https://github.com/openshift/enhancements/pull/2005/changes#diff-c0e8034f61b2841cee9b2b1d07403961148c3a30f096b0d35720a57ead8d3697R660
4. In the status of the condition each health report has its own line. I assumed that the keyID would make it unique. Does it not?
5. Ok.
6. See 2.
There was a problem hiding this comment.
- Set to 30s.
- legacy SA token with RBAC for now.
- Node + keyID
- oh damn. I will add that.
- service is slightly better, but localhost is worth mentioning.
There was a problem hiding this comment.
Added a section in alternatives to cover the options discussed: b411a09
|
|
||
| Each probe yields a HealthStatus carrying: | ||
|
|
||
| - `State`: healthy / unhealthy / RPC error | ||
| - `KeyID`: keyID currently active in the probed plugin |
There was a problem hiding this comment.
I would drop it for now. We don't know how rotation will work yet.
/cc @tjungblu
There was a problem hiding this comment.
no we should include this :) at least that makes my life easier to reference this mechanism in the other enhancement.
There was a problem hiding this comment.
@tjungblu work on the key rotation was the motivation to include it.
There was a problem hiding this comment.
We talked about the possibility of using the mechanism produced by the reporter to publish information about the keys. But we still don’t know the detail, so for now I’d prefer to remove the information about the keyID until then.
I also spoke with @benluddy yesterday, and he would like to add an API instead of using conditions.
We will probably also change the health reporting later, but he agreed that initially we can use conditions for health reporting.
There was a problem hiding this comment.
I think we agreed to show keyID and kekID for now as minified JSON in the message. Correct me if I am wrong.
| keyID=1 status=healthy lastChecked=2026-05-08T12:34:56Z | ||
| - type: KMSHealthReporter_master-1 | ||
| status: "False" | ||
| reason: Unhealthy |
There was a problem hiding this comment.
Not sure if we should change this field. The message could be interpreted by the controller.
There was a problem hiding this comment.
OK, this is a good point that would need investigation.
| keyID=2 status=healthy lastChecked=2026-05-08T12:34:56Z | ||
| keyID=1 status=healthy lastChecked=2026-05-08T12:34:56Z | ||
| - type: KMSHealthReporter_master-1 | ||
| status: "False" |
There was a problem hiding this comment.
Not sure if we should change this field.
| keyID=<id> status=unhealthy lastChecked=<RFC 3339 timestamp> detail=<healthz.detail> | ||
| ``` | ||
|
|
||
| Each line is a sequence of `key=value` pairs separated by spaces. `detail=...` is the only field whose value may contain spaces; it is always last on its line. `lastChecked` is per-plugin so partial probe failures (one plugin stuck while others probe normally) are visible. |
There was a problem hiding this comment.
is the only field whose value may contain spaces; it is always last on its line
Does it mean we need to escape spaces in the detail field?
There was a problem hiding this comment.
not directly, but as I mentioned to @ardaguclu, I tried to replicate the existing format. I could use JSON if this is preferred.
|
|
||
| ##### Probe contract | ||
|
|
||
| Each sidecar probes its colocated KMS plugin(s) over the local UDS at `unix:///var/run/kmsplugin/kms-{keyID}.sock` (the same socket path scheme described in [Sidecar Injection](#sidecar-injection)). |
There was a problem hiding this comment.
However, this states each sidecar probes its colocated KMS Plugins
There was a problem hiding this comment.
... and is it the keyID though? I would assume we would allocate a unique socket path on each node like:
/var/run/kmsplugin/kms-kube-apiserver/{encryptionconfig.Name (?) }/plugin.sock
not sure whether we need to separate via folders here though
there's also no sidecar-injection section either 😄
There was a problem hiding this comment.
@ardaguclu, is this a question? Each sidecar probes its colocated KMS Plugins. Yes, this is correct.
I added a "Naming caveat" section.
nodeName + keyID should be unique.
There is a sidecar injection section: https://github.com/ibihim/enhancements/blob/2bb10c537d05893622c0340a2b2322d6d0765abd/enhancements/kube-apiserver/kms-encryption-foundations.md#sidecar-injection
| KMSHealthReporter_<nodeName> | ||
| ``` | ||
|
|
||
| The Type has no `_Available` or `_Degraded` suffix, so it stays advisory on the operator CR (library-go's `StatusSyncer` ignores it). The aggregator controller consumes these conditions and emits the `KMSPluginsDegraded` rollup separately (see [Aggregator behavior](#aggregator-behavior)). |
There was a problem hiding this comment.
Why are we trying to skip StatusSyncer?
There was a problem hiding this comment.
Based on a discussion with @p0lyn0mial, a dedicated controller would create a condition that will be read by the StatusSyncer. This will reduce the amount of conditions.
So the "aggregator controller" will create a condition that will be read by the StatusSyncer.
| - One per `openshift-oauth-apiserver` Deployment replica | ||
| - One per `openshift-apiserver` Deployment replica | ||
|
|
||
| During KMS-to-KMS migration, the same sidecar probes every active KMS plugin in its pod (see [Multiple Concurrent Sidecars](#multiple-concurrent-sidecars)) and reports their combined state in the Message field of its single per-node condition. |
There was a problem hiding this comment.
heh there's no multiple-concurrent-sidecars section :)
There was a problem hiding this comment.
hah, that was two weeks ago. I think you just forgot to push a few commits back then.
|
|
||
| Each probe yields a HealthStatus carrying: | ||
|
|
||
| - `State`: healthy / unhealthy / RPC error |
There was a problem hiding this comment.
what's the difference between RPC error and unhealthy?
There was a problem hiding this comment.
The KMS Plugin can report healthy / unhealthy. An error would indicate that something is wrong with the connection itself.
| reason: Unreachable | ||
| message: | | ||
| keyID=2 status=unreachable lastChecked=2026-05-08T12:34:56Z detail=connection refused | ||
| keyID=1 status=unreachable lastChecked=2026-05-08T12:34:56Z detail=connection refused |
There was a problem hiding this comment.
Errors are usually very long (not short like connection refused). Would this formatting still work for very long error traces?
Reconcile terminology and arguments across the Health Reporter Sidecar and KMS Plugin Health Conditions sections:
- Add a naming caveat distinguishing the socket-path keyID (encryption key secret id) from the plugin-reported kekID.
- Collapse the duplicate per-tick struct into a single PluginHealthCondition definition; rename KMSPluginID to KeyID.
- Drop the stale LastTransitionTime contrast (Status is now hardcoded).
- Strengthen the connection rationale with the HA-routing argument and a Single-Node OpenShift caveat.
- Acknowledge the legacy SA token lifetime tradeoff.
Address review nits on the health reporter design:
- Drop the deep link to AddKMSPluginSidecarToPodSpec; keep the pluginlifecycle package reference (function names and master URLs rot).
- Describe the socket-path keyID as a monotonically incrementing sequence number instead of a "revision number", which collides with the kas-o RevisionController concept.
- Remove the "advisory" label from the per-node condition: it is misleading since the aggregator does act on it. Describe the StatusSyncer mechanic directly instead.
- Move Auth and connection out of the Proposal into a new KMS Health Reporter Connectivity section under Implementation Details.
- Rename the unreachable probe status to error, since a failed Status RPC can fail for reasons beyond unreachability.
- Replace the GCP-style example kekIDs with short opaque values, decorrelated from keyID to reinforce the naming caveat.
Resolve the open probe-interval question raised in review:
- Add a Probe interval subsection under KMS Plugin Health Conditions: fixed 30s default passed as a sidecar flag, n UDS probes per cycle followed by a single SSA write of all results.
- Define emission as unconditional and best-effort: the reporter writes every tick so lastChecked keeps advancing, and discards a result it cannot deliver rather than queuing it, since the next interval produces a fresher one.
- Derive the aggregator staleness threshold as 4 x probe interval and reference it from the Stale Reporter Conditions risk.
|
|
||
| #### KMS Health Reporter Connectivity | ||
|
|
||
| **Auth**: the reporter uses a **legacy ServiceAccount token** (mounted from a Secret) bound to a minimal Role that only permits applying its single per-node condition entry on the operator CR. The projected SA tokens available in API-server-adjacent namespaces are admin-grade, as are the auth client certificates on disk; both would over-privilege a sidecar whose only job is one SSA apply. The legacy SA token keeps the blast radius minimal if the sidecar is compromised. The tradeoff is lifetime: a legacy token does not expire, whereas a projected token rotates. We accept this, since a token scoped to one verb on one resource is a far smaller prize than an admin-grade token, expiring or not. |
There was a problem hiding this comment.
I agree with the justification here.
|
|
||
| **Risk: Orphaned Conditions on Mode Switch** | ||
| - **Impact:** When KMS is disabled (e.g., switching to `aescbc`), reporter sidecars are removed. Without explicit cleanup, `KMSHealthReporter_<nodeName>` and `KMSPluginsDegraded` entries remain stale on the operator CR. | ||
| - **Mitigation:** The aggregator controller owns cleanup. It removes orphaned `KMSHealthReporter_<nodeName>` entries (when their owning sidecar is no longer present) and removes its own `KMSPluginsDegraded` entry on KMS disable. |
There was a problem hiding this comment.
With SSA, will there be a fight between the health reporter that tries to update the condition and the aggregator controller that tries to clean it up?
There was a problem hiding this comment.
I don't think it's going to be a big problem when they stomp on each other; we might have some flapping during non-ready/deleted states though.
I wonder if we should rather have the top-level KMSHealthReporter_<nodeName> condition owned by the aggregator controller.
That would be similar to how the node controller works:
https://github.com/openshift/library-go/blob/3a6f949c22c3ffc0a2b28ee50e78173086e7adae/pkg/operator/staticpod/controller/node/node_controller.go
The sidecar would only ever write to it when the condition for said node exists. That sounds like too much coordination for a few conflicts between node state transitions...
For the later API, we could just use a map and key it off the node name directly: https://github.com/openshift/api/pull/2850/changes#diff-616d67895c3421c2d091662d30ed47b4b6f0b57db9411e29618d22b964ddb9efR362
There was a problem hiding this comment.
You know those conditions better than I do. With SSA, it is always better for a single controller to fully own a field.
There was a problem hiding this comment.
The *Degraded, *Available and so on are owned by the aggregator controller.
The aggregator controller itself would remove the KMSHealthReporter_ entries that are owned by the KMS health reporters, not write into them. So there is some flapping, but it shouldn't last long and shouldn't be a problem, because the removal happens during garbage collection.
We could tear down the health reporter before we tear down the KMS plugin, so we don't get wrong error conditions.
|
I have just one question. Other than that, the changes look good to me.
| - **Impact:** A reporter that hangs leaves its last `KMSHealthReporter_<nodeName>` condition in etcd unchanged. | ||
| - **Mitigation:** Per-plugin `lastChecked` timestamps in Message expose staleness. The aggregator controller treats a condition whose `lastChecked` exceeds the freshness threshold (`4 × probe interval`; see [Probe interval](#probe-interval)) as effectively `Unknown`. | ||
|
|
||
| **Risk: Orphaned Conditions on Mode Switch** |
There was a problem hiding this comment.
| **Risk: Orphaned Conditions on Mode Switch** | |
| **Risk: Orphaned Conditions on encryption type change and node replacement** |
Just wanted to ask who is cleaning up the old orphaned nodes when a node goes away, seems it is just buried in the Mitigation here :) Let's make the title clearer.
There was a problem hiding this comment.
Encryption type change is more accurate, but "node replacement", is this something that is a major concern? I mean we can add it, it is correct.
There was a problem hiding this comment.
we can scale the control plane up and down anytime, so yes, this is kinda important
There was a problem hiding this comment.
I will change it. Can't apply your commit as it contains the previous line as well.
|
@ibihim: all tests passed!
What
A health reporter sidecar runs alongside every API server pod replica when KMS is enabled. It probes the colocated KMS plugin(s) and writes a single advisory `KMSHealthReporter_<nodeName>` condition per node on the apiserver operator CR.
Why
- Exposes plugin health state through the operator CRs and onward into the `ClusterOperator`'s `Degraded` condition, so a misbehaving KMS plugin is visible in `oc get co` rather than silently waiting until KAS encryption fails.
- Supports future key rotation: per-plugin `keyID` in the reporter's Message lets a rotation controller verify all nodes agree on the active key before initiating rotation.