CNTRLPLANE-3524: feat: add etcd data re-encryption after encryption key rotation#8219
CNTRLPLANE-3524: feat: add etcd data re-encryption after encryption key rotation#8219muraee wants to merge 9 commits into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR adds etcd data encryption status tracking and key rotation lifecycle management to HyperShift. It introduces a new 🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
d5e5097 to
0d54a40
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go`:
- Around line 98-111: The code is writing the active key fingerprint and
unconditionally setting rekey-needed when the annotation was previously absent;
change the logic in the block that uses computeActiveKeyFingerprint,
EncryptionActiveKeyHashAnnotation and EncryptionRekeyNeededAnnotation so that
you always persist the fingerprint into secret.Annotations when fingerprint !=
"", but only set secret.Annotations[EncryptionRekeyNeededAnnotation] = "true" if
an existingHash was present (existingHash != "") and different from the new
fingerprint; do not mark rekey-needed when writing the hash for the first time.
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 245-252: Reorder the finalization so the HCP status is updated to
show re-encryption completed before removing the rekey-needed annotation: call
setReEncryptionCondition(hcp, metav1.ConditionTrue,
hyperv1.ReEncryptionCompletedReason, ...) and persist that status update (as
done in the parent Reconcile flow) first, and only after the status update
succeeds call r.removeRekeyAnnotation(ctx, encryptionSecret); this ensures that
if the status update fails the reconcile will retry (via Reconcile's rekeyNeeded
logic) and that a later annotation-removal failure won't leave the condition
stuck in InProgress.
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 897-904: The current logic only sets EtcdDataEncryptionUpToDate on
hcluster when the HCP condition exists, which leaves a stale condition when
encryption is disabled, hcp is nil, or the HCP removes the condition; update the
block around hcp and hcluster.Spec.SecretEncryption to handle the
missing/disabled case by removing the bubbled condition instead of doing
nothing: check if hcp is nil or hcluster.Spec.SecretEncryption is nil or
meta.FindStatusCondition(hcp.Status.Conditions,
string(hyperv1.EtcdDataEncryptionUpToDate)) returns nil, and in those cases call
meta.RemoveStatusCondition(&hcluster.Status.Conditions,
string(hyperv1.EtcdDataEncryptionUpToDate)); otherwise (when cond exists)
continue to set cond.ObservedGeneration = hcluster.Generation and
meta.SetStatusCondition as currently done.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 3f62b72e-be23-40c7-8620-e6a59a833dde
⛔ Files ignored due to path filters (18)
vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/errors.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/inprocess.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/inprocess_processor.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/kubestorageversionmigrator.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/metrics.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/types.gois excluded by!vendor/**,!**/vendor/**vendor/modules.txtis excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/factory.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/generic.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/internalinterfaces/factory_interfaces.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/interface.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/interface.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/storagestate.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/storageversionmigration.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/expansion_generated.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/storagestate.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/storageversionmigration.gois excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (8)
api/hypershift/v1beta1/hostedcluster_conditions.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.gocontrol-plane-operator/hostedclusterconfigoperator/api/scheme.gocontrol-plane-operator/hostedclusterconfigoperator/cmd.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.gohypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
There was a problem hiding this comment.
♻️ Duplicate comments (2)
control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go (1)
103-111:⚠️ Potential issue | 🟠 MajorDon't mark re-encryption needed on the first hash write.
When
EncryptionActiveKeyHashAnnotationis absent (empty string), the conditionexistingHash != fingerprintevaluates totrue, which setsencryption-rekey-needed=true. This will spuriously trigger re-encryption migrations for new clusters and for existing clusters upgrading to this version.The hash should always be persisted, but
rekey-neededshould only be set when a previous hash existed and differs from the new one.Suggested fix
if fingerprint != "" { if secret.Annotations == nil { secret.Annotations = map[string]string{} } existingHash := secret.Annotations[EncryptionActiveKeyHashAnnotation] - if existingHash != fingerprint { - secret.Annotations[EncryptionActiveKeyHashAnnotation] = fingerprint - secret.Annotations[EncryptionRekeyNeededAnnotation] = "true" + secret.Annotations[EncryptionActiveKeyHashAnnotation] = fingerprint + if existingHash != "" && existingHash != fingerprint { + secret.Annotations[EncryptionRekeyNeededAnnotation] = "true" } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go` around lines 103 - 111, The current logic in secrtencryption.go sets EncryptionRekeyNeededAnnotation when existingHash != fingerprint even if there was no previous hash, causing spurious re-encryption; update the block around secret.Annotations and EncryptionActiveKeyHashAnnotation so you always persist the new fingerprint but only set EncryptionRekeyNeededAnnotation="true" if an existing non-empty hash existed and is different (i.e., existingHash != "" && existingHash != fingerprint). Keep the nil-check for secret.Annotations and ensure the new fingerprint is written to EncryptionActiveKeyHashAnnotation in all cases.control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go (1)
245-252:⚠️ Potential issue | 🟡 MinorConsider ordering: update status before removing annotation.
If the annotation removal at line 247 succeeds but the subsequent HCP status update in the parent
Reconcile(line 138) fails, the next reconcile will findrekeyNeeded != "true"(line 167-168) and return early. This leaves the condition potentially stuck in its previous state (e.g.,InProgress) rather thanCompleted.To ensure consistency, consider splitting this into two reconcile cycles or restructuring to ensure status is persisted before annotation removal is committed.
Alternative approach: set condition before annotation removal
// All migrations succeeded — remove the rekey-needed annotation and set condition True. log.Info("All re-encryption migrations completed successfully") + setReEncryptionCondition(hcp, metav1.ConditionTrue, hyperv1.ReEncryptionCompletedReason, + "All etcd data has been re-encrypted with the active encryption key") + + // Note: If annotation removal fails after status update succeeds in parent Reconcile, + // the next reconcile will re-process and retry annotation removal (migrations already done). if err := r.removeRekeyAnnotation(ctx, encryptionSecret); err != nil { return reconcile.Result{}, fmt.Errorf("failed to remove rekey-needed annotation: %w", err) } - setReEncryptionCondition(hcp, metav1.ConditionTrue, hyperv1.ReEncryptionCompletedReason, - "All etcd data has been re-encrypted with the active encryption key") return reconcile.Result{}, nil🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go` around lines 245 - 252, Reorder the success path so the HCP status is updated to ReEncryptionCompleted before removing the rekey-needed annotation: call setReEncryptionCondition(hcp, metav1.ConditionTrue, hyperv1.ReEncryptionCompletedReason, ...) and ensure the status update is persisted (via the same status update code used in Reconcile) prior to calling removeRekeyAnnotation(ctx, encryptionSecret); if persisting status can fail, return the error and only remove the annotation after status update succeeds, ensuring the Reconcile loop sees the completed condition even if annotation removal later fails; references: setReEncryptionCondition, removeRekeyAnnotation, Reconcile, rekeyNeeded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go`:
- Around line 103-111: The current logic in secrtencryption.go sets
EncryptionRekeyNeededAnnotation when existingHash != fingerprint even if there
was no previous hash, causing spurious re-encryption; update the block around
secret.Annotations and EncryptionActiveKeyHashAnnotation so you always persist
the new fingerprint but only set EncryptionRekeyNeededAnnotation="true" if an
existing non-empty hash existed and is different (i.e., existingHash != "" &&
existingHash != fingerprint). Keep the nil-check for secret.Annotations and
ensure the new fingerprint is written to EncryptionActiveKeyHashAnnotation in
all cases.
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 245-252: Reorder the success path so the HCP status is updated to
ReEncryptionCompleted before removing the rekey-needed annotation: call
setReEncryptionCondition(hcp, metav1.ConditionTrue,
hyperv1.ReEncryptionCompletedReason, ...) and ensure the status update is
persisted (via the same status update code used in Reconcile) prior to calling
removeRekeyAnnotation(ctx, encryptionSecret); if persisting status can fail,
return the error and only remove the annotation after status update succeeds,
ensuring the Reconcile loop sees the completed condition even if annotation
removal later fails; references: setReEncryptionCondition,
removeRekeyAnnotation, Reconcile, rekeyNeeded.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 969834bc-33b1-4893-9f05-17b72afb37c2
⛔ Files ignored due to path filters (18)
vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/errors.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/inprocess.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/inprocess_processor.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/kubestorageversionmigrator.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/metrics.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/types.gois excluded by!vendor/**,!**/vendor/**vendor/modules.txtis excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/factory.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/generic.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/internalinterfaces/factory_interfaces.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/interface.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/interface.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/storagestate.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/storageversionmigration.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/expansion_generated.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/storagestate.gois excluded by!vendor/**,!**/vendor/**vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/storageversionmigration.gois excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (8)
api/hypershift/v1beta1/hostedcluster_conditions.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.gocontrol-plane-operator/hostedclusterconfigoperator/api/scheme.gocontrol-plane-operator/hostedclusterconfigoperator/cmd.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.gohypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
✅ Files skipped from review due to trivial changes (1)
- control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
- control-plane-operator/hostedclusterconfigoperator/cmd.go
- control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
- hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
- api/hypershift/v1beta1/hostedcluster_conditions.go
0d54a40 to
e29501b
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go (1)
245-251:⚠️ Potential issue | 🟠 MajorPersist
ReEncryptionCompletedbefore clearingencryption-rekey-needed.
removeRekeyAnnotation()runs before the outerReconcile()call writeshcp.Status. If the secret update succeeds but that later status update fails, the next pass seesrekey-neededabsent and exits early, leavingEtcdDataEncryptionUpToDatestuck at its previous value. Move annotation removal to a second phase after the completed condition has been successfully persisted.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go` around lines 245 - 251, The code currently removes the encryption-rekey-needed annotation (removeRekeyAnnotation) before the controller persists the updated HCP status (setReEncryptionCondition), which can cause the requeue logic to skip remaining work if the status update later fails; change the order so you first set and persist the ReEncryptionCompleted condition on the HostedControlPlane (call setReEncryptionCondition and ensure the status update is saved via the status writer used in Reconcile), verify that the status update succeeded, and only then call removeRekeyAnnotation to delete the encryption-rekey-needed annotation; use the existing functions removeRekeyAnnotation, setReEncryptionCondition and the same status update flow in Reconcile to implement this two-phase commit.hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go (1)
897-904:⚠️ Potential issue | 🟠 MajorRemove the bubbled condition when HCP no longer reports it.
This only copies
EtcdDataEncryptionUpToDatewhen the HCP currently has it. If encryption is disabled, the HCP is gone, or the reencryption controller removes the HCP condition, the old HostedCluster value sticks around and becomes stale.Suggested fix
- // Bubble up the EtcdDataEncryptionUpToDate condition from HCP when encryption is configured. - if hcp != nil && hcluster.Spec.SecretEncryption != nil { - cond := meta.FindStatusCondition(hcp.Status.Conditions, string(hyperv1.EtcdDataEncryptionUpToDate)) - if cond != nil { - cond.ObservedGeneration = hcluster.Generation - meta.SetStatusCondition(&hcluster.Status.Conditions, *cond) - } - } + // Bubble up the EtcdDataEncryptionUpToDate condition from HCP when encryption is configured. + if hcluster.Spec.SecretEncryption == nil || hcp == nil { + meta.RemoveStatusCondition(&hcluster.Status.Conditions, string(hyperv1.EtcdDataEncryptionUpToDate)) + } else if cond := meta.FindStatusCondition(hcp.Status.Conditions, string(hyperv1.EtcdDataEncryptionUpToDate)); cond != nil { + cond.ObservedGeneration = hcluster.Generation + meta.SetStatusCondition(&hcluster.Status.Conditions, *cond) + } else { + meta.RemoveStatusCondition(&hcluster.Status.Conditions, string(hyperv1.EtcdDataEncryptionUpToDate)) + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go` around lines 897 - 904, The current bubble-up only sets EtcdDataEncryptionUpToDate when HCP has the condition, but never removes it when HCP no longer reports it; update the block in hostedcluster_controller.go that references hcp, hcluster, meta.FindStatusCondition and meta.SetStatusCondition to also handle the nil case: if hcp is nil, hcluster.Spec.SecretEncryption is nil, or meta.FindStatusCondition(...) returns nil, call meta.RemoveStatusCondition(&hcluster.Status.Conditions, string(hyperv1.EtcdDataEncryptionUpToDate)) to delete the stale condition; otherwise keep the existing behavior of updating ObservedGeneration and calling meta.SetStatusCondition.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 127-130: The Get calls that fetch the HostedControlPlane (hcp :=
resourcemanifests.HostedControlPlane(...) followed by r.cpClient.Get(...)) and
the similar Gets for the encryption config Secret and KAS Deployment should
treat apierrors.IsNotFound(err) as a transient condition instead of returning a
hard error; update the error handling in the r.cpClient.Get(...) blocks
(including the HCP fetch and the other Get calls around 158-176) to check
apierrors.IsNotFound(err) and return a non-error reconcile result (e.g.,
reconcile.Result{} or a short requeue with nil error) so the controller
idles/publishes waiting state instead of entering error backoff.
- Around line 165-166: The code uses
encryptionSecret.Annotations[kas.EncryptionActiveKeyHashAnnotation] (and the
same pattern at the other occurrence around lines ~200–201) as the writeKey for
EnsureMigration which is a deterministic key fingerprint and can collide when
rotating back to a previously used key; change this to a monotonic rotation
token annotation (e.g., kas.EncryptionRotationTokenAnnotation) that is
incremented (or replaced with a monotonic nonce) each time a rotation is
initiated and persisted on encryptionSecret, and update the code that reads the
annotation (the places referencing
encryptionSecret.Annotations[kas.EncryptionActiveKeyHashAnnotation]) to read the
rotation token instead and pass that token as the writeKey into EnsureMigration
so each rotation event is unique even if the underlying key value repeats.
---
Duplicate comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 245-251: The code currently removes the encryption-rekey-needed
annotation (removeRekeyAnnotation) before the controller persists the updated
HCP status (setReEncryptionCondition), which can cause the requeue logic to skip
remaining work if the status update later fails; change the order so you first
set and persist the ReEncryptionCompleted condition on the HostedControlPlane
(call setReEncryptionCondition and ensure the status update is saved via the
status writer used in Reconcile), verify that the status update succeeded, and
only then call removeRekeyAnnotation to delete the encryption-rekey-needed
annotation; use the existing functions removeRekeyAnnotation,
setReEncryptionCondition and the same status update flow in Reconcile to
implement this two-phase commit.
In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 897-904: The current bubble-up only sets
EtcdDataEncryptionUpToDate when HCP has the condition, but never removes it when
HCP no longer reports it; update the block in hostedcluster_controller.go that
references hcp, hcluster, meta.FindStatusCondition and meta.SetStatusCondition
to also handle the nil case: if hcp is nil, hcluster.Spec.SecretEncryption is
nil, or meta.FindStatusCondition(...) returns nil, call
meta.RemoveStatusCondition(&hcluster.Status.Conditions,
string(hyperv1.EtcdDataEncryptionUpToDate)) to delete the stale condition;
otherwise keep the existing behavior of updating ObservedGeneration and calling
meta.SetStatusCondition.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: fe17a0ce-bc5e-4d0c-87ad-e9027725f712
⛔ Files ignored due to path filters (4)
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/AROSwift/zz_fixture_TestControlPlaneComponents_kas_secret_encryption_config_secret.yamlis excluded by!**/testdata/**docs/content/reference/aggregated-docs.mdis excluded by!docs/content/reference/aggregated-docs.mddocs/content/reference/api.mdis excluded by!docs/content/reference/api.mdvendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.gois excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (9)
api/hypershift/v1beta1/hostedcluster_conditions.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.gocontrol-plane-operator/hostedclusterconfigoperator/api/scheme.gocontrol-plane-operator/hostedclusterconfigoperator/cmd.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.gogo.modhypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
✅ Files skipped from review due to trivial changes (1)
- control-plane-operator/hostedclusterconfigoperator/api/scheme.go
🚧 Files skipped from review as they are similar to previous changes (4)
- control-plane-operator/hostedclusterconfigoperator/cmd.go
- control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
- api/hypershift/v1beta1/hostedcluster_conditions.go
- control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
e29501b to
e1b396f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go (2)
254-263:⚠️ Potential issue | 🟡 MinorCompleted status can still be lost on the failure path.
Line 261 persists annotation removal before the caller persists the updated HCP status. If
removeRekeyAnnotation()succeeds and the laterStatus().Update()fails, the next reconcile hitsrekeyNeeded != "true"and exits early, leavingEtcdDataEncryptionUpToDatestale instead ofReEncryptionCompleted.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go` around lines 254 - 263, The code removes the rekey-needed annotation before the updated HCP status is persisted, risking loss of the Completed condition if the subsequent status update fails; change the order so you persist the updated HCP status with the ReEncryptionCompleted condition (call the HCP status update via r.Client.Status().Update or the existing status update helper) first, check that it succeeded, and only then call removeRekeyAnnotation(encryptionSecret); reference setReEncryptionCondition, the HCP Status().Update call, and removeRekeyAnnotation to locate and reorder the operations.
169-171:⚠️ Potential issue | 🟠 MajorUse a rotation token here, not the active-key fingerprint.
activeKeyHashidentifies the current key, not the rotation event. If a cluster rotates A → B → A, this passes the samewriteKeyback intoEnsureMigration, so an older completed migration for key A can satisfy the check even though the latest rotation still needs objects rewritten from B to A.Also applies to: 208-210
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go` around lines 169 - 171, The code is using the active-key fingerprint (encryptionSecret.Annotations[kas.EncryptionActiveKeyHashAnnotation]) to drive EnsureMigration, which breaks for A→B→A rotations; instead read and use the rotation token annotation (e.g., encryptionSecret.Annotations[kas.EncryptionRotationTokenAnnotation]) as the writeKey passed to EnsureMigration so each rotation event is unique; update the usage in the block that reads rekeyNeeded and activeKeyHash and the duplicate occurrence around the EnsureMigration call (the lines referencing kas.EncryptionActiveKeyHashAnnotation and the EnsureMigration/writeKey variable) to use the rotation-token annotation and pass that token through.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go`:
- Around line 188-199: The fingerprinting currently includes the Secret name
(secretEncryption.AESCBC.ActiveKey.Name) which causes renames/copies to change
the fingerprint; change the input construction in the AESCBC case to only
fingerprint the key bytes: compute the SHA256 of
activeKeySecret.Data[hyperv1.AESCBCKeySecretKey] and set input to include only
the algorithm and the hash (e.g. "aescbc/<sha256>"), removing any use of
ActiveKey.Name; ensure you reference activeKeySecret and
hyperv1.AESCBCKeySecretKey when locating the key bytes and preserve existing
error handling around cpContext.Client.Get.
---
Duplicate comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 254-263: The code removes the rekey-needed annotation before the
updated HCP status is persisted, risking loss of the Completed condition if the
subsequent status update fails; change the order so you persist the updated HCP
status with the ReEncryptionCompleted condition (call the HCP status update via
r.Client.Status().Update or the existing status update helper) first, check that
it succeeded, and only then call removeRekeyAnnotation(encryptionSecret);
reference setReEncryptionCondition, the HCP Status().Update call, and
removeRekeyAnnotation to locate and reorder the operations.
- Around line 169-171: The code is using the active-key fingerprint
(encryptionSecret.Annotations[kas.EncryptionActiveKeyHashAnnotation]) to drive
EnsureMigration, which breaks for A→B→A rotations; instead read and use the
rotation token annotation (e.g.,
encryptionSecret.Annotations[kas.EncryptionRotationTokenAnnotation]) as the
writeKey passed to EnsureMigration so each rotation event is unique; update the
usage in the block that reads rekeyNeeded and activeKeyHash and the duplicate
occurrence around the EnsureMigration call (the lines referencing
kas.EncryptionActiveKeyHashAnnotation and the EnsureMigration/writeKey variable)
to use the rotation-token annotation and pass that token through.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: b08bffff-7931-4567-990b-d445f0cabf23
⛔ Files ignored due to path filters (4)
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/AROSwift/zz_fixture_TestControlPlaneComponents_kas_secret_encryption_config_secret.yamlis excluded by!**/testdata/**docs/content/reference/aggregated-docs.mdis excluded by!docs/content/reference/aggregated-docs.mddocs/content/reference/api.mdis excluded by!docs/content/reference/api.mdvendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.gois excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (9)
api/hypershift/v1beta1/hostedcluster_conditions.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.gocontrol-plane-operator/hostedclusterconfigoperator/api/scheme.gocontrol-plane-operator/hostedclusterconfigoperator/cmd.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.gogo.modhypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
✅ Files skipped from review due to trivial changes (3)
- control-plane-operator/hostedclusterconfigoperator/api/scheme.go
- go.mod
- control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
- api/hypershift/v1beta1/hostedcluster_conditions.go
e1b396f to
d9f1f6a
Compare
|
I now have all the information needed for a complete root cause analysis. Here is the report: Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorRoot CauseThe first commit in the PR (`fab20778d3`) has a title that **violates the Conventional Commits specification** enforced by the repository's `.gitlint` configuration.The commit title is: The The prefix The second commit ( RecommendationsThese are some recommendations based on the failure analysis1. **Reword the first commit's title** to use an allowed Conventional Commits prefix. The most appropriate prefix for a vendoring commit is `chore` or `build`: ``` chore: vendor library-go migrators and kube-storage-version-migrator informer/lister ``` or: ``` build: vendor library-go migrators and kube-storage-version-migrator informer/lister ```
EvidenceThis is some collected evidence that helped reach the mentioned root cause
|
d9f1f6a to
968dac4
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go (1)
135-136:⚠️ Potential issue | 🟡 MinorHandle deleted
HostedControlPlaneas a no-op.If the watched HCP is already gone, this returns an error and the controller backs off during normal teardown. Treat
apierrors.IsNotFound(err)here the same way as the Secret/Deployment lookups and just exit cleanly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go` around lines 135 - 136, When fetching the HostedControlPlane (hcp) with r.cpClient.Get in the Reconcile handler, treat a not-found error as a benign delete and exit cleanly instead of returning an error; update the error handling around r.cpClient.Get(ctx, client.ObjectKeyFromObject(hcp), hcp) to check apierrors.IsNotFound(err) and return reconcile.Result{}, nil for that case, while preserving the existing fmt.Errorf path for other errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 185-200: The code treats a missing
EncryptionMigratedKeyHashAnnotation as "first deployment" and initializes it to
activeKeyHash (via setMigratedKeyHash), which incorrectly skips re-encryption
when kas-secret-encryption-config is recreated during a pending rotation; modify
the logic in the reconcile path that reads EncryptionMigratedKeyHashAnnotation
and compares it to activeKeyHash so that both the "initialize migrated key hash"
branch and the migratedKeyHash == activeKeyHash fast-path are only taken when
the rekey-needed signal (hypershift.openshift.io/encryption-rekey-needed) is not
present/true; i.e., check the hypershift.openshift.io/encryption-rekey-needed
annotation before calling setMigratedKeyHash or returning early on equal hashes
so pending rekey flows continue to perform re-encryption.
- Around line 363-367: The code uses
activeKeySecret.Data[hyperv1.AESCBCKeySecretKey] without validating it and then
hashes potentially empty bytes; update the logic around the c.Get(...) result in
reencryption.go to check that activeKeySecret.Data contains a non-empty entry
for hyperv1.AESCBCKeySecretKey before computing input (the fingerprint), and if
the key is missing or empty return a clear error (or handle it upstream) instead
of generating a fingerprint for empty bytes; adjust any callers that rely on
input/EncryptionMigratedKeyHashAnnotation to expect this error path.
---
Duplicate comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 135-136: When fetching the HostedControlPlane (hcp) with
r.cpClient.Get in the Reconcile handler, treat a not-found error as a benign
delete and exit cleanly instead of returning an error; update the error handling
around r.cpClient.Get(ctx, client.ObjectKeyFromObject(hcp), hcp) to check
apierrors.IsNotFound(err) and return reconcile.Result{}, nil for that case,
while preserving the existing fmt.Errorf path for other errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 68ffe0d4-c58d-4278-b70b-957f7c8b917b
⛔ Files ignored due to path filters (4)
control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/AROSwift/zz_fixture_TestControlPlaneComponents_kas_secret_encryption_config_secret.yamlis excluded by!**/testdata/**docs/content/reference/aggregated-docs.mdis excluded by!docs/content/reference/aggregated-docs.mddocs/content/reference/api.mdis excluded by!docs/content/reference/api.mdvendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.gois excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (8)
api/hypershift/v1beta1/hostedcluster_conditions.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.gocontrol-plane-operator/hostedclusterconfigoperator/api/scheme.gocontrol-plane-operator/hostedclusterconfigoperator/cmd.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.gogo.modhypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
✅ Files skipped from review due to trivial changes (3)
- control-plane-operator/hostedclusterconfigoperator/api/scheme.go
- hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
- control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
🚧 Files skipped from review as they are similar to previous changes (1)
- go.mod
|
Re: @coderabbitai's comment on AESCBC key validation: Done. Added validation to return empty fingerprint when key data is missing or empty. AI-assisted response via Claude Code |
|
Re: @coderabbitai's comment on missing migrated-hash during rotation: The The scenario described (secret recreated during pending rotation) requires manual deletion of an operator-managed secret, which would also disrupt KAS itself. In that edge case, the controller initializes the migrated hash to the current active key, which is correct — the CPO will recreate the secret with the current encryption config, and KAS will use the current key going forward. If re-encryption for the previous key was incomplete, the next key rotation will trigger a full re-encryption. AI-assisted response via Claude Code |
968dac4 to
2b0d02c
Compare
|
Tip For best results, initiate chat on the files or code changes.
|
670e29c to
2cd3d4e
Compare
2cd3d4e to
3526885
Compare
…oDate condition Add API types for tracking etcd data re-encryption after key rotation: - SecretEncryptionStatus with ActiveKey, TargetKey, and History fields - SecretEncryptionKeyStatus union type (Azure, AWS, IBMCloud, AESCBC) - Per-provider key status types (AzureKMSKeyStatus, AWSKMSKeyStatus, etc.) - EncryptionMigrationHistory and EncryptionMigrationState types - EtcdDataEncryptionUpToDate condition with 7 reason constants - SecretEncryption field added to HostedClusterStatus and HCPStatus - Deprecated backupKey fields on AWSKMSSpec, AzureKMSSpec, AESCBCSpec Signed-off-by: Mulham Raee <mulham.raee@gmail.com> Commit-Message-Assisted-by: Claude (via Claude Code)
Auto-generated output from make update after API type changes. Includes vendored library-go migrators and kube-storage-version-migrator informer/lister packages required by the HCCO re-encryption controller. Signed-off-by: Mulham Raee <mulham.raee@gmail.com> Commit-Message-Assisted-by: Claude (via Claude Code)
…-up, and rotation guard VAP - Add support/secretencryption/ package with fingerprint computation, key status construction, and EncryptionConfiguration parsing - Copy EtcdDataEncryptionUpToDate condition from HCP to HostedCluster using the copy-if-present pattern - Copy SecretEncryption status from HCP to HostedCluster - Deploy ValidatingAdmissionPolicy that blocks active key changes while re-encryption is in progress (EtcdDataEncryptionUpToDate=False) Signed-off-by: Mulham Raee <mulham.raee@gmail.com> Commit-Message-Assisted-by: Claude (via Claude Code)
…out and deploy migrator Two-stage KAS encryption key rollout: - Rewrite adaptSecretEncryptionConfig to derive write/read key assignment from HCP status and live EncryptionConfiguration - ReadOnlyDeploy: old key writes, new key read-only - WritePromote: new key writes once in config and KAS converged - Extract AWSKMSProviderName, AzureKMSProviderName, AESCBCKeyName as shared functions for provider name computation - Add --encryption-provider-config-automatic-reload=false for Azure KMS Control plane kube-storage-version-migrator: - New CPOv2 component with platform predicate (skip IBM/PowerVS) - Add data-plane migrator deployments to CVO resourcesToRemove Signed-off-by: Mulham Raee <mulham.raee@gmail.com> Commit-Message-Assisted-by: Claude (via Claude Code)
Add re-encryption controller to the HCCO that drives the key rotation lifecycle by deriving the current phase from observable state: - Detect key changes by comparing spec vs status fingerprints - Derive phase from EncryptionConfiguration contents and KAS convergence - Create StorageVersionMigration CRs via KubeStorageVersionMigrator - Manage targetKey, activeKey, and history in HCP status - Set EtcdDataEncryptionUpToDate condition with phase-specific reasons - Expose Prometheus metrics for rotation state, duration, and failures Signed-off-by: Mulham Raee <mulham.raee@gmail.com> Commit-Message-Assisted-by: Claude (via Claude Code)
Signed-off-by: Mulham Raee <mulham.raee@gmail.com> Commit-Message-Assisted-by: Claude (via Claude Code)
3526885 to
f5ba6bd
Compare
Signed-off-by: Mulham Raee <mulham.raee@gmail.com> Commit-Message-Assisted-by: Claude (via Claude Code)
|
I have all the evidence. The Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThe Root CauseThe PR adds a new struct // keyVersion is the key version number.
// +required
// +kubebuilder:validation:Minimum=0
KeyVersion int32 `json:"keyVersion,omitempty"`The The fix is to change the field type from Recommendations
Evidence
|
Replace the duplicated *KeyStatus types (AWSKMSKeyStatus, AzureKMSKeyStatus, IBMCloudKMSKeyStatus) with the existing spec types (AWSKMSKeyEntry, AzureKMSKey, IBMCloudKMSKeyEntry) in SecretEncryptionKeyStatus. This follows the Kubernetes convention of reusing spec types in status when the data shape is the same. Also adds omitempty to the inner fields of the three spec types, fixing a pre-existing API convention violation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…omitzero Use metav1.Time (non-pointer) with omitzero instead of *metav1.Time with omitempty. The IsZero() method on metav1.Time makes omitzero correctly omit the field when unset, removing the need for a pointer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@muraee: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@muraee: This pull request references CNTRLPLANE-3524 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Summary
StorageVersionMigrationCRs to re-encrypt all existing etcd data with the new active keyEtcdDataEncryptionUpToDatecondition on HCP/HostedClusterKubeStorageVersionMigratorand kube-storage-version-migrator informer/lister packagesDetails
Problem
HyperShift supports encryption key rotation (setting a new active key + backup key), but has no mechanism to re-encrypt existing etcd data with the new key. Old data remains encrypted with the previous key indefinitely, violating compliance requirements (OCPSTRAT-2527, OCPSTRAT-2540).
Solution
Key change detection is performed entirely within the HCCO re-encryption controller — no CPO changes needed. The controller computes a SHA-256 fingerprint of the active key identity from the HCP spec on each reconcile and compares it against a
hypershift.openshift.io/encryption-migrated-key-hashannotation on thekas-secret-encryption-configsecret. When the hashes differ, re-encryption is triggered.API (
api/hypershift/v1beta1/hostedcluster_conditions.go):EtcdDataEncryptionUpToDatecondition typeReEncryptionInProgress,ReEncryptionCompleted,ReEncryptionFailed,WaitingForKASConvergenceHCCO (
control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/):reencryptioncontroller using library-go'sKubeStorageVersionMigratorcomputeActiveKeyFingerprint()— SHA-256 hash of active key identity (supports AWS KMS, Azure KMS, IBM Cloud KMS, AESCBC)StorageVersionMigrationCRs in the guest cluster for all encrypted resources (secrets, configmaps, routes, oauth tokens)PruneMigrationEtcdDataEncryptionUpToDatecondition on HCP with detailed progress messagesHyperShift Operator (
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go):EtcdDataEncryptionUpToDatecondition from HCP to HostedCluster (only when encryption is configured and condition exists)Vendoring
library-go/pkg/operator/encryption/controllers/migrators/—KubeStorageVersionMigrator,Migratorinterfacekube-storage-version-migrator/pkg/clients/informer/andlister/— SVM informer/listerRef: OCPSTRAT-2527, OCPSTRAT-2540
Enhancement: openshift/enhancements#1969
Test plan
computeActiveKeyFingerprint()— all provider types, determinism, different keys produce different hashes, unsupported type returns errorgo build ./...passesgo test -racepasses for all modified packages🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Chores