diff --git a/enhancements/hypershift/etcd-data-reencryption-on-key-rotation.md b/enhancements/hypershift/etcd-data-reencryption-on-key-rotation.md
new file mode 100644
index 0000000000..897179ff71
--- /dev/null
+++ b/enhancements/hypershift/etcd-data-reencryption-on-key-rotation.md
@@ -0,0 +1,2080 @@
+---
+title: etcd-data-reencryption-on-key-rotation
+authors:
+ - "@muraee"
+reviewers:
+ - "@csrwng, for HyperShift architecture, please review
+ the CPO/HCCO responsibility split and controller
+ lifecycle"
+ - "@enxebre, for HyperShift architecture, please review
+ the overall design and API condition semantics"
+ - "@sjenning, for HyperShift architecture, please review
+ the KAS convergence and guest cluster client pattern"
+ - "TBD, for ARO-HCP team, please review Azure KMS key
+ rotation flow and S360 compliance aspects"
+ - "TBD, for library-go encryption expertise, please
+ review KubeStorageVersionMigrator usage and safety
+ invariants"
+approvers:
+ - "@csrwng"
+api-approvers:
+ - None
+creation-date: 2026-04-09
+last-updated: 2026-04-21
+tracking-link:
+ - https://redhat.atlassian.net/browse/OCPSTRAT-2527
+ - https://redhat.atlassian.net/browse/OCPSTRAT-2540
+see-also:
+ - https://redhat.atlassian.net/browse/ARO-21568
+ - https://redhat.atlassian.net/browse/ARO-21456
+replaces:
+ - N/A
+superseded-by:
+ - N/A
+---
+
+# etcd Data Re-encryption for Key Rotation
+
+## Summary
+
+HyperShift supports encryption key rotation infrastructure -- a new
+active key can be set in `SecretEncryptionSpec`, and the
+`EncryptionConfiguration` is correctly generated with the new key as
+the write provider and the old key as a read provider. However,
+there is no mechanism to re-encrypt existing etcd data with the new
+key after rotation, and the current `backupKey` API requires manual
+lifecycle management that is error-prone. This enhancement adds a
+re-encryption controller in the HCCO that observes the encryption
+state (spec vs status keys, EncryptionConfiguration contents, KAS
+convergence, SVM completion), derives the current rotation phase,
+and acts when conditions are met — setting `targetKey`, creating
+StorageVersionMigration CRs, and updating `activeKey` on
+completion. The CPO's `adaptSecretEncryptionConfig()` independently
+derives the correct EncryptionConfiguration from the same
+observable state, implementing the two-stage KAS rollout (first
+adding the new key as read-only, then promoting it to write). The
+enhancement also deploys the `kube-storage-version-migrator` in
+the control plane to support zero-worker-node clusters, introduces
+`status.secretEncryption` fields on HostedCluster/HCP to track
+the active key, target key, and rotation history, deprecates the
+`backupKey` spec fields, and tracks progress via a new
+`EtcdDataEncryptionUpToDate` condition.
+
+## Motivation
+
+Without re-encryption, old etcd data remains encrypted with the
+previous key indefinitely after a key rotation. This is unacceptable
+for ARO-HCP's S360 compliance requirements and for any customer
+relying on key rotation as a security control.
+
+The current gaps are:
+
+1. There is **no mechanism to trigger re-encryption** of existing
+ etcd data with the new key.
+2. There is **no way to track progress or confirm completion** of
+ re-encryption.
+3. The **`backupKey` API is error-prone** -- it requires the user
+ to manually manage old key references and risks premature
+ removal that could leave data unreadable.
+4. The **`kube-storage-version-migrator` runs in the data plane**,
+ which prevents re-encryption on clusters with zero worker
+ nodes.
+
+### User Stories
+
+#### Story 1: ARO-HCP Key Rotation Compliance
+
+As an ARO-HCP platform operator, I want all etcd data to be
+automatically re-encrypted with the new key after a key rotation, so
+that our clusters meet Microsoft's S360 security requirements for
+complete data coverage under the active key.
+
+#### Story 2: Key Rotation Progress Monitoring
+
+As a cluster administrator, I want to monitor the progress and
+completion of etcd data re-encryption through a standard Kubernetes
+condition on the HostedCluster, so that I can confirm when it is safe
+to deactivate or remove the old encryption key.
+
+#### Story 3: Safe Key Rotation Without Manual Lifecycle Management
+
+As a cluster administrator performing a key rotation, I want the
+system to automatically track the previous active key and manage
+the rotation lifecycle, so that I only need to update the active
+key in the spec without manually managing a backup key field.
+
+#### Story 4: Operations Team Incident Response
+
+As an operations team member, I want to know when a re-encryption
+has failed and see actionable error details in the HostedCluster
+conditions, so that I can diagnose and remediate issues without
+inspecting individual resources in the guest cluster.
+
+### Goals
+
+1. Guarantee that all existing encrypted etcd data is
+ re-encrypted with the currently active encryption key after
+ a key rotation. The re-encryption controller creates
+ `StorageVersionMigration` CRs for whatever resource types
+ `KMSEncryptedObjects()` returns (for KMS) or for `secrets`
+ (for AESCBC). Note: KMS sidecars are currently only
+ configured on KAS, so only KAS-served resources (secrets,
+ configmaps) are actually encrypted via KMS today. The
+ remaining resources in `KMSEncryptedObjects()` (routes,
+ oauthaccesstokens, oauthauthorizetokens) are served by
+ the OpenShift API servers, which do not have KMS sidecars
+ — fixing this is out of scope (see Non-Goals). Similarly,
+ AESCBC only encrypting secrets is a known gap in the
+ existing implementation. In both cases, the re-encryption
+ controller will automatically cover additional resources
+ when the upstream encryption scope is expanded.
+
+2. Provide an `EtcdDataEncryptionUpToDate` condition on
+ HostedControlPlane and HostedCluster that tracks re-encryption
+ progress and completion.
+
+3. Track the active encryption key in HostedCluster/HCP status,
+ enabling automatic rotation detection and eliminating the
+ need for manual `backupKey` management.
+
+4. Maintain cluster availability during the re-encryption process
+ via a two-stage KAS rollout that prevents decryption failures
+ during rolling updates.
+
+5. Support all encryption types (Azure KMS, AWS KMS, IBM Cloud
+ KMS, AESCBC) with a single, generic re-encryption mechanism.
+
+6. Expose Prometheus metrics for re-encryption state and failures
+ to enable alerting without requiring active polling of
+ conditions.
+
+7. Maintain a brief history of recent key rotations in
+ HostedCluster/HCP status for audit and debugging.
+
+### Non-Goals
+
+1. Management of the creation and renewal of encryption keys --
+ keys are managed externally (by the ARO RP or user).
+
+2. Automatic key rotation scheduling or detection of
+ cloud-provider-initiated rotation where the key identifier
+ is unchanged (e.g., AWS KMS automatic rotation behind the
+ same ARN). See "Cloud-Provider-Initiated Key Rotation" in
+ Implementation Details for per-provider analysis.
+
+3. Performance tuning for specific cluster sizes --
+ `StorageVersionMigration` handles pagination natively.
+
+4. Removing the `backupKey` fields from the API entirely --
+ they are deprecated but still functional: `spec.backupKey` is
+ used as a read provider fallback when
+ `status.secretEncryption.activeKey` is not yet initialized
+ (see Upgrade Strategy). Once the status is initialized, the
+ status-driven mechanism takes over and `backupKey` is no
+ longer needed. The fields remain in the API for backward
+ compatibility; full removal is deferred to a future release.
+
+5. Expanding the set of encrypted resource types or adding KMS
+ sidecars to the OpenShift API servers
+ (`openshift-apiserver`, `oauth-apiserver`). Currently, KMS
+ sidecars are only configured on KAS, so only KAS-served
+ resources are encrypted via KMS. AESCBC only encrypting
+ `secrets` (not `configmaps`) is also a known gap. Fixing
+ these upstream issues is separate work; the re-encryption
+ mechanism is designed to automatically cover any newly
+ encrypted resources without changes.
+
+6. Tracking convergence of multiple API server Deployments
+ (openshift-apiserver, oauth-apiserver) during the two-stage
+ rollout. The current design tracks KAS convergence only.
+ When encryption is expanded to other API servers, the
+ convergence check should be extended accordingly.
+
+## Proposal
+
+This enhancement adds etcd data re-encryption support to HyperShift
+by reusing library-go's `KubeStorageVersionMigrator` struct (which
+creates and monitors `StorageVersionMigration` CRs via the
+`migration.k8s.io/v1alpha1` API), deploying the
+`kube-storage-version-migrator` in the control plane, introducing
+a status field for rotation tracking, and deprecating the
+`backupKey` spec fields.
+
+The design reuses the same `StorageVersionMigration` mechanism that
+standalone OCP uses for re-encryption, ensuring consistency and
+debuggability across both topologies.
+
+The changes span multiple components:
+
+1. **HCCO** (new controller): Derives the current rotation phase
+ from observable state on each reconciliation — not from a
+ stored phase field. Detects key changes by comparing
+ `hcp.Spec.SecretEncryption` against
+ `hcp.Status.SecretEncryption.ActiveKey`. When a rotation is
+ needed, sets `targetKey`, creates a history entry, waits for
+ KAS convergence between stages, creates
+ `StorageVersionMigration` CRs when conditions for Migrating
+ are met, and on completion updates `ActiveKey`, clears
+ `TargetKey`, and sets `EtcdDataEncryptionUpToDate=True`.
+ Records the derived phase in `history[0].state` for
+ observability, but does not use it as input for decisions.
+
+2. **CPO `adaptSecretEncryptionConfig()`** (existing code
+ modification): Derives the correct `EncryptionConfiguration`
+ from observable state — comparing `spec.activeKey` against
+ `status.activeKey` and inspecting the current
+ EncryptionConfiguration contents to determine whether the
+ target key is already present as a read or write provider.
+ Implements the two-stage rollout: if the target key is not
+ yet in the config, adds it as read-only; if it's present as
+ read-only and KAS has converged, promotes it to write.
+ `spec.backupKey` is used as a fallback only when
+ `status.secretEncryption.activeKey` is not set (upgrade
+ transition).
+
+3. **CPO** (existing code modification): Deploy the
+ `kube-storage-version-migrator` as a control plane component
+ in the HCP namespace, connecting to the guest cluster KAS.
+
+4. **CPO CVO component** (platform-conditional): On non-IBM
+ platforms, add the data-plane
+ `cluster-kube-storage-version-migrator-operator` Deployment
+ to `resourcesToRemove()` in
+ `hypershift/control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go`.
+ This generates a cleanup manifest with
+ `release.openshift.io/delete: "true"` that the CVO in the
+ hosted cluster processes to delete the data-plane operator.
+ On IBM Cloud, the data-plane operator is kept (IBM does not
+ want additional control-plane deployments). The control-plane
+ migrator Deployment (Component 3) also uses a platform
+ predicate to skip IBM Cloud.
+
+5. **HyperShift Operator** (existing code modification): Bubble up
+ the `EtcdDataEncryptionUpToDate` condition and
+ `SecretEncryption` status from HCP to HostedCluster. Deploy a
+ `ValidatingAdmissionPolicy` that blocks active key changes on
+ HostedCluster while re-encryption is in progress.
+
+### Workflow Description
+
+1. The cluster administrator updates the active encryption key in
+ the HostedCluster spec (e.g., rotates Azure KMS key version):
+ ```yaml
+ secretEncryption:
+ type: kms
+ kms:
+ provider: Azure
+ azure:
+ activeKey:
+ keyVaultName: my-vault
+ keyName: my-key
+ keyVersion: "v2" # new version
+ ```
+ The administrator only needs to update the `activeKey`. The
+ system tracks the previous key via
+ `status.secretEncryption.activeKey` and automatically
+ manages the old key as a read provider in the
+ `EncryptionConfiguration`. The `backupKey` field is deprecated
+ and no longer required.
+
+2. The HyperShift Operator syncs the key configuration to the HCP
+ namespace (existing behavior, no changes needed).
+
+3. The HCCO re-encryption controller derives the current phase
+ from observable state. It compares `spec.activeKey` against
+ `status.activeKey` — if the status is nil and no `backupKey`
+ is set (upgrade bootstrap), it initializes `status.activeKey`
+ directly. If a rotation is needed (fingerprints differ, or
+ status nil with `backupKey` set), it sets `targetKey` and
+ prepends a history entry. It records the derived phase in
+ `history[0].state` for observability.
+
+4. **Stage 1 — ReadOnlyDeploy**: The CPO's
+ `adaptSecretEncryptionConfig()` independently derives the
+ correct config: it sees `targetKey` is set but not yet in
+ the current EncryptionConfiguration, so it generates the
+ config with the **old key** (`status.activeKey`) as the write
+ provider and the **new key** (`status.targetKey`) as a
+ read-only provider. The KAS Deployment rolls out. During this
+ rolling update, pods with the new config can read both keys
+ but still write with the old key; pods with the old config
+ only see the old key. Since no pod writes with the new key
+ yet, all reads succeed across all replicas.
+
+5. The HCCO observes that KAS has converged with the target key
+ as a read-only provider. It records
+ `history[0].state=WritePromote`.
+
+6. **Stage 2 — WritePromote**: The adapt function sees the
+ target key is present as read-only and KAS has converged, so
+ it generates the config with the **new key** as the write
+ provider and the **old key** as read-only. The KAS Deployment
+ rolls out again. During this rolling update, pods still on
+ the previous config already have the new key as a read-only
+ provider (from Stage 1), so they can decrypt data written by
+ new-config pods.
+
+7. The HCCO observes that KAS has converged with the target key
+ as the write provider. It records
+ `history[0].state=Migrating` and sets
+ `EtcdDataEncryptionUpToDate=False`.
+
+8. **Stage 3 — Migrating**: The HCCO creates
+ `StorageVersionMigration` CRs in the hosted cluster for each
+ encrypted resource type. The `kube-storage-version-migrator`
+ controller, running in the HCP namespace (control plane),
+ processes each CR via the guest cluster KAS: it lists all
+ objects of each resource type and performs a no-op write-back,
+ transparently re-encrypting all data with the new active key.
+
+9. The HCCO detects all `StorageVersionMigration` CRs have
+ `MigrationSucceeded=True`. It sets
+ `hcp.Status.SecretEncryption.ActiveKey` to `TargetKey`,
+ clears `TargetKey`, records `history[0].state=Completed`
+ with `completionTime`, and sets
+ `EtcdDataEncryptionUpToDate=True`.
+
+10. The HyperShift Operator surfaces the condition on the
+ HostedCluster. On the next CPO reconcile, the adapt function
+ sees `activeKey` matches `spec.activeKey` and no `targetKey`
+ is set — the backup sidecar is removed. If the spec key
+ changed during the rotation, the HCCO detects the new
+ mismatch and starts a fresh rotation from step 3.
+
+```mermaid
+sequenceDiagram
+ participant Admin as Cluster Admin
+ participant HO as HyperShift Operator
+ participant CPO as CPO
+ participant KAS as KAS Deployment
+ participant HCCO as HCCO Re-encryption Controller
+ participant SVM as kube-storage-version-migrator
(control plane)
+
+ Admin->>HO: Update activeKey in HC spec
+ HO->>CPO: Sync key config to HCP namespace
+ HCCO->>HCCO: Derive phase from observable state
+ Note over HCCO: spec ≠ status → set targetKey,
record ReadOnlyDeploy in history
+
+ rect rgb(230, 245, 255)
+ Note over CPO,KAS: Stage 1: ReadOnlyDeploy
+ CPO->>CPO: adaptSecretEncryptionConfig() derives:
targetKey not in config → add as read-only
+ CPO->>KAS: Rolling restart with new config
+ KAS-->>HCCO: HCCO observes convergence
+ HCCO->>HCCO: Record WritePromote in history
+ end
+
+ rect rgb(230, 255, 230)
+ Note over CPO,KAS: Stage 2: WritePromote
+ CPO->>CPO: adaptSecretEncryptionConfig() derives:
targetKey is read-only + converged → promote to write
+ CPO->>KAS: Rolling restart with new config
+ KAS-->>HCCO: HCCO observes convergence
+ HCCO->>HCCO: Record Migrating in history
+ Note over HCCO: EtcdDataEncryptionUpToDate=False
+ end
+
+ rect rgb(255, 245, 230)
+ Note over HCCO,SVM: Stage 3: Migrating
+ HCCO->>SVM: Create StorageVersionMigration CRs
+ SVM->>KAS: List + no-op write-back per resource
(via guest cluster KAS)
+ SVM-->>HCCO: MigrationSucceeded=True (all CRs)
+ end
+
+ HCCO->>HCCO: Set activeKey=targetKey, clear targetKey,
record Completed in history
+ Note over HCCO: EtcdDataEncryptionUpToDate=True
+ HCCO-->>HO: Condition + status bubbles up to HC
+ HO-->>Admin: Re-encryption complete
+```
+
+#### Error Handling
+
+**Migration failure**: If a `StorageVersionMigration` CR fails, the
+controller waits 5 minutes (matching library-go's behavior), prunes
+the failed CR, and retries on the next reconcile. The condition is
+set to `False` with reason `ReEncryptionFailed` and a message
+describing which resource failed.
+
+**Key changes mid-rotation**: The system is designed to be safe
+without relying on the VAP (see Component 5). Behavior depends
+on the derived phase:
+
+- **During `ReadOnlyDeploy`**: No data has been encrypted with
+ the target key yet — it is only a read-only provider. If the
+ spec's active key changes, the HCCO updates `status.targetKey`
+ to the new spec key in-place and restarts `ReadOnlyDeploy`.
+ The adapt function picks up the new `targetKey` on the next
+ reconcile and generates the updated `EncryptionConfiguration`.
+ This allows immediate correction of wrong-key mistakes without
+ waiting for a full rotation cycle.
+
+- **During `WritePromote` or `Migrating`**: Some KAS replicas
+ may have already written data with the target key. Abandoning
+ it would require keeping 3 keys simultaneously (old active +
+ abandoned target + new key), which violates the two-sidecar
+ constraint. The HCCO continues with the snapshotted
+ `targetKey` and ignores the spec change. Once the current
+ rotation completes, it detects the new mismatch and starts a
+ fresh rotation.
+
+The VAP (Component 5) provides a better UX by rejecting
+mid-rotation changes at admission time with a clear error
+message, rather than silently restarting or queuing them.
+
+**KAS restart during migration**: `StorageVersionMigration` uses
+`continueToken` for resumption. The controller detects stale CRs
+and retries as needed.
+
+**Guest cluster KAS unreachable**: The HCCO (during Migrating
+phase) and the control plane migrator both connect to the guest
+cluster KAS. If unreachable, the HCCO retries with backoff. The
+condition reflects the inability to check status.
+
+### API Extensions
+
+This enhancement does not introduce new CRDs, webhooks, or
+aggregated API servers. It adds a new status field, a new
+informational status condition to the existing
+HostedControlPlane/HostedCluster resources, a
+`ValidatingAdmissionPolicy` to guard against concurrent key
+rotations, and deprecates the `backupKey` spec fields.
+
+**New condition type:**
+
+```go
+// EtcdDataEncryptionUpToDate indicates whether all etcd data
+// is encrypted with the currently active encryption key.
+// True: all data confirmed encrypted with the active key.
+// False: re-encryption is in progress or has failed.
+// Absent: encryption is not configured.
+EtcdDataEncryptionUpToDate ConditionType = "EtcdDataEncryptionUpToDate"
+```
+
+**Condition reasons:**
+
+```go
+ReadOnlyRolloutInProgressReason = "ReadOnlyRolloutInProgress"
+WritePromotionInProgressReason = "WritePromotionInProgress"
+ReEncryptionInProgressReason = "ReEncryptionInProgress"
+ReEncryptionCompletedReason = "ReEncryptionCompleted"
+ReEncryptionFailedReason = "ReEncryptionFailed"
+ReEncryptionWaitingForKASReason = "ReEncryptionWaitingForKASConvergence"
+ReEncryptionPersistentFailureReason = "ReEncryptionPersistentFailure"
+```
+
+When encryption is not configured (`SecretEncryption` is nil), the
+condition is omitted entirely from status conditions, matching the
+pattern used by `UnmanagedEtcdAvailable`.
+
+**New status field on HostedControlPlane and HostedCluster:**
+
+```go
+// SecretEncryptionStatus tracks the state of secret encryption
+// key rotation and re-encryption.
+// +k8s:deepcopy-gen=true
+type SecretEncryptionStatus struct {
+ // activeKey is the encryption key specification that all etcd
+ // data is confirmed encrypted with. Updated after successful
+ // re-encryption.
+ // +optional
+ ActiveKey SecretEncryptionKeyStatus `json:"activeKey,omitempty,omitzero"`
+ // targetKey is the key being rolled out during an active
+ // rotation. Snapshot from spec.secretEncryption's active key
+ // when the rotation starts. The CPO uses this (not the
+ // current spec) during the rotation, so mid-rotation spec
+ // changes are safely queued until the current rotation
+ // completes. Cleared when rotation completes.
+ // +optional
+ TargetKey SecretEncryptionKeyStatus `json:"targetKey,omitempty,omitzero"`
+ // history contains a list of key rotations applied to this
+ // cluster. The newest entry is first in the list. Entries
+ // have state Completed when re-encryption has finished.
+ // Entries have state ReadOnlyDeploy, WritePromote, or
+ // Migrating when a rotation is in progress. The current
+ // rotation phase is always history[0].state when
+ // history[0] is not Completed or Interrupted.
+ // +optional
+ // +listType=atomic
+ // +kubebuilder:validation:MaxItems=5
+ History []EncryptionMigrationHistory `json:"history,omitempty"`
+}
+
+// EncryptionMigrationState tracks the lifecycle of a key
+// rotation. Progresses through: ReadOnlyDeploy → WritePromote
+// → Migrating → Completed. May also be Interrupted if the
+// target key was replaced during ReadOnlyDeploy.
+// +kubebuilder:validation:Enum=ReadOnlyDeploy;WritePromote;Migrating;Completed;Interrupted
+type EncryptionMigrationState string
+
+const (
+ // EncryptionMigrationStateReadOnlyDeploy means the new key
+ // is being deployed as a read-only provider. The old key
+ // remains the write provider. The CPO's adapt function
+ // generates the EncryptionConfiguration accordingly.
+ EncryptionMigrationStateReadOnlyDeploy EncryptionMigrationState = "ReadOnlyDeploy"
+ // EncryptionMigrationStateWritePromote means the new key is
+ // being promoted to write provider. The old key becomes
+ // read-only.
+ EncryptionMigrationStateWritePromote EncryptionMigrationState = "WritePromote"
+ // EncryptionMigrationStateMigrating means all KAS replicas
+ // have converged on the new write provider and re-encryption
+ // (StorageVersionMigration) is in progress.
+ EncryptionMigrationStateMigrating EncryptionMigrationState = "Migrating"
+ // EncryptionMigrationStateCompleted means all data was
+ // successfully re-encrypted with the target key.
+ EncryptionMigrationStateCompleted EncryptionMigrationState = "Completed"
+ // EncryptionMigrationStateInterrupted means the rotation was
+ // abandoned before data was encrypted with the target key
+ // (e.g., targetKey replaced during ReadOnlyDeploy).
+ EncryptionMigrationStateInterrupted EncryptionMigrationState = "Interrupted"
+)
+
+// EncryptionKeyReference identifies an encryption key by its
+// provider and fingerprint.
+// +k8s:deepcopy-gen=true
+type EncryptionKeyReference struct {
+ // provider identifies the encryption provider.
+ // +required
+ Provider SecretEncryptionProvider `json:"provider"`
+ // fingerprint is the hex-encoded SHA-256 hash of the key's
+ // identity fields.
+ // +required
+ Fingerprint string `json:"fingerprint"`
+}
+
+// EncryptionMigrationHistory records a key rotation, including
+// in-progress rotations. Created when a rotation starts (with
+// state=ReadOnlyDeploy and nil completionTime), updated as the
+// rotation progresses through phases, and finalized when it
+// completes or is interrupted.
+// +k8s:deepcopy-gen=true
+type EncryptionMigrationHistory struct {
+ // from is the key that data was migrated from (the previous
+ // active key).
+ // +required
+ From EncryptionKeyReference `json:"from,omitempty,omitzero"`
+ // to is the key that data was migrated to (the target key).
+ // +required
+ To EncryptionKeyReference `json:"to,omitempty,omitzero"`
+ // state tracks the current phase of this rotation. Progresses
+ // through ReadOnlyDeploy → WritePromote → Migrating →
+ // Completed. May be Interrupted if the target key was
+ // replaced during ReadOnlyDeploy.
+ // +required
+ // +kubebuilder:validation:Enum=ReadOnlyDeploy;WritePromote;Migrating;Completed;Interrupted
+ State EncryptionMigrationState `json:"state"`
+ // startedTime is when the rotation was initiated.
+ // +required
+ StartedTime metav1.Time `json:"startedTime,omitempty,omitzero"`
+ // completionTime is when the rotation finished. Not set
+ // while the rotation is in progress.
+ // +optional
+ CompletionTime metav1.Time `json:"completionTime,omitempty,omitzero"`
+}
+
+// SecretEncryptionProvider identifies the encryption provider
+// recorded in status. This is intentionally not schema-validated
+// because status values are set by controllers, not users. It is
+// a separate type from KMSProvider because the KMSProvider enum
+// does not include AESCBC.
+type SecretEncryptionProvider string
+
+const (
+ SecretEncryptionProviderAzure SecretEncryptionProvider = "Azure"
+ SecretEncryptionProviderAWS SecretEncryptionProvider = "AWS"
+ SecretEncryptionProviderIBMCloud SecretEncryptionProvider = "IBMCloud"
+ SecretEncryptionProviderAESCBC SecretEncryptionProvider = "AESCBC"
+)
+
+// SecretEncryptionKeyStatus records the active key identity.
+// Status-specific types are used instead of reusing the spec
+// types directly, to decouple status serialization from spec
+// type evolution (fields added, renamed, or removed in spec
+// types should not break status compatibility).
+// +k8s:deepcopy-gen=true
+// +kubebuilder:validation:XValidation:rule="self.provider == 'Azure' ? has(self.azure) : !has(self.azure)",message="azure is required when provider is Azure, and forbidden otherwise"
+// +kubebuilder:validation:XValidation:rule="self.provider == 'AWS' ? has(self.aws) : !has(self.aws)",message="aws is required when provider is AWS, and forbidden otherwise"
+// +kubebuilder:validation:XValidation:rule="self.provider == 'IBMCloud' ? has(self.ibmCloud) : !has(self.ibmCloud)",message="ibmCloud is required when provider is IBMCloud, and forbidden otherwise"
+// +kubebuilder:validation:XValidation:rule="self.provider == 'AESCBC' ? has(self.aescbc) : !has(self.aescbc)",message="aescbc is required when provider is AESCBC, and forbidden otherwise"
+// +union
+type SecretEncryptionKeyStatus struct {
+ // provider identifies the encryption provider.
+ // +required
+ // +unionDiscriminator
+ // +kubebuilder:validation:Enum=Azure;AWS;IBMCloud;AESCBC
+ Provider SecretEncryptionProvider `json:"provider"`
+ // azure holds the Azure KMS key identity fields.
+ // +optional
+ // +unionMember
+ Azure AzureKMSKeyStatus `json:"azure,omitempty,omitzero"`
+ // aws holds the AWS KMS key identity fields.
+ // +optional
+ // +unionMember
+ AWS AWSKMSKeyStatus `json:"aws,omitempty,omitzero"`
+ // ibmCloud holds the IBM Cloud KMS key identity fields.
+ // +optional
+ // +unionMember
+ IBMCloud IBMCloudKMSKeyStatus `json:"ibmCloud,omitempty,omitzero"`
+ // aescbc holds a reference to the AESCBC key secret.
+ // +optional
+ // +unionMember
+ AESCBC AESCBCKeyStatus `json:"aescbc,omitempty,omitzero"`
+}
+
+// AzureKMSKeyStatus contains identity fields for an Azure KMS
+// key, sufficient to reconstruct the EncryptionConfiguration
+// read provider.
+// +k8s:deepcopy-gen=true
+type AzureKMSKeyStatus struct {
+ // keyVaultName is the name of the Azure Key Vault.
+ // +required
+ KeyVaultName string `json:"keyVaultName"`
+ // keyName is the name of the key in the vault.
+ // +required
+ KeyName string `json:"keyName"`
+ // keyVersion is the version of the key.
+ // +required
+ KeyVersion string `json:"keyVersion"`
+}
+
+// AWSKMSKeyStatus contains identity fields for an AWS KMS key,
+// sufficient to reconstruct the backup sidecar container
+// arguments.
+// +k8s:deepcopy-gen=true
+type AWSKMSKeyStatus struct {
+ // arn is the Amazon Resource Name of the KMS key.
+ // +required
+ ARN string `json:"arn"`
+ // region is the AWS region of the KMS key.
+ // +required
+ Region string `json:"region"`
+}
+
+// IBMCloudKMSKeyStatus contains identity fields for an IBM
+// Cloud KMS key list entry, sufficient to reconstruct the
+// KP_DATA_JSON entry for the backup key. CorrelationID and
+// URL are included because the IBM Cloud KMS sidecar requires
+// them to initialize the key connection.
+// +k8s:deepcopy-gen=true
+type IBMCloudKMSKeyStatus struct {
+ // crkID is the Customer Root Key ID.
+ // +required
+ CRKID string `json:"crkID"`
+ // instanceID is the KMS instance ID.
+ // +required
+ InstanceID string `json:"instanceID"`
+ // keyVersion is the key version number.
+ // +required
+ KeyVersion int32 `json:"keyVersion"`
+ // region is the IBM Cloud region.
+ // +required
+ Region string `json:"region"`
+ // correlationID is the correlation ID for the key.
+ // +required
+ CorrelationID string `json:"correlationID"`
+ // url is the KMS endpoint URL.
+ // +required
+ URL string `json:"url"`
+}
+
+// AESCBCKeyStatus contains a reference to the AESCBC key
+// secret and a SHA-256 hash of its contents for fingerprinting.
+// +k8s:deepcopy-gen=true
+type AESCBCKeyStatus struct {
+ // secretRef is a reference to the secret containing the
+ // AESCBC key.
+ // +required
+ SecretRef corev1.LocalObjectReference `json:"secretRef"`
+ // dataHash is the hex-encoded SHA-256 hash of the secret's
+ // "key" data field at the time re-encryption completed.
+ // +required
+ DataHash string `json:"dataHash"`
+}
+```
+
+Storing the full key specification in status (not just a hash)
+is critical for resilience: if the `kas-secret-encryption-config`
+secret is accidentally deleted, the CPO can reconstruct the
+`EncryptionConfiguration` using the key in status as the read
+provider. Without this, a deleted secret would leave the cluster
+unable to read data encrypted with the previous key.
+
+The HCCO re-encryption controller detects key changes by
+computing a fingerprint of the spec's active key and the
+status's active key and comparing them: nil/empty status means
+first rotation, mismatch means new rotation, match means
+re-encryption is complete. No hash is stored — it is always
+computed on the fly from the key fields.
+
+The current rotation phase is derived from `history[0].state`
+— there is no separate top-level `rolloutPhase` field. This
+follows the `ControlPlaneVersionStatus` pattern where
+`history[0]` represents the current or most recent operation.
+
+The `history[0].state` drives the CPO's
+`EncryptionConfiguration` generation:
+
+| `history[0].state` | Write provider | Read provider |
+|---|---|---|
+| no history / `Completed` | spec.activeKey | none |
+| `ReadOnlyDeploy` | status.activeKey | status.targetKey |
+| `WritePromote` | status.targetKey | status.activeKey |
+| `Migrating` | status.targetKey | status.activeKey |
+
+This two-stage approach ensures that during each KAS rolling
+update, all replicas can decrypt data written by any other
+replica — no replica ever encounters a key it cannot read (see
+Workflow Description steps 4-8).
+
+The `history` field records both in-progress and completed
+rotations, using fingerprints (not full key specs) to keep the
+status compact. A new entry is prepended when a rotation starts
+(with `state=ReadOnlyDeploy` and no `completionTime`), updated
+as phases progress, and finalized when the rotation completes
+or is interrupted. The list is capped at 5 entries, most recent
+first.
+
+Example status during an active rotation:
+
+```yaml
+status:
+ secretEncryption:
+ activeKey:
+ provider: Azure
+ azure:
+ keyVaultName: my-vault
+ keyName: my-key
+ keyVersion: "v2"
+ targetKey:
+ provider: Azure
+ azure:
+ keyVaultName: my-vault
+ keyName: my-key
+ keyVersion: "v3"
+ history:
+ - from:
+ provider: Azure
+ fingerprint: "a1b2c3..."
+ to:
+ provider: Azure
+ fingerprint: "d4e5f6..."
+ state: WritePromote
+ startedTime: "2026-04-15T10:00:00Z"
+ - from:
+ provider: Azure
+ fingerprint: "789abc..."
+ to:
+ provider: Azure
+ fingerprint: "a1b2c3..."
+ state: Completed
+ startedTime: "2026-01-10T14:00:00Z"
+ completionTime: "2026-01-10T14:12:15Z"
+```
+
+**Deprecated spec fields:**
+
+The `backupKey` fields on `AzureKMSSpec`, `AWSKMSSpec`, and
+`AESCBCSpec` are deprecated. They are still accepted for backward
+compatibility and used as a fallback by the CPO when
+`hcp.Status.SecretEncryption.ActiveKey` is not set (see
+Component 3 priority chain). The re-encryption controller
+ignores `backupKey` — it uses the status field exclusively for
+rotation detection. The system automatically manages the
+previous key as a read provider in the
+`EncryptionConfiguration` based on the status field. IBM Cloud
+KMS (`IBMCloudKMSSpec`) uses a `KeyList` and is unaffected.
+
+**Why `backupKey` cannot always be honored**: For KMS providers,
+each key in the `EncryptionConfiguration` requires its own KMS
+plugin sidecar container. HyperShift deploys exactly two
+sidecars per provider (active + backup). During a rotation, both
+sidecar slots are occupied by `status.activeKey` and
+`status.targetKey` — there is no third sidecar to service a
+`spec.backupKey`. For this reason, `backupKey` is only honored
+when `status.activeKey` is not set (upgrade transition). Once
+the status is initialized, the system manages keys exclusively
+through the status fields.
+
+The following `// Deprecated:` markers must be added to the
+Go doc comments:
+
+```go
+// Deprecated: This field is deprecated and will be ignored
+// when status.secretEncryption.activeKey is set. The system
+// automatically manages the previous key via the status field.
+// +optional
+BackupKey *AWSKMSKeyEntry `json:"backupKey,omitempty"`
+```
+
+The same marker applies to `AzureKMSSpec.BackupKey` and
+`AESCBCSpec.BackupKey`.
+
+### Topology Considerations
+
+#### Hypershift / Hosted Control Planes
+
+This enhancement is designed specifically for the HyperShift
+topology. It affects:
+
+- **Management cluster (HCP namespace)**: The HCCO re-encryption
+ controller runs here, deriving the current rotation phase from
+ observable state, managing `targetKey` and `activeKey` in HCP
+ status, and creating `StorageVersionMigration` CRs in the
+ guest cluster. The CPO's `adaptSecretEncryptionConfig()`
+ independently derives the correct EncryptionConfiguration
+ from the same observable state. The
+ `kube-storage-version-migrator` runs as a control plane
+ Deployment, connecting to the guest cluster KAS to process
+ `StorageVersionMigration` CRs.
+- **Guest cluster**: `StorageVersionMigration` CRs are created
+ here by the HCCO using the guest cluster client. The
+ `kube-storage-version-migrator` (running in the control plane)
+ processes these CRs via the guest cluster KAS. The data-plane
+ On non-IBM platforms, the data-plane
+ `cluster-kube-storage-version-migrator-operator` is disabled:
+ its Deployment is added to `resourcesToRemove()` in the CPO's
+ CVO component, which generates a cleanup manifest with
+ `release.openshift.io/delete: "true"`. The CVO in the hosted
+ cluster processes this manifest and deletes the data-plane
+ operator. The control-plane migrator (Component 3) replaces
+ it. On IBM Cloud, the data-plane operator is kept — IBM does
+ not want additional control-plane deployments. The
+ control-plane migrator Deployment uses a platform predicate
+ to skip IBM Cloud.
+
+The design follows HyperShift's established pattern: the CPO
+manages KAS Deployment configuration (encryption config
+generation, migrator deployment), while the HCCO manages the
+re-encryption lifecycle (phase derivation, status management,
+StorageVersionMigration CR lifecycle). Both components derive
+their decisions from observable state independently, with no
+stored phase field as the source of truth. Deploying the
+migrator in the control plane (on non-IBM platforms) ensures
+re-encryption works on clusters with zero worker nodes.
+
+No additional RBAC is required for the HCCO:
+- The HCCO already has `get`, `list`, `watch` on Deployments and
+ Secrets in the HCP namespace.
+- The HCCO authenticates as `system:hosted-cluster-config` in the
+ guest cluster with `cluster-admin` via the `hcco-cluster-admin`
+ ClusterRoleBinding.
+
+The control plane `kube-storage-version-migrator` uses the same
+guest cluster client credentials as other control plane
+components (e.g., via the admin kubeconfig secret).
+
+#### Standalone Clusters
+
+Not directly applicable. Standalone OCP already has re-encryption
+via the library-go encryption framework's `MigrationController`.
+This enhancement brings equivalent functionality to HyperShift.
+
+#### Single-node Deployments or MicroShift
+
+Not applicable. This enhancement does not affect SNO or MicroShift
+deployments. The re-encryption controller only runs in the HCCO,
+which is specific to HyperShift.
+
+#### OpenShift Kubernetes Engine
+
+Not applicable. OKE does not support HyperShift hosted control
+planes.
+
+### Implementation Details/Notes/Constraints
+
+#### Why KubeStorageVersionMigrator Instead of MigrationController
+
+Standalone OCP uses library-go's `MigrationController`, which wraps
+`KubeStorageVersionMigrator` with a 4-step state machine. This
+controller has deep coupling to standalone OCP:
+
+- Expects key secrets in `openshift-config-managed` with specific
+ labels/annotations created by the KeyController. HyperShift gets
+ keys from `HostedCluster.Spec.SecretEncryption`.
+- Expects convergence via `RevisionLabelPodDeployer` checking
+ static pod revisions on master nodes. HyperShift runs KAS as a
+ Deployment.
+- Calls `statemachine.GetEncryptionConfigAndState()` tied to the
+ standalone key lifecycle.
+
+Instead, this enhancement reuses only the
+`KubeStorageVersionMigrator` struct -- a ~130-line, self-contained
+implementation with zero dependencies on the rest of the encryption
+framework. It handles:
+- Creating `StorageVersionMigration` CRs with deterministic names
+- Tracking which encryption key each CR was created for (via
+ annotation)
+- Detecting stale CRs from a previous key and replacing them
+- Monitoring migration status conditions
+- Resolving preferred API versions via discovery
+- Pruning completed/stale migrations
+
+#### Vendoring Requirements
+
+The following packages must be added to HyperShift's vendor tree:
+
+| Package | Why |
+|---|---|
+| `library-go/.../encryption/controllers/migrators` | `KubeStorageVersionMigrator` struct and `Migrator` interface |
+| `kube-storage-version-migrator/.../informer/` | Required by `KubeStorageVersionMigrator` constructor |
+| `kube-storage-version-migrator/.../lister/` | Pulled in transitively by the informer package |
+
+All other dependencies (`migration/v1alpha1` types, typed
+clientset, `factory.Informer`, `k8s.io/client-go/discovery`) are
+already vendored.
+
+The vendored version of `library-go` must match or be
+compatible with HyperShift's existing `library-go` dependency.
+The `KubeStorageVersionMigrator` struct has been stable since
+its introduction and has no version-specific API changes. The
+`kube-storage-version-migrator` informer/lister packages are
+auto-generated and track the `migration.k8s.io/v1alpha1` API,
+which has been stable since OCP 4.3.
+
+#### Component 1: Key Change Detection and Phase Derivation (HCCO)
+
+The HCCO re-encryption controller computes the active key
+fingerprint on each reconciliation:
+
+- Azure KMS: SHA-256 hash of `keyVaultName/keyName/keyVersion`
+- AWS KMS: SHA-256 hash of the active key ARN
+- IBM Cloud KMS: SHA-256 hash of the key list entries' CRK ID,
+ InstanceID, and KeyVersion (the same fields stored in
+ `IBMCloudKMSKeyStatus`). Fields not relevant to key identity
+ (CorrelationID, URL) are excluded from the fingerprint to
+ avoid triggering spurious re-encryption on metadata-only
+ changes.
+- AESCBC: SHA-256 hash of the active key secret name + the
+ SHA-256 hash of the secret's data field
+ (`AESCBCKeySecretKey = "key"`). Key rotation must be performed
+ by creating a new secret and updating the `activeKey.name`
+ reference. **In-place mutation of the key secret (same name,
+ new data) is not supported**: the old key material is
+ overwritten, making data encrypted with the previous key
+ unreadable before re-encryption completes. The AESCBC
+ `SecretRef` in status points to the same secret, so both
+ active and backup would read the overwritten data.
+
+#### Cloud-Provider-Initiated Key Rotation
+
+Cloud KMS providers support automatic key rotation at the
+provider level. The behavior varies by provider and determines
+whether re-encryption is triggered:
+
+**Azure Key Vault**: The HyperShift API requires an explicit
+`keyVersion` in `AzureKMSKey`. When Azure auto-rotates a key
+(creates a new version), KAS continues using the version pinned
+in the spec — nothing changes transparently. The user must
+update `keyVersion` in the spec to use the new version, which
+changes the fingerprint and triggers re-encryption. Azure
+cloud-initiated rotation is therefore always detectable via a
+spec change.
+
+**AWS KMS**: Automatic key rotation creates new backing key
+material behind the same CMK ARN. KAS continues referencing the
+same ARN, and AWS transparently handles decryption using the
+correct key material version (the version is embedded in the
+ciphertext metadata). Since the ARN does not change, the
+fingerprint remains the same and re-encryption is **not**
+automatically triggered. However, this is safe: old data
+remains readable through the same ARN, and new writes
+automatically use the latest key material. If compliance
+requires explicit re-encryption after AWS-initiated rotation,
+the user can trigger it manually (e.g., by toggling a
+spec field). For manual AWS key rotation (new CMK = new ARN),
+the ARN changes and re-encryption is triggered normally.
+
+**IBM Cloud KMS**: The `CRKID` and `KeyVersion` fields are
+explicit in the spec. Cloud-initiated rotation that changes
+these values requires a spec update, which triggers
+re-encryption. IBM Cloud's `KeyList`-based API naturally
+supports tracking multiple key versions.
+
+**AESCBC**: Keys are Kubernetes secrets managed by the user.
+There is no cloud-initiated rotation. The user must create a
+new secret and update the `activeKey.name` reference.
+
+It then compares the computed fingerprint against one derived
+from `hcp.Status.SecretEncryption.ActiveKey`:
+
+- **Status field nil, no `backupKey` set** (upgrade bootstrap —
+ never rotated): Initialize `status.activeKey` to
+ `spec.activeKey` without triggering re-encryption. No
+ rotation needed.
+- **Status field nil, `backupKey` set** (upgrade with rotation
+ in progress): Data may still be encrypted with the backup
+ key. Snapshot the spec's active key into `status.targetKey`,
+ prepend a history entry with `state=ReadOnlyDeploy`, and
+ begin the two-stage rollout. `spec.backupKey` is used as
+ the read provider during the rollout.
+- **Fingerprints match**: Re-encryption already completed for
+ the current key. No action needed.
+- **Fingerprints differ**: New key rotation detected. Snapshot
+ the spec's active key into `status.targetKey`, prepend a
+ new history entry with `state=ReadOnlyDeploy`, and begin
+ the two-stage rollout.
+
+**Mid-rotation spec changes**: The HCCO uses a hybrid
+interrupt/queue strategy depending on the derived phase:
+
+- **`ReadOnlyDeploy`**: If `fingerprint(spec.activeKey)` ≠
+ `fingerprint(status.targetKey)`, the HCCO updates
+ `status.targetKey` to the new spec key and restarts
+ `ReadOnlyDeploy`. This is safe because no data has been
+ encrypted with the target key yet — it was only a read-only
+ provider. The corrected key takes effect on the next KAS
+ rollout without completing a full wasted rotation.
+
+- **`WritePromote` or `Migrating`**: The HCCO continues with
+ the snapshotted `targetKey` and ignores the spec change.
+ Interrupting would require 3 simultaneous keys (old active +
+ abandoned target + new key), violating the two-sidecar
+ constraint. Once the rotation completes, the HCCO detects the
+ new mismatch and starts a fresh rotation.
+
+This hybrid approach ensures safety without relying on the VAP
+while providing fast correction for the common "wrong key"
+scenario.
+
+**Upgrade bootstrap**: When `status.activeKey` is nil and no
+`backupKey` is set, the HCCO initializes `status.activeKey`
+directly without triggering re-encryption. When `backupKey`
+is set, a full re-encryption is triggered to ensure all data
+is encrypted with the current active key before transitioning
+to the status-driven mechanism. See Upgrade Strategy for
+details.
+
+#### Component 2: Re-encryption Orchestration (HCCO)
+
+**New file:**
+`control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`
+
+The HCCO re-encryption controller instantiates
+`KubeStorageVersionMigrator` using guest cluster clients and
+derives the current phase from observable state on each
+reconciliation. The phase is **not** read from
+`history[0].state` — it is computed from resources (spec vs
+status keys, EncryptionConfiguration contents, KAS Deployment
+convergence, SVM status). The controller records the derived
+phase in `history[0].state` for observability.
+
+1. If encryption is not configured, remove the
+ `EtcdDataEncryptionUpToDate` condition if present, clear
+ `targetKey` if set, and return.
+
+2. Compute fingerprints of the spec's active key and the
+ status's active key. If they match and no `targetKey` is
+ set, re-encryption is complete — return.
+
+ If `status.activeKey` is nil and no `spec.backupKey` is set
+ (upgrade bootstrap), initialize `status.activeKey` to
+ `spec.activeKey` without triggering re-encryption. Return.
+
+ If a rotation is needed (`status.activeKey` nil with
+ `backupKey` set, or fingerprints differ) and `targetKey` is
+ not set, snapshot the spec's active key into
+ `status.targetKey` and prepend a history entry.
+
+3. **Derive the current phase** from observable state:
+ - Inspect the current `EncryptionConfiguration` to
+ determine whether `targetKey` is present as a provider
+ - Check KAS Deployment convergence
+ (`updatedReplicas == replicas == readyReplicas`)
+ - Check SVM completion status
+
+ The derived phase determines what action to take:
+
+ a. **ReadOnlyDeploy** (targetKey not yet in
+ EncryptionConfiguration, or present but KAS not
+ converged with it as read-only): Wait for the adapt
+ function to generate the config and KAS to converge.
+ Set condition
+ `False/ReEncryptionWaitingForKASConvergence` if not
+ converged. Record `history[0].state=ReadOnlyDeploy`.
+
+ b. **WritePromote** (targetKey present as read-only, KAS
+ converged, but targetKey not yet the write provider, or
+ KAS not converged with it as write): Wait for the adapt
+ function to promote and KAS to converge. Set condition
+ `False/WritePromotionInProgress`. Record
+ `history[0].state=WritePromote`.
+
+ c. **Migrating** (targetKey is write provider, KAS
+ converged): Create `StorageVersionMigration` CRs. Set
+ condition `False/ReEncryptionInProgress`. Record
+ `history[0].state=Migrating`.
+
+4. Determine the encrypted resource list based on encryption
+ type. For KMS, use `KMSEncryptedObjects()` (5 resource
+ types). For AESCBC, use only `secrets` (AESCBC only encrypts
+ secrets, as defined in `aescbc.go`). For each resource, call
+ `migrator.EnsureMigration(gr, computedKeyHash)` and track
+ finished/failed/in-progress state.
+
+5. If all resources migrated successfully, set
+ `hcp.Status.SecretEncryption.ActiveKey` to `TargetKey`,
+ clear `TargetKey`, record `history[0].state=Completed` with
+ `completionTime`, and set condition
+ `True/ReEncryptionCompleted`. The history list is capped at
+ 5 entries (oldest pruned on append). All fields are updated
+ in a single HCP status patch call to ensure atomicity. The
+ controller uses `Status().Patch()` with `MergeFrom` (rather
+ than `Status().Update()`) to avoid clobbering other status
+ fields. On `Conflict` errors, the controller re-reads the
+ HCP and retries the patch.
+
+6. If any resource failed and not retried within 5 minutes,
+ prune the failed CR for retry on next reconcile and set
+ condition `False/ReEncryptionFailed`. The condition message
+ includes elapsed time since re-encryption started (e.g.,
+ `"re-encryption of secrets failed after 12m30s"`) to help
+ operators assess whether manual intervention is needed. The
+ controller tracks the retry count via an annotation on the
+ `StorageVersionMigration` CR. After 3 consecutive failures
+ for the same resource, the condition reason is escalated to
+ `ReEncryptionPersistentFailure` to distinguish transient
+ failures from systemic issues requiring manual intervention.
+
+7. Otherwise, set condition `False/ReEncryptionInProgress` and
+ requeue after 30 seconds. The condition message includes
+ progress (e.g., `"3/5 resources migrated, elapsed 2m15s"`)
+ and the elapsed time since re-encryption started. The elapsed
+ time is computed from the condition's `LastTransitionTime` —
+ when the condition first transitions to `False`, the
+ `LastTransitionTime` is set and remains stable across
+ subsequent updates (since the status stays `False`), providing
+ a natural start timestamp without additional state.
+
+The encrypted resources depend on the encryption type:
+
+For KMS (from `support/config/kms.go:KMSEncryptedObjects()`):
+- `secrets`
+- `configmaps`
+- `routes.route.openshift.io` *
+- `oauthaccesstokens.oauth.openshift.io` *
+- `oauthauthorizetokens.oauth.openshift.io` *
+
+\* These resources are listed in `KMSEncryptedObjects()` but are
+served by the OpenShift API servers, which do not currently have
+KMS sidecars configured. The re-encryption controller creates
+SVMs for all resources returned by `KMSEncryptedObjects()` —
+the SVMs for resources that are not actually encrypted will
+perform no-op write-backs (data passes through unchanged). When
+KMS sidecars are added to the OpenShift API servers, these SVMs
+will automatically perform real re-encryption.
+
+For AESCBC (from `aescbc.go`):
+- `secrets`
+
+Note: AESCBC only encrypting `secrets` (not `configmaps`) is a
+known gap in the existing implementation, not an intentional
+design choice of this enhancement.
+
+**StorageVersionMigration CR naming**: The
+`KubeStorageVersionMigrator` creates SVMs with deterministic
+names prefixed with `encryption-migration-` (e.g.,
+`encryption-migration-core-secrets`). Conflicts with
+admin-created SVMs are unlikely due to this prefix. If a
+conflict occurs, the controller detects the wrong key
+annotation and recreates the CR.
+
+#### EncryptionConfiguration by Phase
+
+The following shows how the `EncryptionConfiguration` changes
+during a KMS (Azure) key rotation from v2 to v3:
+
+**ReadOnlyDeploy** (old key writes, new key read-only):
+```yaml
+apiVersion: apiserver.config.k8s.io/v1
+kind: EncryptionConfiguration
+resources:
+- resources: [secrets, configmaps]
+ providers:
+ - kms:
+ name: azurekmsactive # v2 — write provider
+ endpoint: unix:///azurekmsactive.socket
+ - kms:
+ name: azurekmsbackup # v3 — read-only
+ endpoint: unix:///azurekmsbackup.socket
+ - identity: {}
+```
+
+**WritePromote / Migrating** (new key writes, old key read-only):
+```yaml
+apiVersion: apiserver.config.k8s.io/v1
+kind: EncryptionConfiguration
+resources:
+- resources: [secrets, configmaps]
+ providers:
+ - kms:
+ name: azurekmsactive # v3 — write provider
+ endpoint: unix:///azurekmsactive.socket
+ - kms:
+ name: azurekmsbackup # v2 — read-only
+ endpoint: unix:///azurekmsbackup.socket
+ - identity: {}
+```
+
+**Completed** (single key, backup removed):
+```yaml
+apiVersion: apiserver.config.k8s.io/v1
+kind: EncryptionConfiguration
+resources:
+- resources: [secrets, configmaps]
+ providers:
+ - kms:
+ name: azurekmsactive # v3
+ endpoint: unix:///azurekmsactive.socket
+ - identity: {}
+```
+
+#### Component 3: CPO `adaptSecretEncryptionConfig()` Changes
+
+**Modified files:**
+- `control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go`
+- `control-plane-operator/controllers/hostedcontrolplane/v2/kas/kms/aws.go`
+- `control-plane-operator/controllers/hostedcontrolplane/v2/kas/kms/azure.go`
+
+##### KMS Sidecar Architecture
+
+For AWS and Azure KMS, each key in the `EncryptionConfiguration`
+requires its own KMS plugin sidecar container running alongside
+the KAS. HyperShift deploys exactly **two sidecars** per
+provider — one for the active key and one for the backup key —
+with dedicated socket paths and health ports:
+
+- **AWS**: `aws-kms-active` (port 8080, `awskmsactive.sock`)
+ and `aws-kms-backup` (port 8081, `awskmsbackup.sock`)
+- **Azure**: `azure-kms-provider-active` (port 8787,
+ `azurekmsactive.socket`) and `azure-kms-provider-backup`
+ (port 8788, `azurekmsbackup.socket`)
+
+During a rotation, both sidecars are used: one for the write
+provider key and one for the read-only provider key. Which key
+goes to which sidecar depends on the derived state (see
+"Derived-State Configuration" below).
+
+IBM Cloud KMS uses a different architecture: a single sidecar
+(`ibmcloud-kms`) that receives all keys as a JSON key list via
+the `KP_DATA_JSON` environment variable.
+
+For AESCBC, keys are inline in the `EncryptionConfiguration`
+secret (no sidecar needed).
+
+**Encryption config reload**: The KAS must NOT auto-reload the
+`EncryptionConfiguration` during a rollout — all replicas must
+converge on the same config via pod replacement. AWS and IBM
+Cloud KMS providers already set
+`--encryption-provider-config-automatic-reload=false` on the
+KAS. The Azure KMS provider must also set this flag to prevent
+a split-brain scenario where some replicas hot-reload the new
+config before others have been replaced.
+
+The two-sidecar constraint (exactly two KMS sidecars) means at
+most two KMS keys can be active simultaneously. The HCCO's
+queuing behavior (see Component 1) ensures mid-rotation spec
+changes are deferred until the current rotation completes,
+keeping the key count within this limit. The VAP (Component 5)
+provides a better UX by rejecting mid-rotation changes at
+admission time rather than silently queuing them.
+
+##### CPO Modification: Derived-State Configuration
+
+`adaptSecretEncryptionConfig()` derives the correct
+`EncryptionConfiguration` from observable state — it does not
+read `history[0].state`. Instead, it inspects:
+
+1. `spec.activeKey` vs `status.activeKey` — is a rotation
+ needed?
+2. `status.targetKey` — is a rotation in progress?
+3. The current `EncryptionConfiguration` contents — is the
+ target key already present? As read-only or write?
+4. KAS Deployment convergence — have all replicas converged?
+
+The derivation logic:
+
+| Observed state | Write provider | Read provider |
+|---|---|---|
+| No `targetKey` set | spec.activeKey | none |
+| `targetKey` set, not yet in config | status.activeKey | status.targetKey |
+| `targetKey` in config as read-only, KAS converged | status.targetKey | status.activeKey |
+| `targetKey` is write provider | status.targetKey | status.activeKey |
+
+For KMS, the write provider key is configured on the "active"
+sidecar and the read provider key on the "backup" sidecar.
+For AESCBC, both keys are inline in the
+`EncryptionConfiguration` secret.
+
+**backupKey fallback (transition safety)**: If
+`hcp.Status.SecretEncryption.ActiveKey` is not set (e.g., the
+status has not been initialized yet on a cluster that was
+upgraded mid-rotation) and no in-progress history entry, fall back
+to `spec.backupKey` if populated. This ensures backward
+compatibility during the transition period. Once the status is
+initialized, `spec.backupKey` is ignored — during a rotation,
+both sidecar slots are occupied by `status.activeKey` and
+`status.targetKey`, leaving no slot for a third key (see
+"Why `backupKey` cannot always be honored" in API Extensions).
+
+**Backup removal**: When no rotation is in progress and
+`status.activeKey` matches the spec's active key
+(re-encryption confirmed complete), no backup key is needed.
+The backup sidecar container is removed from the KAS pod and
+the read-only provider is removed from the
+`EncryptionConfiguration`.
+
+##### KMS Provider Constructor Changes
+
+Both providers are changed to accept explicit write and read
+key parameters derived from the `history[0].state` table above:
+
+**AWS**:
+
+```go
+// Before
+NewAWSKMSProvider(kmsSpec, image, tokenMinterImage)
+// reads kmsSpec.BackupKey internally
+
+// After
+writeKey, readKey := deriveAWSKeys(hcpStatus, kmsSpec)
+NewAWSKMSProvider(writeKey, readKey, image, tokenMinterImage)
+```
+
+**Azure**:
+
+```go
+// Before
+NewAzureKMSProvider(kmsSpec, image)
+
+// After
+writeKey, readKey := deriveAzureKeys(hcpStatus, kmsSpec)
+NewAzureKMSProvider(writeKey, readKey, image)
+```
+
+**IBM Cloud** — uses a `KeyList` with a single sidecar. No
+backup key change needed; the key list is passed through as-is.
+
+#### Component 4: Control Plane Migrator Deployment (CPO)
+
+The CPO deploys the `kube-storage-version-migrator` as a
+Deployment in the HCP namespace. The migrator image is sourced
+from the OCP release payload. It connects to the guest cluster
+KAS using the `admin-kubeconfig` secret.
+
+On non-IBM platforms, the control-plane migrator replaces the
+data-plane operator (see Topology Considerations). The CPO
+component registration uses a platform predicate to skip
+deployment on IBM Cloud, where the data-plane operator
+continues to run. The `StorageVersionMigration` CRD remains
+installed in all topologies.
+
+#### Component 5: HyperShift Operator Integration
+
+**Modified file:**
+`hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`
+
+1. **Bubble up condition**: Surface
+ `EtcdDataEncryptionUpToDate` from HCP to HostedCluster using
+ the copy-if-present pattern (`ValidKubeVirtInfraNetworkMTU`
+ style), not the bulk `hcpConditions` list. Absent on HC when
+ absent on HCP.
+
+2. **Bubble up status**: Copy `SecretEncryption` status from HCP
+ to HostedCluster.
+
+#### Component 6: ValidatingAdmissionPolicy (UX Improvement)
+
+The system is designed to be safe without the VAP — the CPO
+main reconciler queues mid-rotation key changes by continuing
+with the snapshotted `targetKey` and deferring the new spec key
+until the current rotation completes (see Component 1). The
+VAP provides a better user experience by rejecting mid-rotation
+changes at admission time with a clear error message, rather
+than silently queuing them.
+
+The HyperShift Operator deploys a `ValidatingAdmissionPolicy`
+and `ValidatingAdmissionPolicyBinding` on the management cluster
+to block active key changes while a rotation is in progress.
+
+```yaml
+apiVersion: admissionregistration.k8s.io/v1
+kind: ValidatingAdmissionPolicy
+metadata:
+ name: hostedcluster-block-key-rotation-during-reencryption
+spec:
+ failurePolicy: Ignore
+ matchConstraints:
+ resourceRules:
+ - apiGroups: ["hypershift.openshift.io"]
+ apiVersions: ["v1beta1"]
+ operations: ["UPDATE"]
+ resources: ["hostedclusters"]
+ validations:
+ - expression: >-
+ !has(object.status.conditions) ||
+ !object.status.conditions.exists(c,
+ c.type == 'EtcdDataEncryptionUpToDate' &&
+ c.status == 'False') ||
+ (!has(object.spec.secretEncryption) &&
+ !has(oldObject.spec.secretEncryption)) ||
+ (has(object.spec.secretEncryption) &&
+ has(oldObject.spec.secretEncryption) &&
+ object.spec.secretEncryption ==
+ oldObject.spec.secretEncryption)
+ message: >-
+ Cannot change the active encryption key while
+ re-encryption is in progress
+ (EtcdDataEncryptionUpToDate=False). Wait for
+ re-encryption to complete before rotating again.
+---
+apiVersion: admissionregistration.k8s.io/v1
+kind: ValidatingAdmissionPolicyBinding
+metadata:
+ name: hostedcluster-block-key-rotation-during-reencryption
+spec:
+ policyName: hostedcluster-block-key-rotation-during-reencryption
+ validationActions:
+ - Deny
+```
+
+**Design choices:**
+
+- **`failurePolicy: Ignore`**: If the policy cannot be
+ evaluated (API server issue), allow the write rather than
+ blocking all HostedCluster updates. This avoids introducing
+ a new failure mode for cluster lifecycle operations.
+- **Structural equality** (`==` on the whole
+ `secretEncryption` block): CEL compares the entire struct,
+ so no per-provider fingerprint logic is needed. This
+ intentionally blocks all `secretEncryption` changes during
+ re-encryption (not just the active key), because any
+ encryption config change triggers a KAS rollout that could
+ interfere with in-flight migrations.
+- **Nil-safety**: The expression uses `has()` guards to handle
+ the case where `spec.secretEncryption` is nil on either the
+ old or new object. This also covers the case where a user
+ attempts to remove `secretEncryption` entirely while
+ re-encryption is in progress — the `has()` mismatch causes
+ the structural equality check to fail, blocking the removal.
+- **Handles absent condition**: When
+ `EtcdDataEncryptionUpToDate` is not present (encryption not
+ configured, first setup, older HCCO), the policy allows the
+ change.
+- **UPDATE only**: CREATE is always allowed.
+
+**Why VAP over alternatives**: CEL-in-CRD cannot
+cross-reference status from spec validation. Webhooks add
+availability concerns. An HO reconciler guard would accept
+the write then silently not propagate it, causing HC/HCP spec
+divergence. The VAP rejects at admission time with a clear
+error message. However, the VAP is strictly a UX improvement
+— the controller is safe without it (see Component 1 for
+the queuing mechanism).
+
+**Race window**: There is a window (two reconcile cycles)
+between the first key change and `EtcdDataEncryptionUpToDate=False`
+appearing on the HostedCluster. A second key change during
+this window would not be blocked by the VAP. This is safe
+because the controller queues the change — it is not a
+correctness issue, only a UX gap where the user does not get
+immediate feedback.
+
+#### Safety Invariants
+
+Derived from library-go's encryption framework and adapted
+for the two-stage rollout:
+
+1. **Two-stage rollout**: Never promote a new key to write
+ provider until all KAS replicas have it as a read provider.
+ This prevents decryption failures during rolling updates.
+2. **Convergence gating**: No phase transition
+ (`ReadOnlyDeploy` → `WritePromote` → `Migrating`) occurs
+ until all KAS replicas are converged
+ (`updatedReplicas == replicas == readyReplicas`).
+3. **Never remove read-keys before migration completes**.
+4. **Hybrid interrupt/queue for mid-rotation key changes**:
+ During `ReadOnlyDeploy` (no data encrypted with target key),
+ spec changes update `targetKey` in-place and restart the
+ phase. During `WritePromote` or `Migrating`, spec changes
+ are queued until the current rotation completes. This is
+ enforced at the controller level and does not depend on the
+ VAP.
+5. **Retry failed migrations**: Prune and retry after 5 minutes.
+6. **VAP as UX guard**: The VAP rejects mid-rotation active key
+ changes at admission time for better user feedback, but the
+ controller is safe without it.
+7. **Atomic status patches**: Update `ActiveKey`, clear
+ `TargetKey`, set `history[0].state=Completed`, and set the
+ condition in a single `Status().Patch()` call (`MergeFrom`)
+ to avoid partial state.
+
+#### Architecture Diagram
+
+```
+Management Cluster (HCP namespace)
+
+ HyperShift Operator
+ - Surfaces HCP conditions + status to HC
+ (copy-if-present pattern for EtcdDataEncryptionUpToDate)
+ - Deploys ValidatingAdmissionPolicy (UX improvement)
+ that blocks key changes when
+ EtcdDataEncryptionUpToDate=False
+
+ CPO HCCO
+ - adaptSecretEncryptionConfig() - Re-encryption Controller (NEW)
+ (MODIFIED): Derives phase from observable
+ - Derives EncryptionConfig state on each reconcile:
+ from observable state - Compares spec vs status keys
+ (spec/status keys, - Inspects EncryptionConfig
+ current config contents, - Checks KAS convergence
+ KAS convergence) - Checks SVM completion
+ - backupKey fallback for - Sets targetKey, creates SVMs,
+ transition safety updates activeKey
+ - Deploys kube-storage- - Records derived phase in
+ version-migrator in history[0].state
+ HCP namespace - Sets conditions
+
+ kube-storage-version-migrator
+ (control plane Deployment,
+ uses admin-kubeconfig secret)
+ │
+ │ guest cluster KAS
+ ▼
+Hosted Cluster (guest API)
+
+ StorageVersionMigration CRs
+ (migration.k8s.io/v1alpha1)
+ For KMS:
+ - encryption-migration-core-secrets
+ - encryption-migration-core-configmaps
+ - encryption-migration-route.openshift.io-routes
+ - encryption-migration-oauth...-oauthaccesstokens
+ - encryption-migration-oauth...-oauthauthorizetokens
+ For AESCBC:
+ - encryption-migration-core-secrets
+
+ cluster-kube-storage-version-migrator-operator
+ (DISABLED for HyperShift — annotation removed)
+```
+
+### Risks and Mitigations
+
+**Risk**: Re-encryption of large clusters takes a long time,
+resulting in a prolonged `False` condition.
+**Mitigation**: `StorageVersionMigration` handles pagination
+internally. The condition message reports which resources have
+completed and which are still in progress.
+
+**Risk**: KAS restarts during re-encryption interrupt the
+migration.
+**Mitigation**: `StorageVersionMigration` uses `continueToken` for
+resumption. The controller detects stale CRs and retries as
+needed.
+
+**Risk**: `kube-storage-version-migrator` in the control plane is
+degraded, causing migrations to never complete.
+**Mitigation**: The controller sets a `ReEncryptionFailed`
+condition with details after failed migrations persist.
+Operational guidance covers how to check the migrator Deployment
+health in the HCP namespace.
+
+**Risk**: Guest cluster KAS is unreachable from the control plane,
+preventing CR creation, migration processing, or monitoring.
+**Mitigation**: The HCCO and migrator both retry with backoff.
+The condition reflects the inability to check status rather than
+incorrectly reporting success.
+
+**Risk**: A second key rotation occurs before re-encryption
+completes. HyperShift's two-sidecar design retains only one
+previous key — a rapid v1 -> v2 -> v3 rotation would evict
+v1's sidecar, leaving v1-encrypted data unreadable.
+**Mitigation**: The controller uses a hybrid approach
+(VAP-independent): during `ReadOnlyDeploy` (no data encrypted
+with the target key yet), spec changes update `targetKey`
+in-place, allowing immediate correction. During `WritePromote`
+or `Migrating`, spec changes are queued until the current
+rotation completes. The VAP provides a better UX by rejecting
+mid-rotation changes at admission time.
+
+**Risk**: Old-key-encrypted data persists in etcd historical
+revisions after re-encryption until etcd compaction runs.
+**Mitigation**: HyperShift's etcd auto-compaction (typically
+every 5 minutes) clears old revisions. After compaction, old
+revisions containing data encrypted with the previous key are
+removed. The `EtcdDataEncryptionUpToDate=True` condition
+indicates that all *current* revisions are encrypted with the
+active key; operators should be aware that historical revisions
+remain until the next compaction cycle.
+
+**Risk**: KMS API rate limiting during re-encryption of large
+clusters. Each re-encrypted object requires two KMS API calls
+(one decrypt via the old key, one encrypt via the new key). A
+cluster with 10,000 secrets generates approximately 20,000 KMS
+API calls. On Azure Key Vault (2,000 transactions per 10
+seconds per vault), this could trigger throttling.
+**Mitigation**: `StorageVersionMigration` processes one page at
+a time (default page size 500), providing natural throttling.
+However, if multiple hosted clusters sharing the same KMS
+endpoint rotate keys simultaneously, aggregate API call volume
+could hit provider rate limits. Operators managing large fleets
+should stagger key rotations across hosted clusters.
+
+#### Backup and Restore During Re-encryption
+
+etcd backups should not be taken while re-encryption is in
+progress (`EtcdDataEncryptionUpToDate=False`). A backup taken
+during re-encryption contains a mix of objects encrypted with
+the old key and the new key. Restoring such a backup requires
+that both keys are available as read providers in the
+`EncryptionConfiguration`, which may not be the case if the
+backup sidecar has already been removed after a subsequent
+successful re-encryption.
+
+If a backup must be taken during re-encryption, operators
+should ensure that both the old and new encryption keys remain
+accessible (not deactivated or deleted in the cloud KMS) until
+the backup is no longer needed for restore.
+
+### Drawbacks
+
+1. **Increased operational complexity**: The re-encryption
+ controller adds a new reconciliation loop in the HCCO, and
+ the `kube-storage-version-migrator` adds a new Deployment in
+ the HCP namespace. However, the re-encryption controller is
+ dormant when no key rotation is in progress (it no-ops when
+ the computed key fingerprint matches the status field).
+
+2. **Vendoring new packages**: Three packages must be added to
+ HyperShift's vendor tree. These are small, auto-generated
+ packages with no external dependencies beyond what HyperShift
+ already vendors.
+
+3. **Platform-conditional behavior**: The control-plane migrator
+ is only deployed on non-IBM platforms. On IBM Cloud, the
+ data-plane operator continues to run. This requires platform
+ predicates in the CPO component registration and the
+ `resourcesToRemove()` cleanup list.
+
+4. **API deprecation**: The `backupKey` fields are deprecated
+ but must remain in the API for backward compatibility. This
+ creates a period where both mechanisms coexist.
+
+## Alternatives (Not Implemented)
+
+### Reuse library-go's MigrationController Directly
+
+**Rejected because**: Deep coupling to standalone OCP (key
+secrets in `openshift-config-managed`, static pod revision
+checking, `statemachine.GetEncryptionConfigAndState()`). See
+"Why KubeStorageVersionMigrator" above.
+
+### Build Custom Migration Logic Without library-go
+
+Implement `StorageVersionMigration` CR lifecycle management from
+scratch instead of using `KubeStorageVersionMigrator`.
+
+**Rejected because**: `KubeStorageVersionMigrator` is a ~130-line,
+self-contained struct that handles CR creation, stale CR detection,
+annotation tracking, status monitoring, version discovery, and
+pruning. Reimplementing this would duplicate production-tested code
+with no benefit.
+
+### Keep the backupKey API for Rotation Tracking
+
+Instead of introducing a status field and deprecating `backupKey`,
+continue using the existing `activeKey`/`backupKey` spec fields for
+rotation lifecycle management.
+
+**Rejected because**: The `backupKey` API requires the user (or
+automation) to manually manage old key references and remember to
+populate the backup field during rotation. This is error-prone:
+the backup key can be removed prematurely (before re-encryption
+completes), or forgotten entirely (leaving no read provider for
+old data). A status field that the system manages automatically
+eliminates this class of user error and enables the system to
+detect and handle mid-rotation key changes.
+
+### Run the kube-storage-version-migrator in the Data Plane
+
+Instead of deploying the migrator in the control plane, rely on
+the existing data-plane
+`cluster-kube-storage-version-migrator-operator` to process
+`StorageVersionMigration` CRs in the hosted cluster.
+
+**Rejected because**: The data-plane migrator runs as a
+Deployment in the hosted cluster, which requires schedulable
+worker nodes. HyperShift supports clusters with zero worker
+nodes (e.g., during initial provisioning or node pool scale-down),
+and re-encryption must work in these scenarios. Deploying the
+migrator in the control plane (HCP namespace) ensures it is
+always available regardless of worker node count.
+
+### Run the Full Re-encryption State Machine in the CPO
+
+**Rejected because**: The Migrating phase requires creating and
+monitoring `StorageVersionMigration` CRs in the guest cluster.
+The HCCO already has guest cluster client access and manages
+guest cluster resources. Since the phase is derived from
+observable state (not stored), having a single controller (HCCO)
+own the entire lifecycle avoids split-state conflicts. The CPO's
+`adaptSecretEncryptionConfig()` independently derives the
+correct EncryptionConfiguration from the same observable state.
+
+### Direct etcd Manipulation
+
+Instead of using `StorageVersionMigration` CRs, directly read and
+re-write etcd data using the etcd client.
+
+**Rejected because**: This would bypass the kube-apiserver's
+encryption layer and require direct etcd access, which is complex,
+error-prone, and inconsistent with how OpenShift handles
+encryption. The `StorageVersionMigration` approach works through
+the API server, ensuring proper encryption, audit logging, and
+admission control.
+
+## Open Questions [optional]
+
+1. **Should re-encryption block cluster upgrades?** Current
+ recommendation is no -- re-encryption is independent of
+ upgrades. The `EtcdDataEncryptionUpToDate` condition is
+ informational and does not gate upgrade preconditions.
+
+2. **Should the `backupKey` fields be removed in a future API
+ version?** They are currently deprecated and used as a
+ fallback during upgrade transition. Full removal would
+ simplify the API but requires an API version bump.
+
+#### Forward Compatibility
+
+**Cross-type migration** (e.g., AESCBC → AWS KMS): The status
+types support different providers in `from` and `to` fields of
+history entries, and `activeKey` and `targetKey` can have
+different providers. The two-stage rollout and re-encryption
+mechanism work regardless of whether the provider type changes.
+While cross-type migration is not in scope for this
+enhancement, the design does not preclude it.
+
+**Etcd sharding** (see PR #1979): The re-encryption mechanism
+works through the API server (StorageVersionMigration CRs),
+not direct etcd access. If etcd is sharded by resource kind
+via `--etcd-servers-overrides`, the API server routes
+write-backs to the correct shard transparently. The
+re-encryption controller does not need shard awareness.
+
+## Test Plan
+
+
+
+### Unit Tests
+
+- Key fingerprint computation for each provider (Azure KMS, AWS
+ KMS, IBM Cloud KMS, AESCBC).
+ - AESCBC: Verify fingerprint changes when secret name changes
+ (the supported rotation model).
+- Re-encryption controller reconciliation logic:
+ - When no encryption configured: condition not set.
+ - Upgrade bootstrap (status nil, no backupKey): status.activeKey
+ initialized to spec.activeKey, no re-encryption triggered.
+ - Upgrade bootstrap (status nil, backupKey set): re-encryption
+ triggered with backupKey as read provider.
+ - When spec key matches status key: no action taken.
+ - When status field nil (first encryption setup):
+ `targetKey` set, history entry prepended with
+ `state=ReadOnlyDeploy`, two-stage rollout begins.
+ - When spec key differs from status key: `targetKey` set,
+ history entry prepended with `state=ReadOnlyDeploy`,
+ two-stage rollout begins.
+ - Two-stage rollout phase transitions:
+ - `ReadOnlyDeploy` + KAS converged →
+ `history[0].state=WritePromote`.
+ - `WritePromote` + KAS converged →
+ `history[0].state=Migrating`.
+ - `Migrating` + all migrations succeeded →
+ `activeKey=targetKey`, `targetKey` cleared,
+ `history[0].state=Completed` with `completionTime`,
+ condition `True/ReEncryptionCompleted`.
+ - When KAS not converged during any phase: condition
+ `False/ReEncryptionWaitingForKASConvergence`, requeue.
+ - When migrations in progress: condition
+ `False/ReEncryptionInProgress` with progress and elapsed
+ time in message.
+ - When migration failed: retry after 5 minutes by pruning
+ and re-creating. Condition message includes elapsed time.
+ - When migration fails 3 consecutive times: condition reason
+ escalated to `ReEncryptionPersistentFailure`.
+ - Mid-rotation spec change during `ReadOnlyDeploy`:
+ `targetKey` updated in-place to new spec key, phase
+ restarts with corrected target.
+ - Mid-rotation spec change during `WritePromote`/`Migrating`:
+ controller continues with `targetKey`, ignores spec. After
+ completion, detects new mismatch and starts fresh rotation.
+ - AESCBC: Only `secrets` resource migrated (1 CR, not 5).
+ - KMS: All 5 resources from `KMSEncryptedObjects()` migrated.
+- Migration history:
+ - `Completed` entry appended on successful rotation with
+ correct `from`/`to` key references and timestamps.
+ - `Interrupted` entry appended when `targetKey` is replaced
+ during `ReadOnlyDeploy`.
+ - History capped at 5 entries, oldest pruned on append.
+ - First rotation (nil `activeKey`): `from.fingerprint` is
+ empty, `from.provider` matches the initial provider.
+- `StorageVersionMigration` CR naming and annotation logic.
+- CPO `adaptSecretEncryptionConfig` and KMS provider changes:
+ - `history[0].state=ReadOnlyDeploy`: old key=write, new
+ key=read.
+ - `history[0].state=WritePromote`: new key=write, old
+ key=read.
+ - `history[0].state=Migrating`: new key=write, old key=read.
+ - no history / `Completed`: spec.activeKey=write, no backup.
+ - backupKey fallback used when status.activeKey is nil.
+ - Backup sidecar removed when no rotation in progress and
+ status.activeKey matches spec.
+ - AWS/Azure provider constructors accept explicit write/read
+ key parameters instead of reading spec.backupKey.
+- ValidatingAdmissionPolicy:
+ - Active key change rejected when
+ `EtcdDataEncryptionUpToDate=False`.
+ - Active key change allowed when
+ `EtcdDataEncryptionUpToDate=True`.
+ - Active key change allowed when condition is absent.
+ - Non-encryption spec changes allowed regardless of
+ condition state.
+ - Removing `secretEncryption` entirely rejected when
+ `EtcdDataEncryptionUpToDate=False`.
+ - Nil `secretEncryption` on both old and new object: allowed.
+- HO condition bubble-up:
+ - `EtcdDataEncryptionUpToDate` copied when present on HCP.
+ - Condition absent on HC when absent on HCP (not Unknown).
+
+### Integration Tests
+
+- Full key rotation cycle with mock guest cluster.
+- Verify `StorageVersionMigration` CRs are created with correct
+ GVRs.
+- Verify stale CRs are cleaned up on key change.
+
+### E2E Tests
+
+- Azure KMS key rotation with re-encryption (primary test case).
+- AWS KMS key rotation with re-encryption (ROSA HCP coverage).
+- AESCBC key rotation with re-encryption (verify only `secrets`
+ resource is migrated, not all 5).
+- Verify data is re-encrypted by confirming that deactivating
+ the old KMS key after re-encryption completes does not break
+ reads.
+- Verify cluster availability during re-encryption.
+- Verify condition transitions through the two-stage rollout:
+ absent -> `False/ReadOnlyRolloutInProgress` ->
+ `False/WritePromotionInProgress` ->
+ `False/ReEncryptionInProgress` -> `True/ReEncryptionCompleted`.
+- Verify first-encryption-setup: enable encryption on a cluster
+ with existing secrets, verify re-encryption triggers and
+ all data is encrypted.
+- Verify VAP rotation guard: attempt a second key rotation while
+ re-encryption is in progress, confirm the API server rejects
+ the update.
+
+## Graduation Criteria
+
+
+
+### Dev Preview -> Tech Preview
+
+- End-to-end key rotation with re-encryption works for Azure KMS.
+- Unit tests cover all controller logic.
+- Integration tests validate CR lifecycle.
+- E2E test runs in CI for at least one encryption type.
+- Documentation covers key rotation procedure with
+ re-encryption.
+
+### Tech Preview -> GA
+
+- Sufficient time for customer feedback (at least one minor
+ release).
+- E2E tests cover all supported encryption types (Azure KMS,
+ AWS KMS, AESCBC).
+- Scale testing completed (re-encryption on clusters with large
+ numbers of secrets/configmaps).
+- Upgrade scenarios validated.
+- User-facing documentation created in openshift-docs.
+- Support procedures documented.
+
+### Removing a deprecated feature
+
+N/A -- This is a new feature.
+
+## Upgrade / Downgrade Strategy
+
+**Upgrade**: On upgrade, the HCCO re-encryption controller handles
+existing clusters based on their current encryption state:
+
+1. **No encryption configured**: Unaffected. No status fields
+ set, no condition emitted.
+
+2. **Encryption configured, no `backupKey` set** (never
+ rotated): The CPO initializes
+ `status.secretEncryption.activeKey` to `spec.activeKey`
+ without triggering re-encryption or a KAS rollout. This is
+ a status-only update — the cluster is already fully
+ encrypted with the active key. No condition is emitted.
+
+3. **Encryption configured, `backupKey` set** (rotation was in
+ progress or completed before upgrade): Some data may still
+ be encrypted with the backup key. The CPO triggers a full
+ two-stage rollout and re-encryption to ensure all data is
+ encrypted with the current `spec.activeKey`. During the
+ rollout, `spec.backupKey` is used as the read provider
+ (the existing fallback path, since `status.activeKey` is
+ nil). After re-encryption completes,
+ `status.secretEncryption.activeKey` is set and the
+ status-driven mechanism takes over. The `backupKey` field
+ is no longer needed but continues to function as a
+ fallback if status is ever cleared.
+
+The re-encryption controller is added to the HCCO, which is
+upgraded per-hosted-cluster as part of the normal hosted
+cluster upgrade process. No manual steps are required.
+
+**Downgrade**: Downgrading is not supported in HyperShift.
+No downgrade path is provided for this enhancement.
+
+## Version Skew Strategy
+
+The CPO and HCCO are upgraded together per-hosted-cluster. The
+`StorageVersionMigration` API (`migration.k8s.io/v1alpha1`) has
+been stable since OCP 4.3. No cross-component version skew
+exists for the re-encryption flow.
+
+The HO is upgraded independently but uses the copy-if-present
+pattern for condition bubble-up (see Component 5), so version
+skew between the HO and per-cluster HCCO is safe.
+
+## Operational Aspects of API Extensions
+
+### EtcdDataEncryptionUpToDate Condition
+
+- **Impact on existing SLIs**: None. Informational only, does
+ not gate upgrades or availability checks.
+- **Failure modes**:
+ - Condition stuck at `False/ReEncryptionInProgress`: Indicates
+ the `kube-storage-version-migrator` is slow or stalled.
+ Check migrator Deployment health in the HCP namespace.
+ - Condition stuck at `False/ReEncryptionFailed`: Indicates a
+ migration CR has failed. Check the condition message for the
+ specific resource and error. The controller retries
+ automatically after 5 minutes.
+ - Condition at `False/ReEncryptionPersistentFailure`: The
+ same resource has failed 3 or more consecutive times,
+ indicating a systemic issue. Manual investigation is
+ required — check the `StorageVersionMigration` CR status,
+ guest cluster KAS health, and admission webhook
+ configuration.
+ - Condition stuck at `False/ReEncryptionWaitingForKASConvergence`: KAS
+ Deployment rollout has not completed. Check KAS pod health
+ and Deployment status.
+- **Health indicators**:
+ - `EtcdDataEncryptionUpToDate` condition on HostedCluster
+ - HCCO controller logs (`controllers.ReEncryption`)
+ - `StorageVersionMigration` CR status in the guest cluster
+ - `kube-storage-version-migrator` Deployment in the HCP
+ namespace
+
+### StorageVersionMigration CRs
+
+- **Expected scale**: 5 CRs per rotation for KMS, 1 for AESCBC.
+ Pruned after successful migration.
+- **API throughput impact**: Proportional to encrypted object
+ count. Migrator uses pagination.
+
+### Prometheus Metrics
+
+The HCCO re-encryption controller exposes metrics derived from
+`status.secretEncryption` for alerting:
+
+| Metric | Type | Description |
+|---|---|---|
+| `hypershift_encryption_migration_state` | Gauge | Current rotation state per hosted cluster. Label `state` maps to `history[0].state` or `idle` when no rotation is in progress. |
+| `hypershift_encryption_migration_duration_seconds` | Histogram | Duration of completed rotations (from `startedTime` to `completionTime`). |
+| `hypershift_encryption_migration_failures_total` | Counter | Total `StorageVersionMigration` CR failures per hosted cluster. |
+
+**Suggested alert rules:**
+
+- `HyperShiftEncryptionMigrationStuck`: Fires when state is
+ not `idle` or `Completed` for more than 1 hour.
+- `HyperShiftEncryptionMigrationPersistentFailure`: Fires when
+ failures increase by 3+ within 30 minutes.
+
+## Support Procedures
+
+### Detecting Re-encryption Issues
+
+1. **Check the HostedCluster condition**:
+ ```bash
+ oc get hostedcluster \
+ -o jsonpath='{.status.conditions[?(@.type=="EtcdDataEncryptionUpToDate")]}'
+ ```
+
+2. **Check StorageVersionMigration CRs in the guest cluster**:
+ ```bash
+ oc get storageversionmigrations -A
+ ```
+ Look for CRs with `MigrationFailed=True` conditions.
+
+3. **Check HCCO logs**:
+ ```bash
+ oc logs -n \
+ deployment/control-plane-operator \
+ -c hosted-cluster-config-operator \
+ | grep -i reencryption
+ ```
+
+4. **Check kube-storage-version-migrator health in the control
+ plane**:
+ ```bash
+ oc get deployment kube-storage-version-migrator \
+ -n
+ oc get pods -l app=kube-storage-version-migrator \
+ -n
+ ```
+
+### Remediation
+
+- **Stuck migrations**: Delete the stale
+ `StorageVersionMigration` CR in the guest cluster. The HCCO
+ controller will recreate it on the next reconcile.
+
+- **Migrator Deployment unhealthy**: Check the
+ `kube-storage-version-migrator` Deployment in the HCP
+ namespace. Once healthy, the controller will resume
+ monitoring.
+
+- **Condition not updating**: Verify the HCCO pod is running and
+ healthy. Check for errors in HCCO logs related to the
+ re-encryption controller.
+
+## Infrastructure Needed [optional]
+
+No additional infrastructure is needed. The
+`kube-storage-version-migrator` is deployed by the CPO as a
+control plane component. E2E tests use existing CI
+infrastructure for encryption testing.