test/encryption: add InvalidImageRecoveryScenario for KMS plugin imag… by gangwgr · Pull Request #2243 · openshift/library-go

gangwgr · 2026-05-26T07:32:30Z

…e fix

Adds a test scenario that verifies the cluster can recover when a KMS encryption config is set with an invalid plugin image:

Enable KMS with invalid (non-existent) plugin image
Attempt to switch to aescbc (cluster remains stuck on KMS)
Operator reports degraded
Fix the config with valid KMS plugin image
Operator recovers and encryption works normally

Summary by CodeRabbit

New Features
- Operator now applies in-place KMS plugin config updates for the latest key when safe and propagates updated plugin data into the managed encryption state; skips propagation when not applicable or when migration is required.
Tests
- Expanded unit and e2e coverage for in-place updates, migration detection, retry-based verification, recovery from invalid KMS plugin images, and a test ensuring config helpers do not mutate inputs; added degraded/recovery wait scenarios and timeouts.
Chores
- Vault AppRole secret keys standardized to hyphenated names ("role-id", "secret-id").

…e fix Adds a test scenario that verifies the cluster can recover when a KMS encryption config is set with an invalid plugin image: 1. Enable KMS with invalid (non-existent) plugin image 2. Attempt to switch to aescbc (cluster remains stuck on KMS) 3. Operator reports degraded 4. Fix the config with valid KMS plugin image 5. Operator recovers and encryption works normally

openshift-ci · 2026-05-26T07:32:34Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai · 2026-05-26T07:32:44Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Controllers add safe in-place KMS plugin-config updates and propagate those updates into the unrevisioned encryption-config during non-convergence; tests (unit and e2e) cover in-place updates, recovery from invalid KMS images, and ensure no input mutation.

Changes

Encryption KMS in-place updates

Layer / File(s)	Summary
Key controller: in-place KMS plugin update & migration check `pkg/operator/encryption/controllers/key_controller.go`	Adds in-place update attempt of newest key secret's `kms-plugin-config` when safe, introduces `kmsMigrationRequired`, and imports semantic equality and encoding.
Key controller unit tests `pkg/operator/encryption/controllers/key_controller_test.go`	Adjusts expected action sequences to include listing `openshift-config-managed`; adds tests for in-place plugin-config updates and `TestKmsMigrationRequired`.
State controller: propagate KMS plugin config `pkg/operator/encryption/controllers/state_controller.go`	On non-convergence, rebuilds/enriches encryption-config from key secrets, validates structure and KMS plugin set/size, and updates the unrevisioned `encryption-config` secret when plugin data changed.
State controller tests `pkg/operator/encryption/controllers/state_controller_test.go`	Adds scenarios verifying propagation, no-op when already matching, skip when secret missing, and skip in non-KMS mode.
encryptiondata immutability test `pkg/operator/encryption/encryptiondata/config_test.go`	Adds `TestToEncryptionStateDoesNotMutateInput` to ensure ToEncryptionState does not mutate its input.
E2E encryption test updates `test/e2e-encryption/encryption_test.go`	Uses semantic deep equality in verifyKMSPlugins with polling and adds an in-place KMS plugin-image update step for key 12 that verifies no new key is created and propagation occurs.
Test helpers & Vault KMS providers `test/library/encryption/helpers.go`, `test/library/encryption/kms/vault.go`	Adds operator-condition polling helpers/timeouts, an invalid Vault KMS plugin image/provider, and changes AppRole secret data keys from `roleID`/`secretID` to `role-id`/`secret-id`.
Test library: invalid-image recovery scenario `test/library/encryption/scenarios.go`	Adds `InvalidImageRecoveryScenario` and `TestEncryptionInvalidImageRecovery`; reorders/imports `metav1` and `k8s.io/apimachinery/pkg/util/rand`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

openshift/library-go#2236: related changes threading context through encryption scenario helpers/tests.
openshift/library-go#2225: overlaps on Vault KMS test helper AppRole secret key handling.

Suggested labels

lgtm, jira/valid-reference

Suggested reviewers

dgrisonnet
p0lyn0mial

🚥 Pre-merge checks | ✅ 9 | ❌ 3

❌ Failed checks (3 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 52.94% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality	⚠️ Warning	TestEncryptionInvalidImageRecovery lacks meaningful assertion messages: 6 bare require.NoError calls (lines 325, 328, 334, 337, 350, 353) violate the requirement for descriptive failure messages.	Add descriptive messages to each require.NoError, e.g., require.NoError(t, err, "failed to get APIServer config") for better failure diagnostics.
Ipv6 And Disconnected Network Test Compatibility	⚠️ Warning	PR adds external registry reference (quay.io) for InvalidVaultKMSPluginImage, which will fail in disconnected/IPv6-only environments.	Use local/mirrored image or mark InvalidImageRecoveryScenario with [Skipped:Disconnected] to skip in disconnected clusters lacking external registry access.

✅ Passed checks (9 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main addition: a new test scenario (InvalidImageRecoveryScenario) for KMS plugin image recovery, which aligns with the primary changes in the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	All test names are stable and deterministic with no dynamic formatting, timestamps, UUIDs, or pod/node/namespace identifiers in test function or table-driven scenario names.
Microshift Test Compatibility	✅ Passed	No Ginkgo e2e tests added. Only standard Go test helper functions in test/library/encryption/ without Ginkgo patterns (It(), Describe(), Context(), When()).
Single Node Openshift (Sno) Test Compatibility	✅ Passed	No new Ginkgo e2e tests are added; only test helper functions and library utilities for use by downstream repositories.
Topology-Aware Scheduling Compatibility	✅ Passed	PR only modifies encryption controller logic and test scenarios. No workload deployments or scheduling constraints are introduced; changes are Secret data updates.
Ote Binary Stdout Contract	✅ Passed	No OTE stdout contract violations found. fmt.Printf calls in test/e2e-encryption/encryption_test.go are inside test functions, which are explicitly allowed per check instructions.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/library/encryption/scenarios.go`:
- Around line 323-327: Wrap the cluster-scoped APIServerConfig update in a retry
loop: use retry.RetryOnConflict to perform the get-modify-update sequence for
cs.ApiServerConfig (the existing cs.ApiServerConfig.Get(ctx, "cluster", ...) and
cs.ApiServerConfig.Update(ctx, apiServer, ...)) so that transient API conflicts
are retried; in the closure fetch the current APIServerConfig, set
apiServer.Spec.Encryption = scenario.InvalidImageProvider.APIServerEncryption,
attempt the Update and return the error to be retried on conflict.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f65ffc6f-c7e6-43a1-bef2-7b566ab90b98

📥 Commits

Reviewing files that changed from the base of the PR and between c877160 and bee486e.

📒 Files selected for processing (1)

test/library/encryption/scenarios.go

openshift-ci · 2026-05-26T07:40:18Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gangwgr
Once this PR has been reviewed and has the lgtm label, please assign p0lyn0mial for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/operator/encryption/controllers/key_controller.go`:
- Around line 327-340: The current update only rewrites the KMS plugin config
blob and leaves any carried AppRole secret data stale; when handling the
keySecret fetched via secrets.FromKeyState (keySecret) and its Secret object s
(from c.secretClient.Secrets(...).Get) you must also refresh/prune the carried
AppRole secret entries whose keys match the "...-kms-plugin-secret-*" pattern:
retrieve the referenced Vault.Authentication.AppRole.Secret.Name, fetch that
Secret payload, copy its role-id/secret-id (or remove old entries if the
reference changed) into s.Data under the appropriate kms-plugin-secret-* keys
before calling c.secretClient.Secrets(s.Namespace).Update, and ensure
encoding.EncodeKMSPluginConfig is still used for the config key; add a
regression test that performs an in-place swap of
Vault.Authentication.AppRole.Secret.Name and asserts the updated key secret
contains the new role-id/secret-id and no stale kms-plugin-secret-* entries.

In `@pkg/operator/encryption/controllers/state_controller.go`:
- Around line 214-219: The call to encryptiondata.ToEncryptionState currently
discards its error; change the handling so you capture its returned error and
return it if non-nil (e.g., check err from
encryptiondata.ToEncryptionState(existingConfig, keySecrets) and return that
error), rather than proceeding to call encryptiondata.FromEncryptionState when
ToEncryptionState failed or silently no-op when enrichedState is nil; keep the
existing enrichedState nil check but ensure ToEncryptionState errors are
propagated up from this function.

In `@test/library/encryption/kms/vault.go`:
- Around line 109-110: The helper ensureDefaultVaultAppRoleSecret currently
copies creds.Data["role-id"] and creds.Data["secret-id"] directly into
required.Data which can write empty values; update
ensureDefaultVaultAppRoleSecret to validate creds.Data contains non-empty
"role-id" and "secret-id" before assigning to required.Data, using require or
test assertions that fail fast with a clear message referencing the
vault-kms/vault-credentials secret and the missing/empty field names so the test
setup errors explicitly if those keys are absent or empty.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e8b83ad6-657c-4d82-ba42-cd9e3042a7ea

📥 Commits

Reviewing files that changed from the base of the PR and between bee486e and a11498d.

📒 Files selected for processing (7)

pkg/operator/encryption/controllers/key_controller.go
pkg/operator/encryption/controllers/key_controller_test.go
pkg/operator/encryption/controllers/state_controller.go
pkg/operator/encryption/controllers/state_controller_test.go
pkg/operator/encryption/encryptiondata/config_test.go
test/e2e-encryption/encryption_test.go
test/library/encryption/kms/vault.go

coderabbitai · 2026-05-26T07:52:03Z

+	keySecret, err := secrets.FromKeyState(c.instanceName, latest)
+	if err != nil {
+		return err
+	}
+	s, err := c.secretClient.Secrets(keySecret.Namespace).Get(ctx, keySecret.Name, metav1.GetOptions{})
+	if err != nil {
+		return fmt.Errorf("failed to get key secret %s/%s: %v", keySecret.Namespace, keySecret.Name, err)
+	}
+	pluginData, err := encoding.EncodeKMSPluginConfig(apiServerEncryption.KMS)
+	if err != nil {
+		return fmt.Errorf("failed to encode KMS plugin config: %v", err)
+	}
+	s.Data["encryption.apiserver.operator.openshift.io-kms-plugin-config"] = pluginData
+	_, updateErr := c.secretClient.Secrets(s.Namespace).Update(ctx, s, metav1.UpdateOptions{})


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Refresh the carried AppRole secret data during in-place auth updates.

generateKeySecret() copies the referenced AppRole payload into the key secret, but this path only rewrites ...-kms-plugin-config. If the in-place change is a new Vault.Authentication.AppRole.Secret.Name, the latest key keeps the old role-id/secret-id entries, so the rollout still comes up with stale credentials. Please update/prune the ...-kms-plugin-secret-* data alongside the plugin config here, and add a regression case that swaps the AppRole secret reference.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/operator/encryption/controllers/key_controller.go` around lines 327 - 340, The current update only rewrites the KMS plugin config blob and leaves any carried AppRole secret data stale; when handling the keySecret fetched via secrets.FromKeyState (keySecret) and its Secret object s (from c.secretClient.Secrets(...).Get) you must also refresh/prune the carried AppRole secret entries whose keys match the "...-kms-plugin-secret-*" pattern: retrieve the referenced Vault.Authentication.AppRole.Secret.Name, fetch that Secret payload, copy its role-id/secret-id (or remove old entries if the reference changed) into s.Data under the appropriate kms-plugin-secret-* keys before calling c.secretClient.Secrets(s.Namespace).Update, and ensure encoding.EncodeKMSPluginConfig is still used for the config key; add a regression test that performs an in-place swap of Vault.Authentication.AppRole.Secret.Name and asserts the updated key secret contains the new role-id/secret-id and no stale kms-plugin-secret-* entries.

coderabbitai · 2026-05-26T07:52:03Z

+	enrichedState, _ := encryptiondata.ToEncryptionState(existingConfig, keySecrets)
+	if enrichedState == nil {
+		return nil
+	}
+
+	rebuiltConfig, err := encryptiondata.FromEncryptionState(enrichedState)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t drop ToEncryptionState() errors here.

If the unrevisioned encryption-config drifts from the backing key secrets, this helper currently ignores that failure and can silently no-op instead of surfacing a degraded condition. Returning the error here avoids rebuilding from a partial state.

Suggested fix

- enrichedState, _ := encryptiondata.ToEncryptionState(existingConfig, keySecrets) + enrichedState, err := encryptiondata.ToEncryptionState(existingConfig, keySecrets) + if err != nil { + return err + } if enrichedState == nil { return nil }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

enrichedState, _ := encryptiondata.ToEncryptionState(existingConfig, keySecrets)

if enrichedState == nil {

return nil

}

rebuiltConfig, err := encryptiondata.FromEncryptionState(enrichedState)

enrichedState, err := encryptiondata.ToEncryptionState(existingConfig, keySecrets)

if err != nil {

return err

}

if enrichedState == nil {

return nil

}

rebuiltConfig, err := encryptiondata.FromEncryptionState(enrichedState)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/operator/encryption/controllers/state_controller.go` around lines 214 - 219, The call to encryptiondata.ToEncryptionState currently discards its error; change the handling so you capture its returned error and return it if non-nil (e.g., check err from encryptiondata.ToEncryptionState(existingConfig, keySecrets) and return that error), rather than proceeding to call encryptiondata.FromEncryptionState when ToEncryptionState failed or silently no-op when enrichedState is nil; keep the existing enrichedState nil check but ensure ToEncryptionState errors are propagated up from this function.

coderabbitai · 2026-05-26T07:52:03Z

+			"role-id":   creds.Data["role-id"],
+			"secret-id": creds.Data["secret-id"],


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n -C2 --type=go --type=yaml --type=sh \ 'vault-credentials|role-id|secret-id|roleID|secretID'

Repository: openshift/library-go

Length of output: 37480

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect the exact code around the changed lines sed -n '70,140p' test/library/encryption/kms/vault.go # Find callers/usage of ensureDefaultVaultAppRoleSecret rg -n "ensureDefaultVaultAppRoleSecret" -S test/library/encryption/kms/vault.go test/library/encryption -S # Check if any other code validates required vault-credentials keys earlier rg -n "defaultVaultCredentialsSecret|vault-credentials" -S test/library/encryption/kms test/library/encryption

Repository: openshift/library-go

Length of output: 4965

🏁 Script executed:

#!/bin/bash set -euo pipefail # Locate ApplySecret implementation rg -n --type=go "func ApplySecret\\(" -S . | head -n 50 # Print the implementation area rg -n --type=go "func ApplySecret\\(" -S . | head -n 1 | awk -F: '{print $1}' | xargs -I{} true # (We’ll directly cat the likely file once we know it from the first search.) # Fetch the full file for the first match only (avoid huge output) file="$(rg -n --type=go "func ApplySecret\\(" -S . | head -n 1 | cut -d: -f1)" echo "ApplySecret file: $file" sed -n '1,220p' "$file" # Find where AppRole secret keys are validated (role-id/secret-id) rg -n --type=go "missing required key \"secret-id\"|missing required key \"role-id\"|role-id\" and \"secret-id\"|required key.*role-id|required key.*secret-id" -S pkg | head -n 50 # Locate the specific validation logic for Vault AppRole secret reads rg -n --type=go "vault-approle-secret|VaultAppRoleAuthentication|AppRole secret must contain" -S pkg/operator/encryption | head -n 50

Repository: openshift/library-go

Length of output: 15601

🏁 Script executed:

#!/bin/bash set -euo pipefail # Show ApplySecretImproved implementation (secret-specific path) rg -n "func ApplySecretImproved" pkg/operator/resource/resourceapply/core.go sed -n '220,420p' pkg/operator/resource/resourceapply/core.go # Inspect operator-side validation of Vault AppRole secret keys sed -n '470,580p' pkg/operator/encryption/controllers/key_controller.go # Also inspect the associated test error message path for missing keys rg -n "missing required key \"secret-id\"|missing required key \"role-id\"" pkg/operator/encryption/controllers/key_controller_test.go sed -n '560,720p' pkg/operator/encryption/controllers/key_controller_test.go

Repository: openshift/library-go

Length of output: 20563

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find the exact validation code that emits "missing required key" rg -n --type=go "missing required key \"secret-id\"|missing required key \"role-id\"|missing required key" pkg/operator/encryption/controllers/key_controller.go # Print the surrounding block(s) for the validation logic (small slices) rg -n --type=go -C5 "missing required key" pkg/operator/encryption/controllers/key_controller.go

Repository: openshift/library-go

Length of output: 635

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find the PluginSecretData.Set implementation rg -n --type=go "type .*PluginSecretData|PluginSecretData.*Set\$" pkg/operator/encryption | head -n 50 # Find the exact Set receiver method(s) rg -n --type=go "func \\(.*\$ Set\$.*PluginSecretData" pkg/operator/encryption | head -n 50 rg -n --type=go "func \\(.*\$ Set\$.*secretName.*key.*\$" pkg/operator/encryption | head -n 50 # Also locate where KMS.PluginSecretData.Set is called rg -n --type=go "PluginSecretData\\.Set\$" pkg/operator/encryption/controllers/key_controller.go pkg/operator/encryption -S

Repository: openshift/library-go

Length of output: 198

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n --type=go "type .*PluginSecretData" -S . || true rg -n --type=go "PluginSecretData" -S pkg/operator/encryption | head -n 50 rg -n --type=go "func \\(.*\$ Set\$" -S pkg/operator/encryption | rg -n "PluginSecretData" | head -n 50 rg -n --type=go "PluginSecretData\\.Set\\(" -S . || true

Repository: openshift/library-go

Length of output: 1562

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1,220p' pkg/operator/encryption/state/types.go sed -n '1,260p' pkg/operator/encryption/secrets/secrets.go # Locate the Set(...) and any helpers SetFromRawKey / SetFromRawKey implementations rg -n --type=go "func \\(.*\$ Set\\(" pkg/operator/encryption/state pkg/operator/encryption/secrets pkg/operator/encryption | head -n 50 rg -n --type=go "SetFromRawKey" pkg/operator/encryption | head -n 50

Repository: openshift/library-go

Length of output: 14312

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '340,460p' pkg/operator/encryption/controllers/key_controller.go

Repository: openshift/library-go

Length of output: 4904

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find call sites of generateKeySecret and surrounding error handling rg -n --type=go "generateKeySecret\\(" pkg/operator/encryption/controllers/key_controller.go # Show the blocks around each call site (small ranges) rg -n --type=go -C5 "generateKeySecret\\(" pkg/operator/encryption/controllers/key_controller.go

Repository: openshift/library-go

Length of output: 1450

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n --type=go "secretName, dataKey, and value must not be empty" -S . || true rg -n --type=go "missing required key \"role-id\"|missing required key \"secret-id\"" -S pkg/operator/encryption/controllers rg -n --type=go "Set\\(secretName, dataKey, value" -S pkg/operator/encryption | head -n 50

Repository: openshift/library-go

Length of output: 568

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1,120p' test/library/encryption/kms/vault.go sed -n '120,220p' test/library/encryption/kms/vault.go # Find the test helper that references ensureDefaultVaultAppRoleSecret setup rg -n "ensureDefaultVaultAppRoleSecret" test/library/encryption/kms/vault.go

Repository: openshift/library-go

Length of output: 7310

Fail fast when vault-credentials is missing (or empty) Vault AppRole keys

ensureDefaultVaultAppRoleSecret copies creds.Data["role-id"] / creds.Data["secret-id"] directly (test/library/encryption/kms/vault.go lines 109-110). If the CI-provided vault-credentials secret lacks those fields, the helper can write empty/nil bytes into the target secret; the operator then fails during key creation when KMSSecretData.Set rejects empty values (failed to create key: ... must not be empty), producing a less direct contract-mismatch signal. Add explicit require checks for key presence and non-empty values when building required.Data, so the setup fails with a clear message about the expected role-id/secret-id fields in vault-kms/vault-credentials.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/library/encryption/kms/vault.go` around lines 109 - 110, The helper ensureDefaultVaultAppRoleSecret currently copies creds.Data["role-id"] and creds.Data["secret-id"] directly into required.Data which can write empty values; update ensureDefaultVaultAppRoleSecret to validate creds.Data contains non-empty "role-id" and "secret-id" before assigning to required.Data, using require or test assertions that fail fast with a clear message referencing the vault-kms/vault-credentials secret and the missing/empty field names so the test setup errors explicitly if those keys are absent or empty.

ardaguclu · 2026-05-26T07:57:17Z

+//  3. See degraded
+//  4. Update with KMS with valid image
+//  5. See everything works
+type InvalidImageRecoveryScenario struct {


I think this scenario nicely exercises the in-place field update case which will be added by #2241.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (3)

pkg/operator/encryption/controllers/state_controller.go (1)

214-215: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate ToEncryptionState failures instead of dropping them.

Line 214 discards the returned error. A failed enrichment can silently no-op and hide state drift/corruption during non-convergence.

Suggested fix

-	enrichedState, _ := encryptiondata.ToEncryptionState(existingConfig, keySecrets)
+	enrichedState, err := encryptiondata.ToEncryptionState(existingConfig, keySecrets)
+	if err != nil {
+		return err
+	}
 	if enrichedState == nil {
 		return nil
 	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/encryption/controllers/state_controller.go` around lines 214 -
215, The call to ToEncryptionState is ignoring its returned error
(enrichedState, _ := encryptiondata.ToEncryptionState(existingConfig,
keySecrets)), which can hide failures; change this to capture the error (e.g.,
enrichedState, err := encryptiondata.ToEncryptionState(...)), check for non-nil
err and propagate it instead of continuing with a nil enrichedState—return the
error from the current reconcile/state handling function or requeue with the
error so failures surface and are not silently dropped. Ensure you reference
ToEncryptionState, enrichedState, existingConfig, and keySecrets when making the
change.

test/library/encryption/kms/vault.go (1)

109-110: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate AppRole credential fields before copying them into the target secret.

Lines 109-110 copy map lookups directly; missing/empty role-id or secret-id values fail later with less clear errors. Fail fast here with explicit assertions.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/library/encryption/kms/vault.go` around lines 109 - 110, Before copying
creds.Data["role-id"] and creds.Data["secret-id"] into the target secret map,
validate that creds.Data contains non-empty string values for the "role-id" and
"secret-id" keys; if either is missing or empty, return a clear error (or
panic/assert) with context (e.g., "missing AppRole role-id" / "missing AppRole
secret-id"). Update the code that builds the map (the lines assigning "role-id"
and "secret-id") to first check creds and creds.Data types/values and fail fast
with descriptive messages instead of blindly copying potentially nil/empty
values.

pkg/operator/encryption/controllers/key_controller.go (1)

327-340: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Refresh carried kms-plugin-secret-* entries when applying in-place AppRole config changes.

Line 339 only updates ...-kms-plugin-config. If the AppRole secret reference changes, the key secret can keep stale carried role-id/secret-id data from the old secret. Refresh/prune the kms-plugin-secret-* fields in the same update path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/encryption/controllers/key_controller.go` around lines 327 -
340, The update currently only writes the kms-plugin-config field and can leave
stale carried AppRole secret entries; before calling
c.secretClient.Secrets(s.Namespace).Update(ctx, s, metav1.UpdateOptions{}),
remove or refresh any keys in s.Data whose names start with
"encryption.apiserver.operator.openshift.io-kms-plugin-secret-" (e.g., iterate
s.Data map and delete entries with that prefix) so carried role-id/secret-id
from old AppRole refs are pruned when keySecret/AppRole changes; keep existing
logic that sets "encryption.apiserver.operator.openshift.io-kms-plugin-config"
(variables: keySecret, s, pluginData, and the Update call remain) and then call
Update.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/operator/encryption/encryptiondata/config_test.go`:
- Line 475: The test calls encryptiondata.ToEncryptionState(input, keySecrets)
but ignores its returned error, so update the test in config_test.go to capture
both return values (e.g., state, err) and assert the call succeeded;
specifically call encryptiondata.ToEncryptionState(input, keySecrets) -> (state,
err) and then assert err == nil (using the test helper used in the repo, e.g.,
require.NoError(t, err) or t.Fatalf on error) before continuing the immutability
checks on state and input.

In `@test/library/encryption/scenarios.go`:
- Around line 329-340: After calling scenario.WaitForDegraded(ctx, e) add an
explicit assertion that the cluster's encryption mode remained KMS: re-fetch the
APIServer via cs.ApiServerConfig.Get(ctx, "cluster", metav1.GetOptions{}), then
assert apiServer.Spec.Encryption.Type equals configv1.EncryptionTypeKMS using
require/expect (e.g., require.Equal) so the test fails if the attempted switch
to aescbc actually applied; reference cs.ApiServerConfig.Get,
apiServer.Spec.Encryption.Type, and scenario.WaitForDegraded when locating where
to add the check.

---

Duplicate comments:
In `@pkg/operator/encryption/controllers/key_controller.go`:
- Around line 327-340: The update currently only writes the kms-plugin-config
field and can leave stale carried AppRole secret entries; before calling
c.secretClient.Secrets(s.Namespace).Update(ctx, s, metav1.UpdateOptions{}),
remove or refresh any keys in s.Data whose names start with
"encryption.apiserver.operator.openshift.io-kms-plugin-secret-" (e.g., iterate
s.Data map and delete entries with that prefix) so carried role-id/secret-id
from old AppRole refs are pruned when keySecret/AppRole changes; keep existing
logic that sets "encryption.apiserver.operator.openshift.io-kms-plugin-config"
(variables: keySecret, s, pluginData, and the Update call remain) and then call
Update.

In `@pkg/operator/encryption/controllers/state_controller.go`:
- Around line 214-215: The call to ToEncryptionState is ignoring its returned
error (enrichedState, _ := encryptiondata.ToEncryptionState(existingConfig,
keySecrets)), which can hide failures; change this to capture the error (e.g.,
enrichedState, err := encryptiondata.ToEncryptionState(...)), check for non-nil
err and propagate it instead of continuing with a nil enrichedState—return the
error from the current reconcile/state handling function or requeue with the
error so failures surface and are not silently dropped. Ensure you reference
ToEncryptionState, enrichedState, existingConfig, and keySecrets when making the
change.

In `@test/library/encryption/kms/vault.go`:
- Around line 109-110: Before copying creds.Data["role-id"] and
creds.Data["secret-id"] into the target secret map, validate that creds.Data
contains non-empty string values for the "role-id" and "secret-id" keys; if
either is missing or empty, return a clear error (or panic/assert) with context
(e.g., "missing AppRole role-id" / "missing AppRole secret-id"). Update the code
that builds the map (the lines assigning "role-id" and "secret-id") to first
check creds and creds.Data types/values and fail fast with descriptive messages
instead of blindly copying potentially nil/empty values.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bf88526e-d2a2-4471-980d-e267facbb3fb

📥 Commits

Reviewing files that changed from the base of the PR and between a11498d and 78d95d2.

📒 Files selected for processing (8)

pkg/operator/encryption/controllers/key_controller.go
pkg/operator/encryption/controllers/key_controller_test.go
pkg/operator/encryption/controllers/state_controller.go
pkg/operator/encryption/controllers/state_controller_test.go
pkg/operator/encryption/encryptiondata/config_test.go
test/e2e-encryption/encryption_test.go
test/library/encryption/kms/vault.go
test/library/encryption/scenarios.go

coderabbitai · 2026-05-26T08:02:40Z

+		encryptiontesting.CreateEncryptionKeySecretWithRawKey("test", []schema.GroupResource{{Resource: "secrets"}}, 1, []byte("61def964fb967f5d7c44a2af8dab6865")),
+	}
+
+	encryptiondata.ToEncryptionState(input, keySecrets)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert ToEncryptionState succeeds in this immutability test.

Line 475 ignores the error return, so the test can pass even when conversion fails.

Suggested fix

- encryptiondata.ToEncryptionState(input, keySecrets) + _, err := encryptiondata.ToEncryptionState(input, keySecrets) + if err != nil { + t.Fatalf("ToEncryptionState returned error: %v", err) + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

encryptiondata.ToEncryptionState(input, keySecrets)

_, err := encryptiondata.ToEncryptionState(input, keySecrets)

if err != nil {

t.Fatalf("ToEncryptionState returned error: %v", err)

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/operator/encryption/encryptiondata/config_test.go` at line 475, The test calls encryptiondata.ToEncryptionState(input, keySecrets) but ignores its returned error, so update the test in config_test.go to capture both return values (e.g., state, err) and assert the call succeeded; specifically call encryptiondata.ToEncryptionState(input, keySecrets) -> (state, err) and then assert err == nil (using the test helper used in the repo, e.g., require.NoError(t, err) or t.Fatalf on error) before continuing the immutability checks on state and input.

coderabbitai · 2026-05-26T08:02:40Z

+	// step 2: attempt to switch to aescbc — cluster should remain stuck on KMS
+	t.Log("Attempting to switch to aescbc (cluster should remain stuck on KMS)")
+	apiServer, err = cs.ApiServerConfig.Get(ctx, "cluster", metav1.GetOptions{})
+	require.NoError(t, err)
+	apiServer.Spec.Encryption = configv1.APIServerEncryption{Type: configv1.EncryptionTypeAESCBC}
+	_, err = cs.ApiServerConfig.Update(ctx, apiServer, metav1.UpdateOptions{})
+	require.NoError(t, err)
+
+	// step 3: wait for degraded
+	t.Log("Waiting for operator to report degraded status")
+	scenario.WaitForDegraded(ctx, e)
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Assert the mode remains KMS after the failed switch attempt.

After Line 334, the test only waits for degraded status. It does not explicitly verify the cluster stayed on KMS, so step 2 can pass without validating the intended behavior.

Proposed fix

func TestEncryptionInvalidImageRecovery(ctx context.Context, t testing.TB, scenario InvalidImageRecoveryScenario) { e := NewE(t, PrintEventsOnFailure(scenario.OperatorNamespace)) + require.NotNil(t, scenario.AssertFunc, "AssertFunc must not be nil") require.NotNil(t, scenario.WaitForDegraded, "WaitForDegraded must not be nil") require.NotNil(t, scenario.WaitForRecovery, "WaitForRecovery must not be nil") @@ // step 3: wait for degraded t.Log("Waiting for operator to report degraded status") scenario.WaitForDegraded(ctx, e) + scenario.AssertFunc(e, cs, configv1.EncryptionTypeKMS, scenario.Namespace, scenario.LabelSelector)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/library/encryption/scenarios.go` around lines 329 - 340, After calling scenario.WaitForDegraded(ctx, e) add an explicit assertion that the cluster's encryption mode remained KMS: re-fetch the APIServer via cs.ApiServerConfig.Get(ctx, "cluster", metav1.GetOptions{}), then assert apiServer.Spec.Encryption.Type equals configv1.EncryptionTypeKMS using require/expect (e.g., require.Equal) so the test fails if the attempted switch to aescbc actually applied; reference cs.ApiServerConfig.Get, apiServer.Spec.Encryption.Type, and scenario.WaitForDegraded when locating where to add the check.

When a KMS sidecar revision is stuck (e.g. wrong container image), the cluster-admin needs to update APIServer.spec.encryption.kms with corrected values. Previously, all encryption controllers blocked on revision convergence, so the fix could never be applied. This change allows the key_controller to update the latest key secret's KMS plugin config in-place for non-migration fields (image, TLS, AppRole) regardless of convergence state. If there is a change in migration-triggering fields, we should wait until convergence happens. State controller carries in-place fields updates only when there is no change in the encryption-config data key.

coderabbitai

♻️ Duplicate comments (1)

pkg/operator/encryption/encryptiondata/config_test.go (1)

475-475: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert ToEncryptionState succeeds before checking immutability.

Line 475 ignores the returned error, so this test can pass even when conversion fails. Capture and assert the error first.

Proposed fix

-	encryptiondata.ToEncryptionState(input, keySecrets)
+	_, err := encryptiondata.ToEncryptionState(input, keySecrets)
+	if err != nil {
+		t.Fatalf("ToEncryptionState returned error: %v", err)
+	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/encryption/encryptiondata/config_test.go` at line 475, The call
to encryptiondata.ToEncryptionState(input, keySecrets) is ignoring its returned
error; change the test to capture both (e.g., state, err :=
encryptiondata.ToEncryptionState(input, keySecrets)), assert the call succeeded
(use the test helper in this file, e.g., require.NoError(t, err) or t.Fatalf on
err), and only then continue with the immutability/assertions against the
returned state variable; reference ToEncryptionState and the local variables you
add (state, err) to locate the change.

🧹 Nitpick comments (1)

pkg/operator/encryption/controllers/state_controller_test.go (1)

1223-1241: ⚡ Quick win

Harden structural assertions to prevent false positives in propagation test.

This check only validates slice lengths, so it won’t catch structural regressions (provider order/type, resource name, KMS endpoint/name). Assert deep equality for actualConfig.Encryption against the expected pre-update structure, and also assert the updated secret is openshift-config-managed/encryption-config-kms.

Suggested test assertion tightening

 			validateFunc: func(ts *testing.T, actions []clientgotesting.Action, destName string, expectedEncryptionCfg *encryptiondata.Config) {
+				foundUpdate := false
 				for _, action := range actions {
 					if action.Matches("update", "secrets") {
 						updateAction := action.(clientgotesting.UpdateAction)
 						actualSecret := updateAction.GetObject().(*corev1.Secret)
+						if actualSecret.Namespace != "openshift-config-managed" || actualSecret.Name != "encryption-config-kms" {
+							continue
+						}
+						foundUpdate = true
 						actualConfig, err := encryptiondata.FromSecret(actualSecret)
 						if err != nil {
 							ts.Fatalf("failed to parse updated encryption config: %v", err)
 						}
 						// Verify plugin config was updated
 						updatedPlugin := encryptiontesting.DefaultKMSPluginConfig
 						updatedPlugin.Vault.KMSPluginImage = "registry.example.com/kms-plugin@sha256:1111111111111111111111111111111111111111111111111111111111111111"
 						if !equality.Semantic.DeepEqual(actualConfig.KMSPlugins["1"], updatedPlugin) {
 							ts.Errorf("plugin config not propagated: got %+v", actualConfig.KMSPlugins["1"])
 						}
-						// Verify EncryptionConfiguration (structural part) is unchanged
-						if len(actualConfig.Encryption.Resources) != 1 || len(actualConfig.Encryption.Resources[0].Providers) != 2 {
-							ts.Errorf("EncryptionConfiguration structure changed unexpectedly")
-						}
+						// Verify EncryptionConfiguration (structural part) is unchanged
+						expectedEncryption := &apiserverconfigv1.EncryptionConfiguration{
+							Resources: []apiserverconfigv1.ResourceConfiguration{{
+								Resources: []string{"secrets"},
+								Providers: []apiserverconfigv1.ProviderConfiguration{{
+									KMS: &apiserverconfigv1.KMSConfiguration{
+										APIVersion: "v2",
+										Name:       "1_secrets",
+										Endpoint:   "unix:///var/run/kmsplugin/kms-1.sock",
+										Timeout:    &metav1.Duration{Duration: 10 * time.Second},
+									},
+								}, {
+									Identity: &apiserverconfigv1.IdentityConfiguration{},
+								}},
+							}},
+						}
+						if !equality.Semantic.DeepEqual(actualConfig.Encryption, expectedEncryption) {
+							ts.Errorf("EncryptionConfiguration changed unexpectedly: %s", diff.ObjectReflectDiff(expectedEncryption, actualConfig.Encryption))
+						}
 						return
 					}
 				}
-				ts.Error("expected an update action for the encryption-config secret")
+				if !foundUpdate {
+					ts.Error("expected an update action for openshift-config-managed/encryption-config-kms")
+				}
 			},

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/encryption/controllers/state_controller_test.go` around lines
1223 - 1241, The test's structural assertion for actualConfig.Encryption is too
weak (only checks slice lengths) and may miss regressions; update the
validateFunc to assert deep equality of actualConfig.Encryption against the
expected pre-update Encryption structure (the exact expected Encryption object
used before the plugin change) rather than just lengths, and also assert the
updated secret's name equals "openshift-config-managed/encryption-config-kms";
reference actualConfig.Encryption for the deep-equal check and keep the existing
plugin propagation assertion that compares actualConfig.KMSPlugins["1"] to
encryptiontesting.DefaultKMSPluginConfig with the modified Vault.KMSPluginImage.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@pkg/operator/encryption/encryptiondata/config_test.go`:
- Line 475: The call to encryptiondata.ToEncryptionState(input, keySecrets) is
ignoring its returned error; change the test to capture both (e.g., state, err
:= encryptiondata.ToEncryptionState(input, keySecrets)), assert the call
succeeded (use the test helper in this file, e.g., require.NoError(t, err) or
t.Fatalf on err), and only then continue with the immutability/assertions
against the returned state variable; reference ToEncryptionState and the local
variables you add (state, err) to locate the change.

---

Nitpick comments:
In `@pkg/operator/encryption/controllers/state_controller_test.go`:
- Around line 1223-1241: The test's structural assertion for
actualConfig.Encryption is too weak (only checks slice lengths) and may miss
regressions; update the validateFunc to assert deep equality of
actualConfig.Encryption against the expected pre-update Encryption structure
(the exact expected Encryption object used before the plugin change) rather than
just lengths, and also assert the updated secret's name equals
"openshift-config-managed/encryption-config-kms"; reference
actualConfig.Encryption for the deep-equal check and keep the existing plugin
propagation assertion that compares actualConfig.KMSPlugins["1"] to
encryptiontesting.DefaultKMSPluginConfig with the modified Vault.KMSPluginImage.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 65696f7c-4824-4db8-9a63-8b854698322c

📥 Commits

Reviewing files that changed from the base of the PR and between 78d95d2 and 5374add.

📒 Files selected for processing (9)

pkg/operator/encryption/controllers/key_controller.go
pkg/operator/encryption/controllers/key_controller_test.go
pkg/operator/encryption/controllers/state_controller.go
pkg/operator/encryption/controllers/state_controller_test.go
pkg/operator/encryption/encryptiondata/config_test.go
test/e2e-encryption/encryption_test.go
test/library/encryption/helpers.go
test/library/encryption/kms/vault.go
test/library/encryption/scenarios.go

🚧 Files skipped from review as they are similar to previous changes (5)

test/library/encryption/scenarios.go
pkg/operator/encryption/controllers/state_controller.go
test/e2e-encryption/encryption_test.go
pkg/operator/encryption/controllers/key_controller.go
pkg/operator/encryption/controllers/key_controller_test.go

The invalid image uses a properly formatted OCI reference with sha256 digest that passes API validation but points to a non-existent image, causing ImagePullBackOff and operator degradation.

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 26, 2026

Rename AppRole datakeys to role-id and secret-id

f307d1c

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

Comment thread test/library/encryption/scenarios.go

gangwgr force-pushed the kms-invalid-image-recovery branch from a11498d to 78d95d2 Compare May 26, 2026 07:46

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

ardaguclu reviewed May 26, 2026

View reviewed changes

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

gangwgr force-pushed the kms-invalid-image-recovery branch from 78d95d2 to 5374add Compare May 27, 2026 04:34

coderabbitai Bot reviewed May 27, 2026

View reviewed changes

gangwgr force-pushed the kms-invalid-image-recovery branch 3 times, most recently from 1f8a774 to 32381d8 Compare May 27, 2026 08:15

test/kms: add InvalidImageVaultEncryptionProvider for degradation tests

d94ed61

The invalid image uses a properly formatted OCI reference with sha256 digest that passes API validation but points to a non-existent image, causing ImagePullBackOff and operator degradation.

gangwgr force-pushed the kms-invalid-image-recovery branch from 32381d8 to d94ed61 Compare May 27, 2026 08:29

		"role-id": creds.Data["role-id"],
		"secret-id": creds.Data["secret-id"],

Conversation

gangwgr commented May 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

openshift-ci Bot commented May 26, 2026

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (3 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

openshift-ci Bot commented May 26, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

ardaguclu May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

gangwgr commented May 26, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 26, 2026 •

edited

Loading