Skip to content

test/encryption: add InvalidImageRecoveryScenario for KMS plugin imag…#2243

Draft
gangwgr wants to merge 4 commits into
openshift:masterfrom
gangwgr:kms-invalid-image-recovery
Draft

test/encryption: add InvalidImageRecoveryScenario for KMS plugin imag…#2243
gangwgr wants to merge 4 commits into
openshift:masterfrom
gangwgr:kms-invalid-image-recovery

Conversation

@gangwgr
Copy link
Copy Markdown
Contributor

@gangwgr gangwgr commented May 26, 2026

…e fix

Adds a test scenario that verifies the cluster can recover when a KMS encryption config is set with an invalid plugin image:

  1. Enable KMS with invalid (non-existent) plugin image
  2. Attempt to switch to aescbc (cluster remains stuck on KMS)
  3. Operator reports degraded
  4. Fix the config with valid KMS plugin image
  5. Operator recovers and encryption works normally

Summary by CodeRabbit

  • New Features

    • Operator now applies in-place KMS plugin config updates for the latest key when safe and propagates updated plugin data into the managed encryption state; skips propagation when not applicable or when migration is required.
  • Tests

    • Expanded unit and e2e coverage for in-place updates, migration detection, retry-based verification, recovery from invalid KMS plugin images, and a test ensuring config helpers do not mutate inputs; added degraded/recovery wait scenarios and timeouts.
  • Chores

    • Vault AppRole secret keys standardized to hyphenated names ("role-id", "secret-id").

…e fix

Adds a test scenario that verifies the cluster can recover when a KMS
encryption config is set with an invalid plugin image:

1. Enable KMS with invalid (non-existent) plugin image
2. Attempt to switch to aescbc (cluster remains stuck on KMS)
3. Operator reports degraded
4. Fix the config with valid KMS plugin image
5. Operator recovers and encryption works normally
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 26, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 26, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Controllers add safe in-place KMS plugin-config updates and propagate those updates into the unrevisioned encryption-config during non-convergence; tests (unit and e2e) cover in-place updates, recovery from invalid KMS images, and ensure no input mutation.

Changes

Encryption KMS in-place updates

Layer / File(s) Summary
Key controller: in-place KMS plugin update & migration check
pkg/operator/encryption/controllers/key_controller.go
Adds in-place update attempt of newest key secret's kms-plugin-config when safe, introduces kmsMigrationRequired, and imports semantic equality and encoding.
Key controller unit tests
pkg/operator/encryption/controllers/key_controller_test.go
Adjusts expected action sequences to include listing openshift-config-managed; adds tests for in-place plugin-config updates and TestKmsMigrationRequired.
State controller: propagate KMS plugin config
pkg/operator/encryption/controllers/state_controller.go
On non-convergence, rebuilds/enriches encryption-config from key secrets, validates structure and KMS plugin set/size, and updates the unrevisioned encryption-config secret when plugin data changed.
State controller tests
pkg/operator/encryption/controllers/state_controller_test.go
Adds scenarios verifying propagation, no-op when already matching, skip when secret missing, and skip in non-KMS mode.
encryptiondata immutability test
pkg/operator/encryption/encryptiondata/config_test.go
Adds TestToEncryptionStateDoesNotMutateInput to ensure ToEncryptionState does not mutate its input.
E2E encryption test updates
test/e2e-encryption/encryption_test.go
Uses semantic deep equality in verifyKMSPlugins with polling and adds an in-place KMS plugin-image update step for key 12 that verifies no new key is created and propagation occurs.
Test helpers & Vault KMS providers
test/library/encryption/helpers.go, test/library/encryption/kms/vault.go
Adds operator-condition polling helpers/timeouts, an invalid Vault KMS plugin image/provider, and changes AppRole secret data keys from roleID/secretID to role-id/secret-id.
Test library: invalid-image recovery scenario
test/library/encryption/scenarios.go
Adds InvalidImageRecoveryScenario and TestEncryptionInvalidImageRecovery; reorders/imports metav1 and k8s.io/apimachinery/pkg/util/rand.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

lgtm, jira/valid-reference

Suggested reviewers

  • dgrisonnet
  • p0lyn0mial
🚥 Pre-merge checks | ✅ 9 | ❌ 3

❌ Failed checks (3 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 52.94% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality ⚠️ Warning TestEncryptionInvalidImageRecovery lacks meaningful assertion messages: 6 bare require.NoError calls (lines 325, 328, 334, 337, 350, 353) violate the requirement for descriptive failure messages. Add descriptive messages to each require.NoError, e.g., require.NoError(t, err, "failed to get APIServer config") for better failure diagnostics.
Ipv6 And Disconnected Network Test Compatibility ⚠️ Warning PR adds external registry reference (quay.io) for InvalidVaultKMSPluginImage, which will fail in disconnected/IPv6-only environments. Use local/mirrored image or mark InvalidImageRecoveryScenario with [Skipped:Disconnected] to skip in disconnected clusters lacking external registry access.
✅ Passed checks (9 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main addition: a new test scenario (InvalidImageRecoveryScenario) for KMS plugin image recovery, which aligns with the primary changes in the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names are stable and deterministic with no dynamic formatting, timestamps, UUIDs, or pod/node/namespace identifiers in test function or table-driven scenario names.
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests added. Only standard Go test helper functions in test/library/encryption/ without Ginkgo patterns (It(), Describe(), Context(), When()).
Single Node Openshift (Sno) Test Compatibility ✅ Passed No new Ginkgo e2e tests are added; only test helper functions and library utilities for use by downstream repositories.
Topology-Aware Scheduling Compatibility ✅ Passed PR only modifies encryption controller logic and test scenarios. No workload deployments or scheduling constraints are introduced; changes are Secret data updates.
Ote Binary Stdout Contract ✅ Passed No OTE stdout contract violations found. fmt.Printf calls in test/e2e-encryption/encryption_test.go are inside test functions, which are explicitly allowed per check instructions.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/library/encryption/scenarios.go`:
- Around line 323-327: Wrap the cluster-scoped APIServerConfig update in a retry
loop: use retry.RetryOnConflict to perform the get-modify-update sequence for
cs.ApiServerConfig (the existing cs.ApiServerConfig.Get(ctx, "cluster", ...) and
cs.ApiServerConfig.Update(ctx, apiServer, ...)) so that transient API conflicts
are retried; in the closure fetch the current APIServerConfig, set
apiServer.Spec.Encryption = scenario.InvalidImageProvider.APIServerEncryption,
attempt the Update and return the error to be retried on conflict.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f65ffc6f-c7e6-43a1-bef2-7b566ab90b98

📥 Commits

Reviewing files that changed from the base of the PR and between c877160 and bee486e.

📒 Files selected for processing (1)
  • test/library/encryption/scenarios.go

Comment thread test/library/encryption/scenarios.go
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 26, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gangwgr
Once this PR has been reviewed and has the lgtm label, please assign p0lyn0mial for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gangwgr gangwgr force-pushed the kms-invalid-image-recovery branch from a11498d to 78d95d2 Compare May 26, 2026 07:46
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/operator/encryption/controllers/key_controller.go`:
- Around line 327-340: The current update only rewrites the KMS plugin config
blob and leaves any carried AppRole secret data stale; when handling the
keySecret fetched via secrets.FromKeyState (keySecret) and its Secret object s
(from c.secretClient.Secrets(...).Get) you must also refresh/prune the carried
AppRole secret entries whose keys match the "...-kms-plugin-secret-*" pattern:
retrieve the referenced Vault.Authentication.AppRole.Secret.Name, fetch that
Secret payload, copy its role-id/secret-id (or remove old entries if the
reference changed) into s.Data under the appropriate kms-plugin-secret-* keys
before calling c.secretClient.Secrets(s.Namespace).Update, and ensure
encoding.EncodeKMSPluginConfig is still used for the config key; add a
regression test that performs an in-place swap of
Vault.Authentication.AppRole.Secret.Name and asserts the updated key secret
contains the new role-id/secret-id and no stale kms-plugin-secret-* entries.

In `@pkg/operator/encryption/controllers/state_controller.go`:
- Around line 214-219: The call to encryptiondata.ToEncryptionState currently
discards its error; change the handling so you capture its returned error and
return it if non-nil (e.g., check err from
encryptiondata.ToEncryptionState(existingConfig, keySecrets) and return that
error), rather than proceeding to call encryptiondata.FromEncryptionState when
ToEncryptionState failed or silently no-op when enrichedState is nil; keep the
existing enrichedState nil check but ensure ToEncryptionState errors are
propagated up from this function.

In `@test/library/encryption/kms/vault.go`:
- Around line 109-110: The helper ensureDefaultVaultAppRoleSecret currently
copies creds.Data["role-id"] and creds.Data["secret-id"] directly into
required.Data which can write empty values; update
ensureDefaultVaultAppRoleSecret to validate creds.Data contains non-empty
"role-id" and "secret-id" before assigning to required.Data, using require or
test assertions that fail fast with a clear message referencing the
vault-kms/vault-credentials secret and the missing/empty field names so the test
setup errors explicitly if those keys are absent or empty.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e8b83ad6-657c-4d82-ba42-cd9e3042a7ea

📥 Commits

Reviewing files that changed from the base of the PR and between bee486e and a11498d.

📒 Files selected for processing (7)
  • pkg/operator/encryption/controllers/key_controller.go
  • pkg/operator/encryption/controllers/key_controller_test.go
  • pkg/operator/encryption/controllers/state_controller.go
  • pkg/operator/encryption/controllers/state_controller_test.go
  • pkg/operator/encryption/encryptiondata/config_test.go
  • test/e2e-encryption/encryption_test.go
  • test/library/encryption/kms/vault.go

Comment on lines +327 to +340
keySecret, err := secrets.FromKeyState(c.instanceName, latest)
if err != nil {
return err
}
s, err := c.secretClient.Secrets(keySecret.Namespace).Get(ctx, keySecret.Name, metav1.GetOptions{})
if err != nil {
return fmt.Errorf("failed to get key secret %s/%s: %v", keySecret.Namespace, keySecret.Name, err)
}
pluginData, err := encoding.EncodeKMSPluginConfig(apiServerEncryption.KMS)
if err != nil {
return fmt.Errorf("failed to encode KMS plugin config: %v", err)
}
s.Data["encryption.apiserver.operator.openshift.io-kms-plugin-config"] = pluginData
_, updateErr := c.secretClient.Secrets(s.Namespace).Update(ctx, s, metav1.UpdateOptions{})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Refresh the carried AppRole secret data during in-place auth updates.

generateKeySecret() copies the referenced AppRole payload into the key secret, but this path only rewrites ...-kms-plugin-config. If the in-place change is a new Vault.Authentication.AppRole.Secret.Name, the latest key keeps the old role-id/secret-id entries, so the rollout still comes up with stale credentials. Please update/prune the ...-kms-plugin-secret-* data alongside the plugin config here, and add a regression case that swaps the AppRole secret reference.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/encryption/controllers/key_controller.go` around lines 327 -
340, The current update only rewrites the KMS plugin config blob and leaves any
carried AppRole secret data stale; when handling the keySecret fetched via
secrets.FromKeyState (keySecret) and its Secret object s (from
c.secretClient.Secrets(...).Get) you must also refresh/prune the carried AppRole
secret entries whose keys match the "...-kms-plugin-secret-*" pattern: retrieve
the referenced Vault.Authentication.AppRole.Secret.Name, fetch that Secret
payload, copy its role-id/secret-id (or remove old entries if the reference
changed) into s.Data under the appropriate kms-plugin-secret-* keys before
calling c.secretClient.Secrets(s.Namespace).Update, and ensure
encoding.EncodeKMSPluginConfig is still used for the config key; add a
regression test that performs an in-place swap of
Vault.Authentication.AppRole.Secret.Name and asserts the updated key secret
contains the new role-id/secret-id and no stale kms-plugin-secret-* entries.

Comment on lines +214 to +219
enrichedState, _ := encryptiondata.ToEncryptionState(existingConfig, keySecrets)
if enrichedState == nil {
return nil
}

rebuiltConfig, err := encryptiondata.FromEncryptionState(enrichedState)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t drop ToEncryptionState() errors here.

If the unrevisioned encryption-config drifts from the backing key secrets, this helper currently ignores that failure and can silently no-op instead of surfacing a degraded condition. Returning the error here avoids rebuilding from a partial state.

Suggested fix
-	enrichedState, _ := encryptiondata.ToEncryptionState(existingConfig, keySecrets)
+	enrichedState, err := encryptiondata.ToEncryptionState(existingConfig, keySecrets)
+	if err != nil {
+		return err
+	}
 	if enrichedState == nil {
 		return nil
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
enrichedState, _ := encryptiondata.ToEncryptionState(existingConfig, keySecrets)
if enrichedState == nil {
return nil
}
rebuiltConfig, err := encryptiondata.FromEncryptionState(enrichedState)
enrichedState, err := encryptiondata.ToEncryptionState(existingConfig, keySecrets)
if err != nil {
return err
}
if enrichedState == nil {
return nil
}
rebuiltConfig, err := encryptiondata.FromEncryptionState(enrichedState)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/encryption/controllers/state_controller.go` around lines 214 -
219, The call to encryptiondata.ToEncryptionState currently discards its error;
change the handling so you capture its returned error and return it if non-nil
(e.g., check err from encryptiondata.ToEncryptionState(existingConfig,
keySecrets) and return that error), rather than proceeding to call
encryptiondata.FromEncryptionState when ToEncryptionState failed or silently
no-op when enrichedState is nil; keep the existing enrichedState nil check but
ensure ToEncryptionState errors are propagated up from this function.

Comment on lines +109 to +110
"role-id": creds.Data["role-id"],
"secret-id": creds.Data["secret-id"],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n -C2 --type=go --type=yaml --type=sh \
  'vault-credentials|role-id|secret-id|roleID|secretID'

Repository: openshift/library-go

Length of output: 37480


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the exact code around the changed lines
sed -n '70,140p' test/library/encryption/kms/vault.go

# Find callers/usage of ensureDefaultVaultAppRoleSecret
rg -n "ensureDefaultVaultAppRoleSecret" -S test/library/encryption/kms/vault.go test/library/encryption -S

# Check if any other code validates required vault-credentials keys earlier
rg -n "defaultVaultCredentialsSecret|vault-credentials" -S test/library/encryption/kms test/library/encryption

Repository: openshift/library-go

Length of output: 4965


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate ApplySecret implementation
rg -n --type=go "func ApplySecret\\(" -S . | head -n 50

# Print the implementation area
rg -n --type=go "func ApplySecret\\(" -S . | head -n 1 | awk -F: '{print $1}' | xargs -I{} true
# (We’ll directly cat the likely file once we know it from the first search.)

# Fetch the full file for the first match only (avoid huge output)
file="$(rg -n --type=go "func ApplySecret\\(" -S . | head -n 1 | cut -d: -f1)"
echo "ApplySecret file: $file"
sed -n '1,220p' "$file"

# Find where AppRole secret keys are validated (role-id/secret-id)
rg -n --type=go "missing required key \"secret-id\"|missing required key \"role-id\"|role-id\" and \"secret-id\"|required key.*role-id|required key.*secret-id" -S pkg | head -n 50

# Locate the specific validation logic for Vault AppRole secret reads
rg -n --type=go "vault-approle-secret|VaultAppRoleAuthentication|AppRole secret must contain" -S pkg/operator/encryption | head -n 50

Repository: openshift/library-go

Length of output: 15601


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show ApplySecretImproved implementation (secret-specific path)
rg -n "func ApplySecretImproved" pkg/operator/resource/resourceapply/core.go
sed -n '220,420p' pkg/operator/resource/resourceapply/core.go

# Inspect operator-side validation of Vault AppRole secret keys
sed -n '470,580p' pkg/operator/encryption/controllers/key_controller.go

# Also inspect the associated test error message path for missing keys
rg -n "missing required key \"secret-id\"|missing required key \"role-id\"" pkg/operator/encryption/controllers/key_controller_test.go
sed -n '560,720p' pkg/operator/encryption/controllers/key_controller_test.go

Repository: openshift/library-go

Length of output: 20563


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find the exact validation code that emits "missing required key"
rg -n --type=go "missing required key \"secret-id\"|missing required key \"role-id\"|missing required key" pkg/operator/encryption/controllers/key_controller.go

# Print the surrounding block(s) for the validation logic (small slices)
rg -n --type=go -C5 "missing required key" pkg/operator/encryption/controllers/key_controller.go

Repository: openshift/library-go

Length of output: 635


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find the PluginSecretData.Set implementation
rg -n --type=go "type .*PluginSecretData|PluginSecretData.*Set\\(" pkg/operator/encryption | head -n 50

# Find the exact Set receiver method(s)
rg -n --type=go "func \\(.*\\) Set\\(.*PluginSecretData" pkg/operator/encryption | head -n 50
rg -n --type=go "func \\(.*\\) Set\\(.*secretName.*key.*\\)" pkg/operator/encryption | head -n 50

# Also locate where KMS.PluginSecretData.Set is called
rg -n --type=go "PluginSecretData\\.Set\\(" pkg/operator/encryption/controllers/key_controller.go pkg/operator/encryption -S

Repository: openshift/library-go

Length of output: 198


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n --type=go "type .*PluginSecretData" -S . || true
rg -n --type=go "PluginSecretData" -S pkg/operator/encryption | head -n 50
rg -n --type=go "func \\(.*\\) Set\\(" -S pkg/operator/encryption | rg -n "PluginSecretData" | head -n 50
rg -n --type=go "PluginSecretData\\.Set\\(" -S . || true

Repository: openshift/library-go

Length of output: 1562


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1,220p' pkg/operator/encryption/state/types.go
sed -n '1,260p' pkg/operator/encryption/secrets/secrets.go

# Locate the Set(...) and any helpers SetFromRawKey / SetFromRawKey implementations
rg -n --type=go "func \\(.*\\) Set\\(" pkg/operator/encryption/state pkg/operator/encryption/secrets pkg/operator/encryption | head -n 50
rg -n --type=go "SetFromRawKey" pkg/operator/encryption | head -n 50

Repository: openshift/library-go

Length of output: 14312


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '340,460p' pkg/operator/encryption/controllers/key_controller.go

Repository: openshift/library-go

Length of output: 4904


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find call sites of generateKeySecret and surrounding error handling
rg -n --type=go "generateKeySecret\\(" pkg/operator/encryption/controllers/key_controller.go

# Show the blocks around each call site (small ranges)
rg -n --type=go -C5 "generateKeySecret\\(" pkg/operator/encryption/controllers/key_controller.go

Repository: openshift/library-go

Length of output: 1450


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n --type=go "secretName, dataKey, and value must not be empty" -S . || true
rg -n --type=go "missing required key \"role-id\"|missing required key \"secret-id\"" -S pkg/operator/encryption/controllers
rg -n --type=go "Set\\(secretName, dataKey, value" -S pkg/operator/encryption | head -n 50

Repository: openshift/library-go

Length of output: 568


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1,120p' test/library/encryption/kms/vault.go
sed -n '120,220p' test/library/encryption/kms/vault.go

# Find the test helper that references ensureDefaultVaultAppRoleSecret setup
rg -n "ensureDefaultVaultAppRoleSecret" test/library/encryption/kms/vault.go

Repository: openshift/library-go

Length of output: 7310


Fail fast when vault-credentials is missing (or empty) Vault AppRole keys

ensureDefaultVaultAppRoleSecret copies creds.Data["role-id"] / creds.Data["secret-id"] directly (test/library/encryption/kms/vault.go lines 109-110). If the CI-provided vault-credentials secret lacks those fields, the helper can write empty/nil bytes into the target secret; the operator then fails during key creation when KMSSecretData.Set rejects empty values (failed to create key: ... must not be empty), producing a less direct contract-mismatch signal. Add explicit require checks for key presence and non-empty values when building required.Data, so the setup fails with a clear message about the expected role-id/secret-id fields in vault-kms/vault-credentials.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/library/encryption/kms/vault.go` around lines 109 - 110, The helper
ensureDefaultVaultAppRoleSecret currently copies creds.Data["role-id"] and
creds.Data["secret-id"] directly into required.Data which can write empty
values; update ensureDefaultVaultAppRoleSecret to validate creds.Data contains
non-empty "role-id" and "secret-id" before assigning to required.Data, using
require or test assertions that fail fast with a clear message referencing the
vault-kms/vault-credentials secret and the missing/empty field names so the test
setup errors explicitly if those keys are absent or empty.

// 3. See degraded
// 4. Update with KMS with valid image
// 5. See everything works
type InvalidImageRecoveryScenario struct {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this scenario nicely exercises the in-place field update case which will be added by #2241.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
pkg/operator/encryption/controllers/state_controller.go (1)

214-215: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate ToEncryptionState failures instead of dropping them.

Line 214 discards the returned error. A failed enrichment can silently no-op and hide state drift/corruption during non-convergence.

Suggested fix
-	enrichedState, _ := encryptiondata.ToEncryptionState(existingConfig, keySecrets)
+	enrichedState, err := encryptiondata.ToEncryptionState(existingConfig, keySecrets)
+	if err != nil {
+		return err
+	}
 	if enrichedState == nil {
 		return nil
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/encryption/controllers/state_controller.go` around lines 214 -
215, The call to ToEncryptionState is ignoring its returned error
(enrichedState, _ := encryptiondata.ToEncryptionState(existingConfig,
keySecrets)), which can hide failures; change this to capture the error (e.g.,
enrichedState, err := encryptiondata.ToEncryptionState(...)), check for non-nil
err and propagate it instead of continuing with a nil enrichedState—return the
error from the current reconcile/state handling function or requeue with the
error so failures surface and are not silently dropped. Ensure you reference
ToEncryptionState, enrichedState, existingConfig, and keySecrets when making the
change.
test/library/encryption/kms/vault.go (1)

109-110: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate AppRole credential fields before copying them into the target secret.

Lines 109-110 copy map lookups directly; missing/empty role-id or secret-id values fail later with less clear errors. Fail fast here with explicit assertions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/library/encryption/kms/vault.go` around lines 109 - 110, Before copying
creds.Data["role-id"] and creds.Data["secret-id"] into the target secret map,
validate that creds.Data contains non-empty string values for the "role-id" and
"secret-id" keys; if either is missing or empty, return a clear error (or
panic/assert) with context (e.g., "missing AppRole role-id" / "missing AppRole
secret-id"). Update the code that builds the map (the lines assigning "role-id"
and "secret-id") to first check creds and creds.Data types/values and fail fast
with descriptive messages instead of blindly copying potentially nil/empty
values.
pkg/operator/encryption/controllers/key_controller.go (1)

327-340: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Refresh carried kms-plugin-secret-* entries when applying in-place AppRole config changes.

Line 339 only updates ...-kms-plugin-config. If the AppRole secret reference changes, the key secret can keep stale carried role-id/secret-id data from the old secret. Refresh/prune the kms-plugin-secret-* fields in the same update path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/encryption/controllers/key_controller.go` around lines 327 -
340, The update currently only writes the kms-plugin-config field and can leave
stale carried AppRole secret entries; before calling
c.secretClient.Secrets(s.Namespace).Update(ctx, s, metav1.UpdateOptions{}),
remove or refresh any keys in s.Data whose names start with
"encryption.apiserver.operator.openshift.io-kms-plugin-secret-" (e.g., iterate
s.Data map and delete entries with that prefix) so carried role-id/secret-id
from old AppRole refs are pruned when keySecret/AppRole changes; keep existing
logic that sets "encryption.apiserver.operator.openshift.io-kms-plugin-config"
(variables: keySecret, s, pluginData, and the Update call remain) and then call
Update.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/operator/encryption/encryptiondata/config_test.go`:
- Line 475: The test calls encryptiondata.ToEncryptionState(input, keySecrets)
but ignores its returned error, so update the test in config_test.go to capture
both return values (e.g., state, err) and assert the call succeeded;
specifically call encryptiondata.ToEncryptionState(input, keySecrets) -> (state,
err) and then assert err == nil (using the test helper used in the repo, e.g.,
require.NoError(t, err) or t.Fatalf on error) before continuing the immutability
checks on state and input.

In `@test/library/encryption/scenarios.go`:
- Around line 329-340: After calling scenario.WaitForDegraded(ctx, e) add an
explicit assertion that the cluster's encryption mode remained KMS: re-fetch the
APIServer via cs.ApiServerConfig.Get(ctx, "cluster", metav1.GetOptions{}), then
assert apiServer.Spec.Encryption.Type equals configv1.EncryptionTypeKMS using
require/expect (e.g., require.Equal) so the test fails if the attempted switch
to aescbc actually applied; reference cs.ApiServerConfig.Get,
apiServer.Spec.Encryption.Type, and scenario.WaitForDegraded when locating where
to add the check.

---

Duplicate comments:
In `@pkg/operator/encryption/controllers/key_controller.go`:
- Around line 327-340: The update currently only writes the kms-plugin-config
field and can leave stale carried AppRole secret entries; before calling
c.secretClient.Secrets(s.Namespace).Update(ctx, s, metav1.UpdateOptions{}),
remove or refresh any keys in s.Data whose names start with
"encryption.apiserver.operator.openshift.io-kms-plugin-secret-" (e.g., iterate
s.Data map and delete entries with that prefix) so carried role-id/secret-id
from old AppRole refs are pruned when keySecret/AppRole changes; keep existing
logic that sets "encryption.apiserver.operator.openshift.io-kms-plugin-config"
(variables: keySecret, s, pluginData, and the Update call remain) and then call
Update.

In `@pkg/operator/encryption/controllers/state_controller.go`:
- Around line 214-215: The call to ToEncryptionState is ignoring its returned
error (enrichedState, _ := encryptiondata.ToEncryptionState(existingConfig,
keySecrets)), which can hide failures; change this to capture the error (e.g.,
enrichedState, err := encryptiondata.ToEncryptionState(...)), check for non-nil
err and propagate it instead of continuing with a nil enrichedState—return the
error from the current reconcile/state handling function or requeue with the
error so failures surface and are not silently dropped. Ensure you reference
ToEncryptionState, enrichedState, existingConfig, and keySecrets when making the
change.

In `@test/library/encryption/kms/vault.go`:
- Around line 109-110: Before copying creds.Data["role-id"] and
creds.Data["secret-id"] into the target secret map, validate that creds.Data
contains non-empty string values for the "role-id" and "secret-id" keys; if
either is missing or empty, return a clear error (or panic/assert) with context
(e.g., "missing AppRole role-id" / "missing AppRole secret-id"). Update the code
that builds the map (the lines assigning "role-id" and "secret-id") to first
check creds and creds.Data types/values and fail fast with descriptive messages
instead of blindly copying potentially nil/empty values.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bf88526e-d2a2-4471-980d-e267facbb3fb

📥 Commits

Reviewing files that changed from the base of the PR and between a11498d and 78d95d2.

📒 Files selected for processing (8)
  • pkg/operator/encryption/controllers/key_controller.go
  • pkg/operator/encryption/controllers/key_controller_test.go
  • pkg/operator/encryption/controllers/state_controller.go
  • pkg/operator/encryption/controllers/state_controller_test.go
  • pkg/operator/encryption/encryptiondata/config_test.go
  • test/e2e-encryption/encryption_test.go
  • test/library/encryption/kms/vault.go
  • test/library/encryption/scenarios.go

encryptiontesting.CreateEncryptionKeySecretWithRawKey("test", []schema.GroupResource{{Resource: "secrets"}}, 1, []byte("61def964fb967f5d7c44a2af8dab6865")),
}

encryptiondata.ToEncryptionState(input, keySecrets)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert ToEncryptionState succeeds in this immutability test.

Line 475 ignores the error return, so the test can pass even when conversion fails.

Suggested fix
-	encryptiondata.ToEncryptionState(input, keySecrets)
+	_, err := encryptiondata.ToEncryptionState(input, keySecrets)
+	if err != nil {
+		t.Fatalf("ToEncryptionState returned error: %v", err)
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
encryptiondata.ToEncryptionState(input, keySecrets)
_, err := encryptiondata.ToEncryptionState(input, keySecrets)
if err != nil {
t.Fatalf("ToEncryptionState returned error: %v", err)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/encryption/encryptiondata/config_test.go` at line 475, The test
calls encryptiondata.ToEncryptionState(input, keySecrets) but ignores its
returned error, so update the test in config_test.go to capture both return
values (e.g., state, err) and assert the call succeeded; specifically call
encryptiondata.ToEncryptionState(input, keySecrets) -> (state, err) and then
assert err == nil (using the test helper used in the repo, e.g.,
require.NoError(t, err) or t.Fatalf on error) before continuing the immutability
checks on state and input.

Comment thread test/library/encryption/scenarios.go Outdated
Comment on lines +329 to +340
// step 2: attempt to switch to aescbc — cluster should remain stuck on KMS
t.Log("Attempting to switch to aescbc (cluster should remain stuck on KMS)")
apiServer, err = cs.ApiServerConfig.Get(ctx, "cluster", metav1.GetOptions{})
require.NoError(t, err)
apiServer.Spec.Encryption = configv1.APIServerEncryption{Type: configv1.EncryptionTypeAESCBC}
_, err = cs.ApiServerConfig.Update(ctx, apiServer, metav1.UpdateOptions{})
require.NoError(t, err)

// step 3: wait for degraded
t.Log("Waiting for operator to report degraded status")
scenario.WaitForDegraded(ctx, e)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Assert the mode remains KMS after the failed switch attempt.

After Line 334, the test only waits for degraded status. It does not explicitly verify the cluster stayed on KMS, so step 2 can pass without validating the intended behavior.

Proposed fix
 func TestEncryptionInvalidImageRecovery(ctx context.Context, t testing.TB, scenario InvalidImageRecoveryScenario) {
 	e := NewE(t, PrintEventsOnFailure(scenario.OperatorNamespace))
+	require.NotNil(t, scenario.AssertFunc, "AssertFunc must not be nil")
 	require.NotNil(t, scenario.WaitForDegraded, "WaitForDegraded must not be nil")
 	require.NotNil(t, scenario.WaitForRecovery, "WaitForRecovery must not be nil")
@@
 	// step 3: wait for degraded
 	t.Log("Waiting for operator to report degraded status")
 	scenario.WaitForDegraded(ctx, e)
+	scenario.AssertFunc(e, cs, configv1.EncryptionTypeKMS, scenario.Namespace, scenario.LabelSelector)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/library/encryption/scenarios.go` around lines 329 - 340, After calling
scenario.WaitForDegraded(ctx, e) add an explicit assertion that the cluster's
encryption mode remained KMS: re-fetch the APIServer via
cs.ApiServerConfig.Get(ctx, "cluster", metav1.GetOptions{}), then assert
apiServer.Spec.Encryption.Type equals configv1.EncryptionTypeKMS using
require/expect (e.g., require.Equal) so the test fails if the attempted switch
to aescbc actually applied; reference cs.ApiServerConfig.Get,
apiServer.Spec.Encryption.Type, and scenario.WaitForDegraded when locating where
to add the check.

When a KMS sidecar revision is stuck (e.g. wrong container image), the
cluster-admin needs to update APIServer.spec.encryption.kms with corrected
values. Previously, all encryption controllers blocked on revision
convergence, so the fix could never be applied.

This change allows the key_controller to update the latest key secret's
KMS plugin config in-place for non-migration fields (image, TLS, AppRole)
regardless of convergence state. If there is a change in migration-triggering fields,
we should wait until convergence happens.

State controller carries in-place fields updates only when there is no change in the encryption-config
data key.
@gangwgr gangwgr force-pushed the kms-invalid-image-recovery branch from 78d95d2 to 5374add Compare May 27, 2026 04:34
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
pkg/operator/encryption/encryptiondata/config_test.go (1)

475-475: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert ToEncryptionState succeeds before checking immutability.

Line 475 ignores the returned error, so this test can pass even when conversion fails. Capture and assert the error first.

Proposed fix
-	encryptiondata.ToEncryptionState(input, keySecrets)
+	_, err := encryptiondata.ToEncryptionState(input, keySecrets)
+	if err != nil {
+		t.Fatalf("ToEncryptionState returned error: %v", err)
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/encryption/encryptiondata/config_test.go` at line 475, The call
to encryptiondata.ToEncryptionState(input, keySecrets) is ignoring its returned
error; change the test to capture both (e.g., state, err :=
encryptiondata.ToEncryptionState(input, keySecrets)), assert the call succeeded
(use the test helper in this file, e.g., require.NoError(t, err) or t.Fatalf on
err), and only then continue with the immutability/assertions against the
returned state variable; reference ToEncryptionState and the local variables you
add (state, err) to locate the change.
🧹 Nitpick comments (1)
pkg/operator/encryption/controllers/state_controller_test.go (1)

1223-1241: ⚡ Quick win

Harden structural assertions to prevent false positives in propagation test.

This check only validates slice lengths, so it won’t catch structural regressions (provider order/type, resource name, KMS endpoint/name). Assert deep equality for actualConfig.Encryption against the expected pre-update structure, and also assert the updated secret is openshift-config-managed/encryption-config-kms.

Suggested test assertion tightening
 			validateFunc: func(ts *testing.T, actions []clientgotesting.Action, destName string, expectedEncryptionCfg *encryptiondata.Config) {
+				foundUpdate := false
 				for _, action := range actions {
 					if action.Matches("update", "secrets") {
 						updateAction := action.(clientgotesting.UpdateAction)
 						actualSecret := updateAction.GetObject().(*corev1.Secret)
+						if actualSecret.Namespace != "openshift-config-managed" || actualSecret.Name != "encryption-config-kms" {
+							continue
+						}
+						foundUpdate = true
 						actualConfig, err := encryptiondata.FromSecret(actualSecret)
 						if err != nil {
 							ts.Fatalf("failed to parse updated encryption config: %v", err)
 						}
 						// Verify plugin config was updated
 						updatedPlugin := encryptiontesting.DefaultKMSPluginConfig
 						updatedPlugin.Vault.KMSPluginImage = "registry.example.com/kms-plugin@sha256:1111111111111111111111111111111111111111111111111111111111111111"
 						if !equality.Semantic.DeepEqual(actualConfig.KMSPlugins["1"], updatedPlugin) {
 							ts.Errorf("plugin config not propagated: got %+v", actualConfig.KMSPlugins["1"])
 						}
-						// Verify EncryptionConfiguration (structural part) is unchanged
-						if len(actualConfig.Encryption.Resources) != 1 || len(actualConfig.Encryption.Resources[0].Providers) != 2 {
-							ts.Errorf("EncryptionConfiguration structure changed unexpectedly")
-						}
+						// Verify EncryptionConfiguration (structural part) is unchanged
+						expectedEncryption := &apiserverconfigv1.EncryptionConfiguration{
+							Resources: []apiserverconfigv1.ResourceConfiguration{{
+								Resources: []string{"secrets"},
+								Providers: []apiserverconfigv1.ProviderConfiguration{{
+									KMS: &apiserverconfigv1.KMSConfiguration{
+										APIVersion: "v2",
+										Name:       "1_secrets",
+										Endpoint:   "unix:///var/run/kmsplugin/kms-1.sock",
+										Timeout:    &metav1.Duration{Duration: 10 * time.Second},
+									},
+								}, {
+									Identity: &apiserverconfigv1.IdentityConfiguration{},
+								}},
+							}},
+						}
+						if !equality.Semantic.DeepEqual(actualConfig.Encryption, expectedEncryption) {
+							ts.Errorf("EncryptionConfiguration changed unexpectedly: %s", diff.ObjectReflectDiff(expectedEncryption, actualConfig.Encryption))
+						}
 						return
 					}
 				}
-				ts.Error("expected an update action for the encryption-config secret")
+				if !foundUpdate {
+					ts.Error("expected an update action for openshift-config-managed/encryption-config-kms")
+				}
 			},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/encryption/controllers/state_controller_test.go` around lines
1223 - 1241, The test's structural assertion for actualConfig.Encryption is too
weak (only checks slice lengths) and may miss regressions; update the
validateFunc to assert deep equality of actualConfig.Encryption against the
expected pre-update Encryption structure (the exact expected Encryption object
used before the plugin change) rather than just lengths, and also assert the
updated secret's name equals "openshift-config-managed/encryption-config-kms";
reference actualConfig.Encryption for the deep-equal check and keep the existing
plugin propagation assertion that compares actualConfig.KMSPlugins["1"] to
encryptiontesting.DefaultKMSPluginConfig with the modified Vault.KMSPluginImage.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@pkg/operator/encryption/encryptiondata/config_test.go`:
- Line 475: The call to encryptiondata.ToEncryptionState(input, keySecrets) is
ignoring its returned error; change the test to capture both (e.g., state, err
:= encryptiondata.ToEncryptionState(input, keySecrets)), assert the call
succeeded (use the test helper in this file, e.g., require.NoError(t, err) or
t.Fatalf on err), and only then continue with the immutability/assertions
against the returned state variable; reference ToEncryptionState and the local
variables you add (state, err) to locate the change.

---

Nitpick comments:
In `@pkg/operator/encryption/controllers/state_controller_test.go`:
- Around line 1223-1241: The test's structural assertion for
actualConfig.Encryption is too weak (only checks slice lengths) and may miss
regressions; update the validateFunc to assert deep equality of
actualConfig.Encryption against the expected pre-update Encryption structure
(the exact expected Encryption object used before the plugin change) rather than
just lengths, and also assert the updated secret's name equals
"openshift-config-managed/encryption-config-kms"; reference
actualConfig.Encryption for the deep-equal check and keep the existing plugin
propagation assertion that compares actualConfig.KMSPlugins["1"] to
encryptiontesting.DefaultKMSPluginConfig with the modified Vault.KMSPluginImage.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 65696f7c-4824-4db8-9a63-8b854698322c

📥 Commits

Reviewing files that changed from the base of the PR and between 78d95d2 and 5374add.

📒 Files selected for processing (9)
  • pkg/operator/encryption/controllers/key_controller.go
  • pkg/operator/encryption/controllers/key_controller_test.go
  • pkg/operator/encryption/controllers/state_controller.go
  • pkg/operator/encryption/controllers/state_controller_test.go
  • pkg/operator/encryption/encryptiondata/config_test.go
  • test/e2e-encryption/encryption_test.go
  • test/library/encryption/helpers.go
  • test/library/encryption/kms/vault.go
  • test/library/encryption/scenarios.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • test/library/encryption/scenarios.go
  • pkg/operator/encryption/controllers/state_controller.go
  • test/e2e-encryption/encryption_test.go
  • pkg/operator/encryption/controllers/key_controller.go
  • pkg/operator/encryption/controllers/key_controller_test.go

@gangwgr gangwgr force-pushed the kms-invalid-image-recovery branch 3 times, most recently from 1f8a774 to 32381d8 Compare May 27, 2026 08:15
The invalid image uses a properly formatted OCI reference with sha256
digest that passes API validation but points to a non-existent image,
causing ImagePullBackOff and operator degradation.
@gangwgr gangwgr force-pushed the kms-invalid-image-recovery branch from 32381d8 to d94ed61 Compare May 27, 2026 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants