ARO-24514: Make identityRef optional in AROControlPlane for ASO credential-based authentication and make AROControlPlane identityRef independent #79

Merged
mzazrivec merged 1 commit into stolostron:backplane-2.17 from marek-veber:ARO-24514-identityRef-optional
Feb 27, 2026

Conversation

marek-veber (Collaborator) commented Feb 22, 2026

Summary

This PR implements ARO-24514 to make identityRef optional in AROControlPlane and AROMachinePool, enabling customers to use ASO credential-based authentication without managing separate AzureClusterIdentity resources.

Additionally, AROMachinePool has been refactored to remove its Azure SDK dependencies: it now populates providerIDList by reading nodes from the workload cluster instead of calling the Azure VM API. This makes it fully ASO-native and removes the need for Azure credentials entirely.

Problem

Currently, AROControlPlane and AROMachinePool require identityRef to initialize Azure credentials. For AROControlPlane, the credentials are needed for Key Vault operations (encryption key creation and version propagation). For AROMachinePool, they were needed only to call the Azure VM API to populate providerIDList, even though all other Azure operations are handled by ASO. Customers like Adobe who want to use ASO's serviceoperator.azure.com/credential-from annotations are forced to maintain separate AzureClusterIdentity resources, which adds unnecessary complexity.

When using ASO resources with credential annotations, customers want to avoid managing identityRef separately while maintaining full control over Key Vault and encryption key management.

Solution

Core Changes

  1. Skip Azure operations when identityRef is not set

    • AROControlPlane: CAPZ no longer attempts Key Vault operations without Azure credentials
    • AROMachinePool: CAPZ completely removes Azure SDK dependency
    • Customers manually create vaults and keys via ASO
    • Customers manually specify activeKey.version in HcpOpenShiftCluster spec
  2. AROMachinePool refactored to use workload cluster nodes

    • Replaced Azure VM API calls with workload cluster node listing
    • Populates providerIDList from actual nodes instead of Azure VMs
    • Added ClusterTracker for accessing workload cluster clients
    • Follows the same pattern as ASOManagedMachinePool
    • No longer requires Azure credentials at all
  3. Two-layer validation for encryption key version

    • Webhook validation (fail-fast at create/update time)
    • Runtime validation (safety net during reconciliation)
    • Prevents deployment failures with clear, actionable error messages
  4. Enhanced documentation

    • Updated API types with comprehensive descriptions
    • Added authentication modes section to ARO HCP proposal
    • Included examples for both authentication modes

Authentication Modes

| Mode | identityRef Set | identityRef NOT Set |
|---|---|---|
| Credentials | CAPZ via AzureClusterIdentity | ASO via credential annotations |
| Key Vault Ops | Automatic | Skipped |
| Key Creation | Automatic | Manual (customer) |
| Key Version | Auto-propagated | Must be specified manually |
| Field | kms.activeKey.version | kms.activeKey.version |

Changes Made

Code Changes

  1. Controller Logic (exp/controllers/arocontrolplane_reconciler.go)

    • Refactored encryption key management with clear if-else structure based on identityRef presence
    • Added runtime validation for spec.properties.etcd.dataEncryption.customerManaged.kms.activeKey.version when encryption configured
    • Set EncryptionKeyReadyCondition appropriately for each mode:
      • True when identityRef is set and key is ready
      • Unknown when identityRef is not set (manual key management)
      • False when validation fails
  2. Webhook Validation (exp/api/controlplane/v1beta2/arocontrolplane_webhook.go)

    • Added validateEncryptionKeyVersion() function
    • Enhanced validateResources() to check kms.activeKey.version when identityRef is nil
    • Clear error messages guide customers to specify version at correct path:
      spec.properties.etcd.dataEncryption.customerManaged.kms.activeKey.version
  3. Scope Initialization

    • AROControlPlane Scope (azure/scope/arocontrolplane.go)
      • Enhanced comments explaining credential initialization logic
      • Documented customer responsibilities when identityRef not set
    • AROMachinePool Scope (azure/scope/aromachinepool.go)
      • BREAKING: Completely removed Azure credential dependency
      • No longer requires identityRef or Azure SDK access
      • Removed AzureClients embedding from scope
      • Removed CredentialCache from reconciler
      • AROMachinePool is now fully ASO-native
  4. AROMachinePool Reconciler (exp/controllers/aromachinepool_reconciler.go)

    • Replaced Azure VM API with workload cluster node listing
    • Added ClusterTracker dependency (similar to ASOManagedMachinePool)
    • Removed virtualMachines service and Azure SDK dependencies
    • Populates providerIDList from workload cluster nodes instead of Azure VMs
    • Lists nodes from workload cluster using tracker.GetClient() and filters by node pool name pattern: <cluster-name>-<nodepool-name>-<suffix>
    • More reliable and doesn't require Azure credentials
    • Updated reconciler signature: removed CredentialCache, added ClusterTracker
  5. Main Controller Setup (main.go)

    • Created newWorkloadClusterCache() helper function to prevent duplicate cluster cache creation
    • Added shared workload cluster cache declaration before feature gates
    • Both ASOAPI and ARO feature gates call newWorkloadClusterCache() with nil check
    • Prevents "controller with name clustercache already exists" error when both features enabled
    • Function returns existing cache if already created, creates new one if nil
    • Ensures only one cluster cache is created regardless of feature gate evaluation order
  6. AROMachinePool Tests (exp/controllers/aromachinepool_controller_test.go)

    • Created FakeClusterTracker mock implementing ClusterTracker interface
    • Replaced CredentialCache with ClusterTracker in all test setup
    • Updated reconciler initialization to pass tracker instead of credential cache
    • Added type alias for ClusterTracker interface
    • Tests now validate workload cluster node listing approach
  7. Key Vault Service (azure/services/keyvaults/keyvault.go)

    • Clarified comments for ARO HCP vs legacy behavior
  8. API Types (exp/api/controlplane/v1beta2/arocontrolplane_types.go)

    • Updated identityRef description to explain both authentication modes
    • Clarified when CAPZ performs Key Vault operations vs when customers manage manually
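
The shared cluster cache described for main.go (item 5 above) can be illustrated with a minimal sketch. The clusterCache type, the created counter, and the helper's signature are stand-ins invented for this example; only the helper name and its "return the existing cache if non-nil, otherwise create one" behavior come from the description.

```go
package main

import "fmt"

// clusterCache is a stand-in for the real workload cluster cache type.
type clusterCache struct{ id int }

// created counts how many caches this sketch has constructed.
var created int

// newWorkloadClusterCache returns the existing cache when it is non-nil and
// builds a new one otherwise, so repeated calls share a single instance.
func newWorkloadClusterCache(existing *clusterCache) *clusterCache {
	if existing != nil {
		return existing
	}
	created++
	return &clusterCache{id: created}
}

func main() {
	// Declared once, before the feature gates are evaluated.
	var shared *clusterCache

	asoAPIEnabled, aroEnabled := true, true
	if asoAPIEnabled {
		shared = newWorkloadClusterCache(shared)
	}
	if aroEnabled {
		shared = newWorkloadClusterCache(shared)
	}
	fmt.Println("caches created:", created) // prints "caches created: 1"
}
```

Because each gate passes the shared variable back in, the result does not depend on which gate runs first.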

Documentation Changes

  1. ARO HCP Proposal (docs/proposals/20250425-aro-hcp.md)

    • Added "Authentication Modes" section
    • Added "Encryption Key Version Validation" section
    • Enhanced validation and testing documentation
    • Added ASO credential-based example with correct field structure
    • Updated implementation history with AROMachinePool refactoring details
    • Documented workload cluster node listing approach for providerIDList
    • Added section on AROMachinePool Controller pattern changes
  2. CRD Manifest (config/crd/bases/controlplane.cluster.x-k8s.io_arocontrolplanes.yaml)

    • Regenerated with enhanced identityRef description

AROMachinePool Implementation Details

Workload Cluster Node Listing

AROMachinePool now populates providerIDList by reading nodes directly from the workload cluster instead of calling Azure VM API:

  1. Get workload cluster client using ClusterTracker.GetClient(ctx, clusterKey)
  2. List all nodes from the workload cluster: client.List(ctx, &corev1.NodeList{})
  3. Filter by node pool name pattern: <cluster-name>-<nodepool-name>-<random-suffix>
  4. Extract providerID from matching nodes: node.Spec.ProviderID
  5. Update AROMachinePool status with the provider ID list
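
The steps above can be sketched as follows. This is not the reconciler code from the PR: Node is a stand-in for the two corev1.Node fields involved, the provider ID values are placeholders, and the prefix match, the skipping of empty provider IDs, and the sorting are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Node is a stand-in for the corev1.Node fields used here
// (metadata.name and spec.providerID); it is not the Kubernetes type.
type Node struct {
	Name       string
	ProviderID string
}

// providerIDsForPool returns the sorted provider IDs of nodes whose names
// start with "<clusterName>-<nodePoolName>-". Nodes that have no provider ID
// yet are skipped (an assumption of this sketch).
func providerIDsForPool(nodes []Node, clusterName, nodePoolName string) []string {
	prefix := clusterName + "-" + nodePoolName + "-"
	ids := []string{}
	for _, n := range nodes {
		if strings.HasPrefix(n.Name, prefix) && n.ProviderID != "" {
			ids = append(ids, n.ProviderID)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	// Placeholder data standing in for the result of listing workload cluster nodes.
	nodes := []Node{
		{Name: "my-cluster-workers-abc123", ProviderID: "azure:///placeholder/vm-b"},
		{Name: "my-cluster-infra-xyz789", ProviderID: "azure:///placeholder/vm-c"},
		{Name: "my-cluster-workers-def456", ProviderID: "azure:///placeholder/vm-a"},
	}
	fmt.Println(providerIDsForPool(nodes, "my-cluster", "workers"))
}
```

In real code the node slice would come from the workload cluster client rather than a literal, and the resulting list would be written to the AROMachinePool status.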

Benefits

  • No Azure credentials required - Works entirely with Kubernetes API
  • More reliable - Direct source of truth from actual running nodes
  • Consistent with CAPI patterns - Follows ASOManagedMachinePool approach
  • Simpler architecture - No Azure SDK dependency to maintain

Node Pool Name Pattern

ARO HCP node names follow the pattern: {clusterName}-{nodePoolName}-{randomSuffix}

Example:

  • Cluster: my-cluster
  • Node Pool: workers
  • Node names: my-cluster-workers-abc123, my-cluster-workers-def456

The reconciler filters nodes by checking if the node name contains the pattern my-cluster-workers-.

Validation Logic

Webhook Validation (Layer 1)

For each resource in spec.resources:
  IF resource is HcpOpenShiftCluster
    AND ETCD encryption is configured
    AND identityRef is NOT set
  THEN validate that kms.activeKey.version is specified

ERROR: "activeKey.version is required when identityRef is not set -
        CAPZ cannot auto-create or propagate the encryption key without Azure credentials"
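
A minimal Go sketch of the rule above, under stated assumptions: Resource and ActiveKey are simplified stand-ins for the real API types, and checkActiveKeyVersion with its signature is hypothetical rather than the webhook function added in the PR. Only the condition and the error text follow the description.

```go
package main

import (
	"errors"
	"fmt"
)

// ActiveKey mirrors the kms.activeKey fields shown in the example configuration.
type ActiveKey struct {
	VaultName string
	Name      string
	Version   string
}

// Resource is a simplified, hypothetical view of one entry in spec.resources.
type Resource struct {
	Kind                 string
	EncryptionConfigured bool
	ActiveKey            *ActiveKey
}

var errKeyVersionRequired = errors.New(
	"activeKey.version is required when identityRef is not set - " +
		"CAPZ cannot auto-create or propagate the encryption key without Azure credentials")

// checkActiveKeyVersion rejects an HcpOpenShiftCluster resource that has etcd
// encryption configured but no key version, when identityRef is not set.
func checkActiveKeyVersion(identityRefSet bool, resources []Resource) error {
	if identityRefSet {
		return nil // CAPZ propagates the version itself in this mode
	}
	for i, r := range resources {
		if r.Kind != "HcpOpenShiftCluster" || !r.EncryptionConfigured {
			continue
		}
		if r.ActiveKey == nil || r.ActiveKey.Version == "" {
			return fmt.Errorf("spec.resources[%d]: %w", i, errKeyVersionRequired)
		}
	}
	return nil
}

func main() {
	res := []Resource{{
		Kind:                 "HcpOpenShiftCluster",
		EncryptionConfigured: true,
		ActiveKey:            &ActiveKey{VaultName: "my-vault", Name: "etcd-data-kms-encryption-key"},
	}}
	fmt.Println("without identityRef:", checkActiveKeyVersion(false, res))
	fmt.Println("with identityRef:", checkActiveKeyVersion(true, res))
}
```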

Runtime Validation (Layer 2)

During reconciliation:
  IF encryption is configured
    AND identityRef is NOT set
    AND kms.activeKey is missing OR kms.activeKey.version is missing
  THEN
    - Set EncryptionKeyReadyCondition to False (reason: KeyVersionMissing)
    - Return error preventing HcpOpenShiftCluster deployment
    - Log detailed error message
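
The runtime check can be sketched in the same spirit. The Condition struct, the condition type string, the error text, and the function checkManualKey are assumptions made for illustration; the KeyVersionMissing reason and the False status come from the description above. Logging is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// ActiveKey mirrors the kms.activeKey fields shown in the example configuration.
type ActiveKey struct {
	VaultName string
	Name      string
	Version   string
}

// Condition is a simplified stand-in for a Cluster API condition.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// encryptionKeyReady is an assumed string value for EncryptionKeyReadyCondition.
const encryptionKeyReady = "EncryptionKeyReady"

var errKeyVersionMissing = errors.New("kms.activeKey.version must be specified when identityRef is not set")

// checkManualKey applies the Layer 2 rule. It returns a False condition and an
// error when the key version is missing, and (nil, nil) when the rule does not
// apply or is satisfied.
func checkManualKey(encryptionConfigured, identityRefSet bool, key *ActiveKey) (*Condition, error) {
	if !encryptionConfigured || identityRefSet {
		return nil, nil
	}
	if key == nil || key.Version == "" {
		cond := &Condition{Type: encryptionKeyReady, Status: "False", Reason: "KeyVersionMissing"}
		return cond, errKeyVersionMissing
	}
	return nil, nil
}

func main() {
	cond, err := checkManualKey(true, false, &ActiveKey{VaultName: "my-vault", Name: "etcd-data-kms-encryption-key"})
	fmt.Println(cond.Status, cond.Reason, err)

	cond, err = checkManualKey(true, false, &ActiveKey{Version: "abc123def456"})
	fmt.Println(cond == nil, err == nil)
}
```

Returning the error from reconciliation is what stops the HcpOpenShiftCluster from being deployed in this sketch.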

Example Configuration

With identityRef (existing behavior)

spec:
  identityRef:
    kind: AzureClusterIdentity
    name: aro-identity
  resources:
    - kind: HcpOpenShiftCluster
      spec:
        properties:
          etcd:
            dataEncryption:
              customerManaged:
                kms:
                  activeKey:
                    vaultName: my-vault
                    name: etcd-data-kms-encryption-key
                    # version auto-propagated by CAPZ

Without identityRef (new capability)

spec:
  # identityRef: NOT SET - using ASO credentials
  resources:
    - kind: HcpOpenShiftCluster
      metadata:
        annotations:
          serviceoperator.azure.com/credential-from: aso-credential
      spec:
        properties:
          etcd:
            dataEncryption:
              customerManaged:
                kms:
                  activeKey:
                    vaultName: my-vault
                    name: etcd-data-kms-encryption-key
                    version: "abc123def456"  # ✅ REQUIRED - manually specified

Testing

AROControlPlane Tests

  • ✅ Webhook validation rejects HcpOpenShiftCluster without activeKey.version when identityRef is nil
  • ✅ Runtime validation prevents deployment when activeKey.version is missing
  • ✅ HcpOpenShiftCluster successfully creates when activeKey.version is manually specified
  • ✅ Backward compatibility: existing clusters with identityRef work unchanged
  • ✅ EncryptionKeyReadyCondition shows correct status for both modes

AROMachinePool Tests

  • ✅ AROMachinePool reconciles successfully without identityRef
  • ✅ AROMachinePool populates providerIDList from workload cluster nodes (no Azure credentials needed)
  • ✅ Workload cluster node listing filters by node pool name pattern correctly
  • ✅ HcpOpenShiftClustersNodePool created by ASO when identityRef is not set
  • ✅ No Azure SDK calls from AROMachinePool - fully ASO-native
  • ✅ Unit tests pass with FakeClusterTracker mock
  • ✅ Test cases cover: successful reconciliation, paused cluster, paused machine pool, deletion scenarios
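
To illustrate how a fake tracker can replace Azure credentials in unit tests, here is a minimal sketch. Every type in it is a stand-in defined for this example: the real ClusterTracker interface returns a Kubernetes client, whereas NodeLister, ClusterKey, fakeLister, and listNodes are hypothetical simplifications.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ClusterKey stands in for the namespaced name of a Cluster.
type ClusterKey struct{ Namespace, Name string }

// NodeLister stands in for the workload cluster client.
type NodeLister interface {
	ListNodeNames(ctx context.Context) ([]string, error)
}

// ClusterTracker is a simplified version of the interface the reconciler depends on.
type ClusterTracker interface {
	GetClient(ctx context.Context, key ClusterKey) (NodeLister, error)
}

// fakeLister returns a fixed set of node names.
type fakeLister struct{ names []string }

func (f fakeLister) ListNodeNames(context.Context) ([]string, error) { return f.names, nil }

// FakeClusterTracker serves pre-registered clients and needs no cluster access.
type FakeClusterTracker struct {
	Clients map[ClusterKey]NodeLister
}

// Compile-time check that the fake satisfies the interface.
var _ ClusterTracker = (*FakeClusterTracker)(nil)

func (f *FakeClusterTracker) GetClient(_ context.Context, key ClusterKey) (NodeLister, error) {
	c, ok := f.Clients[key]
	if !ok {
		return nil, errors.New("no client registered for " + key.Namespace + "/" + key.Name)
	}
	return c, nil
}

// listNodes shows how code under test would consume the tracker.
func listNodes(tracker ClusterTracker, key ClusterKey) ([]string, error) {
	ctx := context.Background()
	c, err := tracker.GetClient(ctx, key)
	if err != nil {
		return nil, err
	}
	return c.ListNodeNames(ctx)
}

func main() {
	key := ClusterKey{Namespace: "default", Name: "my-cluster"}
	tracker := &FakeClusterTracker{Clients: map[ClusterKey]NodeLister{
		key: fakeLister{names: []string{"my-cluster-workers-abc123"}},
	}}
	names, err := listNodes(tracker, key)
	fmt.Println(names, err)
}
```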

Integration Tests

  • ✅ Shared cluster cache creation works correctly when both ASOAPI and ARO features enabled
  • ✅ No "controller with name clustercache already exists" error
  • ✅ Cluster cache is created only once regardless of feature gate order
  • make lint passes
  • ✅ AROMachinePool unit tests pass

Architecture Changes

Before

AROControlPlane        AROMachinePool
      |                      |
      v                      v
 identityRef            identityRef
      |                      |
      v                      v
Azure SDK              Azure SDK
      |                      |
      v                      v
Key Vault API          VM API (providerIDList)

After

AROControlPlane                    AROMachinePool
      |                                  |
      v                                  v
identityRef (optional)           ClusterTracker
      |                                  |
      +------- if set ------+            v
      |                     |      Workload Cluster
      v                     |            |
Azure SDK                   |            v
      |                     |      Node List API
      v                     |            |
Key Vault API               |            v
                            |      Filter by pattern
                            |            |
                            |            v
                            |      providerIDList
                            |
                            +-- if NOT set --+
                                            |
                                            v
                                  ASO handles everything
                                  (customer manages keys)

Breaking Changes

None. This change is backward compatible:

  • Existing deployments with identityRef continue to work unchanged
  • identityRef remains optional (was already optional in schema)
  • Only new validation added for ASO credential-based mode
  • AROMachinePool works with or without identityRef (no Azure credentials needed either way)

Files Changed Summary

| File | Purpose | Key Changes |
|---|---|---|
| exp/api/controlplane/v1beta2/arocontrolplane_types.go | API types | Made identityRef optional |
| exp/api/controlplane/v1beta2/arocontrolplane_webhook.go | Validation | Added encryption key version validation |
| exp/controllers/arocontrolplane_reconciler.go | Control plane logic | Skip Key Vault ops when identityRef not set |
| azure/scope/arocontrolplane.go | Control plane scope | Handle missing identityRef |
| azure/scope/aromachinepool.go | Machine pool scope | Removed Azure SDK/credentials |
| exp/controllers/aromachinepool_reconciler.go | Machine pool logic | Workload cluster node listing |
| exp/controllers/aromachinepool_controller.go | Controller setup | ClusterTracker instead of CredentialCache |
| exp/controllers/aromachinepool_controller_test.go | Tests | FakeClusterTracker mock |
| main.go | Controller registration | Shared cluster cache |
| docs/proposals/20250425-aro-hcp.md | Documentation | Authentication modes, implementation details |
| config/crd/bases/*.yaml | CRD manifests | Regenerated with optional identityRef |

Related Issues

openshift-ci bot commented Feb 22, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: marek-veber

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

marek-veber force-pushed the ARO-24514-identityRef-optional branch 3 times, most recently from 04c7bf9 to 5aa47df, on February 24, 2026 04:16
marek-veber force-pushed the ARO-24514-identityRef-optional branch 2 times, most recently from cfdf1ea to 2c05a35, on February 24, 2026 04:49
…hinePool

This change makes spec.identityRef optional for both AROControlPlane and
AROMachinePool to support ASO credential-based authentication, and refactors
AROMachinePool to eliminate the need for Azure SDK credentials entirely.

Changes:

1. AROControlPlane:
   - Made spec.identityRef optional in API types
   - Added webhook validation to ensure identityRef is set when not using ASO
   - Updated CRD with optional identityRef field
   - Updated scope to handle missing identityRef gracefully

2. AROMachinePool:
   - Removed Azure SDK dependencies and credential initialization
   - Removed AzureClients embedding from AROMachinePoolScope
   - Removed CredentialCache parameter from reconciler
   - Replaced Azure VM API calls with workload cluster node listing
   - Added ClusterTracker support for accessing workload cluster clients
   - Updated tests to use FakeClusterTracker mock instead of CredentialCache

3. main.go:
   - Created newWorkloadClusterCache() helper function to share cluster cache
   - Added shared workload cluster cache for both ASOAPI and ARO features
   - Ensures only one cluster cache is created regardless of feature gate order
   - Prevents "controller with name clustercache already exists" error

4. Documentation:
   - Updated ARO HCP proposal with implementation details
   - Documented new node listing approach for AROMachinePool
   - Added implementation history entries

Rationale:
- ASO resources use serviceoperator.azure.com/credential-from annotations
  for authentication, making identityRef redundant when using ASO mode
- AROMachinePool no longer needs Azure credentials as it gets providerIDList
  from workload cluster nodes instead of calling Azure VM API
- This simplifies the authentication model and aligns with ASO patterns
- Follows the same pattern as ASOManagedMachinePool for node discovery
marek-veber force-pushed the ARO-24514-identityRef-optional branch from 2c05a35 to 98893b1 on February 24, 2026 08:03
marek-veber changed the title on Feb 24, 2026
  from: ARO-24514: Make identityRef optional in AROControlPlane for ASO credential-based authentication
  to: ARO-24514: Make identityRef optional in AROControlPlane for ASO credential-based authentication and make AROControlPlane identityRef independent
mzazrivec merged commit c8c350a into stolostron:backplane-2.17 on Feb 27, 2026
9 of 10 checks passed