
Conversation

@wainersm
Member

This is the first step towards running the attestation-aware tests on AWS. It introduces a new job to run the non-CoCo tests on EKS clusters, using AMD SEV-SNP podvm instances.

The e2e framework already had some code to instantiate EKS, but it was broken and CoCo wasn't getting installed on Amazon Linux. I switched most of the code from the AWS Go SDK to calling the eksctl tool to provision/deprovision clusters, and migrated to Ubuntu 24.04 workers (where CoCo installs just fine). Along the way I had to make some fixes and adjustments here and there.

Although we have a clean-up mechanism for dangling resources running in our AWS account, I've also adapted (and fixed) the script that runs after the job fails.

The new job will run with continue-on-error while it's unstable. Yes, it's still a bit unstable: the EKS cluster provisioning often fails (I need to investigate and fix that).

Two executions that I used to test:

@wainersm wainersm requested a review from a team as a code owner December 10, 2025 13:34
@wainersm wainersm added CI Issues related to CI workflows provider/aws Issues related to AWS CAA provider labels Dec 10, 2025
@wainersm wainersm force-pushed the ci_aws_coco branch 2 times, most recently from fb0c4ff to 35ad928 on December 11, 2025 14:26
@wainersm
Member Author

Updated just to fix golang lint warnings.

Comment on lines 227 to 236
  - crio
cluster_type:
  - onprem
os:
  - ubuntu
provider:
  - generic
arch:
  - amd64
include:
  - container_runtime: containerd
    cluster_type: eks
    os: ubuntu
    provider: generic
    arch: amd64
Member

Could we refactor this into a table for more readability?

Member Author

Yes, we can!

Member Author

Done. Is that what you wanted, @stevenhorsman?

The current support for creating EKS relies on the AWS SDK for Go
libraries. This has made the implementation a bit complex, but it works.
However, upcoming changes will switch to Ubuntu workers, which would
require even more unneeded code when we could be using a tool like
eksctl instead. That's exactly what this commit does: use eksctl to
create the EKS cluster.

Updated to k8s 1.34 as 1.26 is deprecated. We don't need to handle roles,
the CNI plug-in, etc., because eksctl carries all of that out. However,
it still relies on the already-created VPC and subnets to run the podvms.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
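
For reference, a minimal sketch of what the create path looks like when shelling out to eksctl from Go; the function name and flag values are illustrative, not the exact framework code:

```go
package main

import (
	"os"
	"os/exec"
)

// createEKSCluster shells out to eksctl, which takes care of roles, the CNI
// plug-in, and the rest of the cluster bring-up.
func createEKSCluster(name, region string) error {
	cmd := exec.Command("eksctl", "create", "cluster",
		"--name", name,
		"--region", region,
		"--version", "1.34", // 1.26 is deprecated
	)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}
```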
The kata-remote runtimeclass is taking longer than 60 seconds to show up
in EKS, so this just increases the timeout.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
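
A minimal sketch of the kind of wait this tunes, assuming client-go; the 120s timeout and 5s interval are illustrative:

```go
import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForRuntimeClass polls until the kata-remote runtimeclass shows up,
// tolerating the slower propagation seen on EKS.
func waitForRuntimeClass(ctx context.Context, cs kubernetes.Interface) error {
	return wait.PollUntilContextTimeout(ctx, 5*time.Second, 120*time.Second, true,
		func(ctx context.Context) (bool, error) {
			_, err := cs.NodeV1().RuntimeClasses().Get(ctx, "kata-remote", metav1.GetOptions{})
			if apierrors.IsNotFound(err) {
				return false, nil // not there yet, keep polling
			}
			return err == nil, err
		})
}
```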
The system changes made for Kata/CoCo break containerd on Amazon Linux
workers. Switched to Ubuntu 24.04 workers.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
It will use eksctl to delete the EKS cluster. The tool should take care
of deleting the node groups, the CloudFormation resources, and the
cluster itself. A timeout of 15 min should be enough.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
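
A minimal sketch of the delete path (same os/exec imports as the create sketch above); the flags follow eksctl's CLI:

```go
// deleteEKSCluster lets eksctl tear down the node groups, the CloudFormation
// stacks, and the cluster itself.
func deleteEKSCluster(name, region string) error {
	cmd := exec.Command("eksctl", "delete", "cluster",
		"--name", name,
		"--region", region,
		"--wait", // block until the CloudFormation stacks are gone
		"--timeout", "15m",
	)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}
```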
Set the podvm_aws_instance_type property to use an instance type other
than the default (t2.medium).

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
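
A minimal sketch of how such a property could be consumed; the properties map is illustrative, not the exact framework code:

```go
// podvmInstanceType returns the configured instance type, falling back to
// the default when the property is unset.
func podvmInstanceType(properties map[string]string) string {
	if t := properties["podvm_aws_instance_type"]; t != "" {
		return t
	}
	return "t2.medium" // the default mentioned above
}
```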
For regions other than us-east-1, a location constraint needs to be
specified, otherwise the creation fails.

Assisted-by: Cursor
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
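
A minimal sketch of the S3 quirk being worked around, using the AWS SDK for Go v2; the function and parameter names are illustrative:

```go
import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

// createBucket sets a location constraint everywhere except us-east-1,
// which rejects one.
func createBucket(ctx context.Context, client *s3.Client, bucket, region string) error {
	input := &s3.CreateBucketInput{Bucket: aws.String(bucket)}
	if region != "us-east-1" {
		input.CreateBucketConfiguration = &types.CreateBucketConfiguration{
			LocationConstraint: types.BucketLocationConstraint(region),
		}
	}
	_, err := client.CreateBucket(ctx, input)
	return err
}
```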
For AMD SEV-SNP confidential VMs, the podvm needs to boot in UEFI mode.
It will automatically opt in if the disablecvm property is false (or empty).

Assisted-by: Cursor
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
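
A minimal sketch of registering a UEFI-boot AMI with the AWS SDK for Go v2; everything except the BootMode field is illustrative:

```go
import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	"github.com/aws/aws-sdk-go-v2/service/ec2/types"
)

// registerUefiImage registers the podvm AMI with UEFI boot, which AMD
// SEV-SNP requires.
func registerUefiImage(ctx context.Context, client *ec2.Client, name, snapshotID string) (string, error) {
	out, err := client.RegisterImage(ctx, &ec2.RegisterImageInput{
		Name:           aws.String(name),
		Architecture:   types.ArchitectureValuesX8664,
		BootMode:       types.BootModeValuesUefi, // the UEFI opt-in
		RootDeviceName: aws.String("/dev/xvda"),
		BlockDeviceMappings: []types.BlockDeviceMapping{{
			DeviceName: aws.String("/dev/xvda"),
			Ebs:        &types.EbsBlockDevice{SnapshotId: aws.String(snapshotID)},
		}},
	})
	if err != nil {
		return "", err
	}
	return aws.ToString(out.ImageId), nil
}
```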
Just like the other created resources (VPC, subnet, etc.), it needs to
have a unique name to avoid clashes on CI.

Also, if the eks_name property is passed then it won't attempt to create
the cluster; instead it assumes the cluster was already created and
re-uses it.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
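
A minimal sketch of the create-or-reuse decision; the name scheme is illustrative:

```go
import (
	"fmt"
	"time"
)

// clusterName returns the cluster to use and whether it already exists.
func clusterName(properties map[string]string) (name string, reuse bool) {
	if n := properties["eks_name"]; n != "" {
		return n, true // cluster provided by the caller, re-use it
	}
	// Unique name so concurrent CI runs don't clash.
	return fmt.Sprintf("e2e-eks-%d", time.Now().UnixNano()), false
}
```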
For EKS clusters it creates two subnets, but when the provisioner reads
aws_vpc_subnet_id it assumes a single subnet. Overloaded the meaning of
aws_vpc_subnet_id to allow passing two subnets separated by a comma.

Assisted-by: Cursor
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
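
A minimal sketch of parsing the overloaded property:

```go
import "strings"

// subnetIDs splits aws_vpc_subnet_id, which may now carry one subnet or a
// comma-separated pair.
func subnetIDs(raw string) []string {
	var ids []string
	for _, s := range strings.Split(raw, ",") {
		if s = strings.TrimSpace(s); s != "" {
			ids = append(ids, s)
		}
	}
	return ids
}
```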
We want to launch confidential VMs on EKS.

If cluster_type is eks then it needs to:
* install the eksctl command
* tweak the test properties to launch a confidential VM

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Use EKS to test confidential VMs on AWS.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
qemu-img is used to convert the podvm disk from qcow2 to raw before
uploading it to AWS S3, so it is a requirement.

In the current onprem (kcli) job it's installed via
src/cloud-api-adaptor/libvirt/config_libvirt.sh as a side effect of
installing qemu-kvm. But the EKS job doesn't run that script, so let's
install the tool in a workflow step.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
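
A minimal sketch of the conversion the job depends on (same os/exec imports as the eksctl sketches); equivalent to running `qemu-img convert -O raw podvm.qcow2 podvm.raw`:

```go
// convertToRaw converts the qcow2 podvm disk into the raw format expected
// by the S3 upload.
func convertToRaw(qcow2Path, rawPath string) error {
	cmd := exec.Command("qemu-img", "convert", "-O", "raw", qcow2Path, rawPath)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}
```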
The hack/ci-e2e-aws-cleanup.sh script is executed to ensure that
resources are cleaned up if the test framework exited before running the
deprovision code. Adapted the script to also delete EKS.

The resources were not necessarily created in the same region as the
credentials, so let's have the workflow export the region in the
AWS_REGION variable and use it in the script.

Also notice that EKS is created with two subnets; the aws_vpc_subnet_id
property is then a comma-separated list of subnets.

Assisted-by: Cursor
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
To avoid having to install unneeded libvirt packages in the CI runner.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
In "Config aws" the AWS_REGION is exported, that variable is used in the
hack/ci-e2e-aws-cleanup.sh to clean up dangling resources. However, if
it has "Configure aws credentials" running afterwards, AWS_REGION is
re-set to us-east-1. Let's run "Configure aws credentials" earlier to
avoid that problem.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
If the secondary subnet (for EKS) was created then it should be deleted
too, otherwise the VPC cannot be destroyed.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
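
A minimal sketch of the extra deletion, assuming the AWS SDK for Go v2 ec2 packages from the UEFI sketch above:

```go
// deleteSubnet removes a subnet; the secondary (EKS) one must go before
// the VPC can be destroyed.
func deleteSubnet(ctx context.Context, client *ec2.Client, subnetID string) error {
	_, err := client.DeleteSubnet(ctx, &ec2.DeleteSubnetInput{
		SubnetId: aws.String(subnetID),
	})
	return err
}
```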
Keep running the new job on CI but ignore its failures until it's proven
stable.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>