CI: testing AWS confidential VMs #2717
Conversation
Force-pushed fb0c4ff to 35ad928.

Updated just to fix golang lint warnings.
```yaml
  - crio
cluster_type:
  - onprem
os:
  - ubuntu
provider:
  - generic
arch:
  - amd64
include:
  - container_runtime: containerd
    cluster_type: eks
    os: ubuntu
    provider: generic
    arch: amd64
```
Could we refactor this into a table for more readability?
Yes, we can!
Done. Is that what you wanted, @stevenhorsman?
The current support for creating EKS clusters relies on the AWS SDK for Go libraries. That made the implementation somewhat complex; it works, but upcoming changes will switch to Ubuntu workers, which would require even more unneeded code when a tool like eksctl could be used instead. That's exactly what this commit does: use eksctl to create the EKS cluster. Also updated to Kubernetes 1.34, since 1.26 is deprecated. There is no need to handle roles, the CNI plug-in, etc., because eksctl carries all of that out. However, it still relies on the previously created VPC and subnets to run the podvms.
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
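For illustration, a minimal Go sketch of what shelling out to eksctl can look like. The helper name, flag values, and node AMI family below are assumptions for this example, not this PR's actual code:

```go
// Hypothetical helper that provisions an EKS cluster by invoking eksctl,
// which itself creates the IAM roles, CNI plug-in, and node group.
package main

import (
	"os"
	"os/exec"
)

func createEKSCluster(name, region string) error {
	cmd := exec.Command("eksctl", "create", "cluster",
		"--name", name,
		"--region", region,
		"--version", "1.34",
		"--node-ami-family", "Ubuntu2404", // Ubuntu workers; value assumed for this sketch
		"--nodes", "2",
	)
	// Stream eksctl's progress output into the CI logs.
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}
```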
The kata-remote runtimeclass is taking longer than 60 seconds to show up in EKS, so this just increases the timeout.
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
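A rough sketch of such a wait, using client-go directly (the real test framework code may differ, and the 120-second value is an assumption; the commit doesn't state the new timeout):

```go
// Poll for the kata-remote RuntimeClass with a longer timeout than 60s.
package main

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

func waitForRuntimeClass(ctx context.Context, cs kubernetes.Interface) error {
	return wait.PollUntilContextTimeout(ctx, 5*time.Second, 120*time.Second, true,
		func(ctx context.Context) (bool, error) {
			_, err := cs.NodeV1().RuntimeClasses().Get(ctx, "kata-remote", metav1.GetOptions{})
			if err != nil {
				// Not there yet on EKS; keep polling until the timeout expires.
				return false, nil
			}
			return true, nil
		})
}
```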
The system changes for Kata/CoCo break containerd on Amazon Linux workers. Switched to Ubuntu 24.04, which works.
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
It will use eksctl to delete the EKS cluster. The tool should take care of deleting the node groups, CloudFormation resources, and the cluster itself. A 15-minute timeout should be enough.
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
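A matching teardown sketch; the helper is hypothetical, while --wait and --timeout are real eksctl flags and the 15m value mirrors the commit message:

```go
// Hypothetical helper that deprovisions the cluster via eksctl, which
// deletes the node groups, CloudFormation stacks, and the cluster itself.
package main

import (
	"os"
	"os/exec"
)

func deleteEKSCluster(name, region string) error {
	cmd := exec.Command("eksctl", "delete", "cluster",
		"--name", name,
		"--region", region,
		"--wait",           // block until deletion completes
		"--timeout", "15m", // give CloudFormation enough time to unwind
	)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}
```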
Set the podvm_aws_instance_type property to use an instance type other than the default (t2.medium).
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
For regions other than us-east-1, a location constraint must be specified, otherwise the creation fails.
Assisted-by: Cursor
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
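This is the standard S3 CreateBucket behavior; a sketch with aws-sdk-go-v2 (the helper name is hypothetical):

```go
// Create an S3 bucket, passing a location constraint for any region
// other than us-east-1, where the configuration must be omitted.
package main

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	s3types "github.com/aws/aws-sdk-go-v2/service/s3/types"
)

func createBucket(ctx context.Context, client *s3.Client, name, region string) error {
	input := &s3.CreateBucketInput{Bucket: aws.String(name)}
	if region != "us-east-1" {
		// Without this, CreateBucket fails outside us-east-1.
		input.CreateBucketConfiguration = &s3types.CreateBucketConfiguration{
			LocationConstraint: s3types.BucketLocationConstraint(region),
		}
	}
	_, err := client.CreateBucket(ctx, input)
	return err
}
```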
For AMD SEV-SNP confidential VMs, the instance needs to boot in UEFI mode. It will automatically opt in if the disablecvm property is false (or empty).
Assisted-by: Cursor
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
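An illustrative sketch of the opt-in with aws-sdk-go-v2 (the helper name, instance type, and AMI ID are placeholders, not this PR's values):

```go
// Launch an EC2 instance with AMD SEV-SNP enabled unless disablecvm is set.
package main

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types"
)

func runPodVMInstance(ctx context.Context, client *ec2.Client, amiID string, disableCVM bool) error {
	input := &ec2.RunInstancesInput{
		ImageId:      aws.String(amiID),
		InstanceType: ec2types.InstanceTypeM6aLarge, // SEV-SNP needs an AMD-based instance family
		MinCount:     aws.Int32(1),
		MaxCount:     aws.Int32(1),
	}
	if !disableCVM {
		// SEV-SNP also requires the AMI to be registered with UEFI boot mode.
		input.CpuOptions = &ec2types.CpuOptionsRequest{
			AmdSevSnp: ec2types.AmdSevSnpSpecificationEnabled,
		}
	}
	_, err := client.RunInstances(ctx, input)
	return err
}
```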
Just like the other created resources (VPC, subnet, etc.), the cluster needs a unique name to avoid clashes on CI. Also, if the eks_name property is passed, it won't attempt to create the cluster; instead it assumes the cluster was already created and re-uses it.
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
For EKS clusters two subnets are created, but when the provisioner reads aws_vpc_subnet_id it assumes a single subnet. Overloaded the meaning of aws_vpc_subnet_id to allow passing two subnets separated by a comma.
Assisted-by: Cursor
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
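A minimal sketch of the overloaded property's semantics (hypothetical helper, not the provisioner's actual parser):

```go
// Parse aws_vpc_subnet_id, which may now hold one subnet ID or a
// comma-separated pair, e.g. "subnet-aaa,subnet-bbb".
package main

import "strings"

func parseSubnetIDs(prop string) []string {
	var ids []string
	for _, s := range strings.Split(prop, ",") {
		if s = strings.TrimSpace(s); s != "" {
			ids = append(ids, s)
		}
	}
	return ids
}
```

So parseSubnetIDs("subnet-aaa,subnet-bbb") yields both IDs, while the single-subnet case keeps working unchanged.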
We want to launch confidential VMs on EKS. If cluster_type is eks then it needs to:
- install the eksctl command
- tweak the test properties to launch a confidential VM

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Use EKS to test confidential VMs on AWS. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
qemu-img is used to convert the podvm disk from qcow2 to raw before uploading it to AWS S3, so it is a requirement. In the current onprem (kcli) job it's installed via src/cloud-api-adaptor/libvirt/config_libvirt.sh as a side effect of installing qemu-kvm. The EKS job doesn't run that script, so let's install the tool in a workflow step.
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
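For reference, a sketch of the conversion step itself (the helper is hypothetical; qemu-img convert with -O raw is the standard invocation):

```go
// Convert the podvm disk from qcow2 to raw ahead of the S3 upload.
package main

import (
	"os"
	"os/exec"
)

func convertPodVMImage(qcow2Path, rawPath string) error {
	cmd := exec.Command("qemu-img", "convert", "-O", "raw", qcow2Path, rawPath)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}
```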
The hack/ci-e2e-aws-cleanup.sh script is executed to ensure that resources are cleaned up if the test framework exited before running the deprovision code. Adapted the script to also delete EKS. The resources are not necessarily created in the same region as the credentials, so let's have the workflow export the region in the AWS_REGION variable and use it in the script. Also notice that EKS is created with two subnets, so the aws_vpc_subnet_id property is then a comma-separated list of subnets.
Assisted-by: Cursor
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
To avoid having to install unneeded libvirt packages in the CI runner.
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
In "Config aws" the AWS_REGION variable is exported; it is used by hack/ci-e2e-aws-cleanup.sh to clean up dangling resources. However, if "Configure aws credentials" runs afterwards, AWS_REGION is reset to us-east-1. Let's run "Configure aws credentials" earlier to avoid that problem.
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
If the secondary subnet (for EKS) was created, then it should be deleted too, otherwise the VPC cannot be destroyed.
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
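A sketch of that ordering with aws-sdk-go-v2 (hypothetical helper; EC2 rejects DeleteVpc with a dependency violation while subnets remain):

```go
// Delete every subnet, including the secondary one created for EKS,
// before attempting to delete the VPC itself.
package main

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
)

func deleteVPCResources(ctx context.Context, client *ec2.Client, vpcID string, subnetIDs []string) error {
	for _, id := range subnetIDs {
		if _, err := client.DeleteSubnet(ctx, &ec2.DeleteSubnetInput{SubnetId: aws.String(id)}); err != nil {
			return err
		}
	}
	_, err := client.DeleteVpc(ctx, &ec2.DeleteVpcInput{VpcId: aws.String(vpcID)})
	return err
}
```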
Keep running the new job on CI but ignore its failures until it is proven stable.
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Force-pushed 35ad928 to f46a545.
This is the first step towards running the attestation-aware tests on AWS. It introduces a new job to run the non-CoCo tests on EKS clusters, using AMD SEV-SNP podvm instances.
In the e2e framework there was some code to instantiate EKS, but it was broken, and CoCo wasn't getting installed on Amazon Linux. I switched most of the code from the AWS Go SDK to calling the eksctl tool to provision/deprovision clusters, and migrated to Ubuntu 24.04 workers (where CoCo installs just fine). Along the way I had to make some fixes and adjustments here and there. Although we have a clean-up mechanism for dangling resources running in our AWS account, I've also adapted (and fixed) the script that runs after the job fails.
The new job will run with continue-on-error until it's proven stable. Yes, it's still a bit unstable, because the EKS cluster provisioning often fails (need to investigate and fix). Two executions that I used to test: