
Conversation


@vomba vomba commented Oct 21, 2025

Change description

  • Is this change including a new Provider or a new OS? (y/n) ____
  • If yes, has the Provider/OS matrix been updated in the readme? (y/n) ____
  • If adding a new provider, are you a representative of that provider? (y/n) ____

Related issues

  • Fixes #

Additional context

@vomba vomba changed the title add workflow Automate production of CAPI VM images Oct 27, 2025

@elastisys-staffan elastisys-staffan left a comment


Nice work! 🎉 This doesn't actually push or publish the artifact anywhere right?

@vomba
Author

vomba commented Oct 28, 2025

> Nice work! 🎉 This doesn't actually push or publish the artifact anywhere right?

The OpenStack part doesn't; Azure does, since it creates a VM there as part of the build (it's the default behaviour).


@HaoruiPeng HaoruiPeng left a comment


Good job!
But I have a question concerning OpenStack.
What would the follow-up steps be after the image is created? The workflow stops there; I assume that means the image is gone when the runner is destroyed if you don't upload it anywhere.

Does it make sense to upload the image to the dev projects of all our OpenStack infra providers? Then we can share the images to production projects from there during maintenance.

Another idea is to have a self-hosted runner that does not get destroyed after the actions finish.

@vomba
Author

vomba commented Oct 29, 2025

> Good job! But I have a question concerning OpenStack. What would the follow-up steps be after the image is created? The workflow stops there; I assume that means the image is gone when the runner is destroyed if you don't upload it anywhere.
>
> Does it make sense to upload the image to the dev projects of all our OpenStack infra providers? Then we can share the images to production projects from there during maintenance.
>
> Another idea is to have a self-hosted runner that does not get destroyed after the actions finish.

We will need credentials to upload and share the OpenStack images; that part is on hold until a decision is made about it.

@elastisys-staffan

Apologies for not being able to look at this sooner. Do you think we could break the jobs up so that building is one job and publishing is another? The logs are quite long so that would make it easier to follow the process. Also, I think it would add some flexibility.

@vomba
Author

vomba commented Oct 30, 2025

> Apologies for not being able to look at this sooner. Do you think we could break the jobs up so that building is one job and publishing is another? The logs are quite long so that would make it easier to follow the process. Also, I think it would add some flexibility.

That can be done for OpenStack once we decide how to handle the credentials.
For Azure, there is no publishing job, as it builds on the platform and stores the image there by default.

@elastisys-staffan

> > Apologies for not being able to look at this sooner. Do you think we could break the jobs up so that building is one job and publishing is another? The logs are quite long so that would make it easier to follow the process. Also, I think it would add some flexibility.
>
> That can be done for OpenStack once we decide how to handle the credentials. For Azure, there is no publishing job, as it builds on the platform and stores the image there by default.

Would it make sense to use qemu or similar to build the Azure VHD locally and then publish it in a separate job? As I understand it, that is how the OpenStack flow works, so if we can make it work for Azure too they would harmonize nicely.

@vomba
Author

vomba commented Oct 30, 2025

> > Apologies for not being able to look at this sooner. Do you think we could break the jobs up so that building is one job and publishing is another? The logs are quite long so that would make it easier to follow the process. Also, I think it would add some flexibility.
> >
> > That can be done for OpenStack once we decide how to handle the credentials. For Azure, there is no publishing job, as it builds on the platform and stores the image there by default.
>
> Would it make sense to use qemu or similar to build the Azure VHD locally and then publish it in a separate job? As I understand it, that is how the OpenStack flow works, so if we can make it work for Azure too they would harmonize nicely.

I don't know actually, but it feels like adding more work to us.

@elastisys-staffan

> I don't know actually, but it feels like adding more work to us.

It might be worth the effort actually.

I see several benefits:

  • increased consistency across provider implementations
  • breaking up build and push/publish = separation of concerns
  • separate push step gives us more flexibility to work out the authentication details
  • if the azure image is identical to the openstack image, we might only need to build a single image (that's a big IF, but would be neat)

Imagine we build the images here and push them to object storage or similar. Then, pushing and publishing them can be tied to some other process, like creating a capi release. We could even keep the step manual if we want to.

@Xartos

Xartos commented Nov 3, 2025

> > I don't know actually, but it feels like adding more work to us.
>
> It might be worth the effort actually.
>
> I see several benefits:
>
> * increased consistency across provider implementations
> * breaking up build and push/publish = separation of concerns
> * separate push step gives us more flexibility to work out the authentication details
> * if the azure image is identical to the openstack image, we might only need to build a single image (that's a big IF, but would be neat)
>
> Imagine we build the images here and push them to object storage or similar. Then, pushing and publishing them can be tied to some other process, like creating a capi release. We could even keep the step manual if we want to.

I like this, and we could add the images as artifacts of the action. Then some other job could just download them as the output of this action and push them wherever they need to go.
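The artifact handoff being described could look roughly like this (a sketch only; the job names, artifact name, and `output/*.qcow2` path are illustrative, not taken from this PR):

```yaml
jobs:
  build-openstack-image:
    runs-on: ubuntu-latest
    steps:
      # ... build the image into ./output ...
      - uses: actions/upload-artifact@v4
        with:
          name: capi-openstack-image
          path: output/*.qcow2

  store-openstack-image:
    needs: build-openstack-image
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: capi-openstack-image
      # ... push the image wherever it needs to go ...
```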

sudo udevadm control --reload-rules
sudo udevadm trigger --name-match=kvm
- name: install qemu-kvn

NIT

Suggested change
- name: install qemu-kvn
- name: install qemu-kvm

Comment on lines 43 to 44
path: |
~/.config/packer/plugins

Question: Any reason this is multiline? Wouldn't this work?

Suggested change
path: |
~/.config/packer/plugins
path: ~/.config/packer/plugins

Author


I don't remember exactly; it should be changed. Will look at it.

Comment on lines 54 to 57
run: |
sudo apt update && \
sudo apt upgrade -y && \
sudo apt install -y qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils qemu-utils

Question: Could we not build an image that has these things already and use that? Would save some time and be less error-prone

Author


Will check that.

@vomba vomba force-pushed the hani/add-image-build-workflow branch 3 times, most recently from 8c019db to c791d3a Compare November 11, 2025 15:13
Comment on lines +45 to +46
sudo udevadm control --reload-rules
sudo udevadm trigger --name-match=kvm

Question: Could this also be incorporated into the image? I assume that this is running on the docker_image image now?

Same with the other "apt install / pip install" steps in other workflows.

Author


The build runs in the Docker image, but it still needs the host machine to have KVM enabled, as it is disabled by default on the runners.

The pip install was me trying to get the image-storing step to work on Safespring, before I realised it's not an openstackclient version problem.

I also avoided modifying the docker_image to keep it aligned with what is expected upstream.
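For context, the udev rule commonly used to enable KVM on hosted runners looks something like this (a sketch; the rules file name is arbitrary, and this is the widely shared workaround rather than code verified from this PR):

```yaml
- name: Enable KVM on the runner
  run: |
    # /dev/kvm exists on the hosted runners but is not accessible to the
    # runner user by default; this udev rule relaxes its permissions.
    echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' \
      | sudo tee /etc/udev/rules.d/99-kvm4all.rules
    sudo udevadm control --reload-rules
    sudo udevadm trigger --name-match=kvm
```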


Right, so this image is not running on another image. It's the builder that runs on the docker_image image?
If so, shouldn't you be able to build a custom image that already has this set (like in this example)?

For this particular step it's not as critical since it's fast, I guess, but this step, for example, should be possible to speed up and make more stable if we have a pre-created image with everything already installed.

Author


I opted out of using that example, as it caused so much headache when I tried it for the building part.
But I guess for that step it can be done, so I can give it a try.


@simonklb simonklb left a comment


Please rebase and request a re-review! 😄

@vomba vomba force-pushed the hani/add-image-build-workflow branch from ce5b032 to caa78fa Compare November 26, 2025 10:28
@vomba vomba force-pushed the hani/add-image-build-workflow branch from caa78fa to 2dc7ef9 Compare November 26, 2025 10:55
@vomba vomba requested a review from simonklb November 26, 2025 10:55

@simonklb simonklb left a comment


The rebase got some wonky changes. Both a failed conflict resolution and some extra whitespace issues. Please sort them out and re-request a review!

@vomba vomba force-pushed the hani/add-image-build-workflow branch from 2dc7ef9 to ba6d047 Compare November 26, 2025 13:32
@vomba vomba requested a review from simonklb November 26, 2025 13:33
@vomba
Author

vomba commented Nov 26, 2025

> The rebase got some wonky changes. Both a failed conflict resolution and some extra whitespace issues. Please sort them out and re-request a review!

Should be good now.


@simonklb simonklb left a comment


Might be worth inviting to a walkthrough meeting for this. There are a lot of moving parts. 😅

At least update the PR description explaining how all of this should work and what the flow looks like!

Great job doing all of this! 👍

env:
version: ${{ inputs.version }}
tag: ${{ inputs.tag }}
docker_image: "ghcr.io/elastisys/image-builder-amd64:Automate-production-of-CAPI-VM-images-09c9dac9dc61dc069b72ac55e654cbe1a9190911"


Is this image never rebuilt or can we really have it hardcoded like this?

Author


It does not. The plan is to make it a variable so we can pick up the last one built from main, which is what the workflow will do on push.
I just didn't get to that yet.
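One possible shape for that (a sketch; `IMAGE_BUILDER_IMAGE` is an assumed repository-variable name, not something defined in this PR):

```yaml
env:
  version: ${{ inputs.version }}
  tag: ${{ inputs.tag }}
  # Resolved from a repository variable instead of being hardcoded;
  # a push-triggered workflow could update the variable after each build.
  docker_image: ${{ vars.IMAGE_BUILDER_IMAGE }}
```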

Comment on lines +48 to +53
sed -r \
-e "s/\\\$KUBERNETES_SERIES/${series}/" \
-e "s/\\\$KUBERNETES_VERSION/${version}/" \
-e "s/\\\$KUBERNETES_DEB_VERSION/${package}/" \
-e "s/\\\$IMAGE_TAG/${tag}/" \
<"template.json" >"kubernetes.json"


Is envsubst not available?

Author


I just copied what we had done in other places.

Comment on lines +64 to +70
docker run -i --rm \
-e PACKER_VAR_FILES -e PACKER_GITHUB_API_TOKEN=${{ secrets.GITHUB_TOKEN }} \
-e SIG_IMAGE_DEFINITION -e SIG_PUBLISHER -e SIG_OFFER -e SIG_SKU \
-e AZURE_SUBSCRIPTION_ID -e AZURE_CLIENT_ID -e AZURE_CLIENT_SECRET -e AZURE_TENANT_ID -e AZURE_LOCATION \
-e RESOURCE_GROUP_NAME -e GALLERY_NAME -e BUILD_RESOURCE_GROUP_NAME \
-v ${{ github.workspace }}/images/capi:/tmp/host \
${{ env.docker_image }} build-azure-sig-ubuntu-2404-gen2


This looks odd, why do you have to start the container yourself instead of running it as a workflow job task?

Edit: I found multiple manual docker run executions and I don't understand why they couldn't be normal tasks. Please enlighten me! 😄

Author


There is an option to run the job inside a container by default.
I tried it multiple times but I couldn't figure out why it did not work, so I opted for the surefire method.
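For reference, the job-level container option being discussed looks roughly like this (a sketch; the image tag, step, and the `--device=/dev/kvm` option are illustrative — this is the approach that reportedly did not work here, not a verified fix):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/elastisys/image-builder-amd64:latest
      options: --device=/dev/kvm
    steps:
      - name: Build image
        run: make build-azure-sig-ubuntu-2404-gen2
```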

Comment on lines 3 to 7
on:
# push:

workflow_dispatch:
inputs:


Cleanup?

Comment on lines 41 to 47
# store-openstack-image-safespring:
# uses: ./.github/workflows/store-openstack-capi-image-safespring.yml
# needs: build-openstack-image
# with:
# version: ${{ inputs.version || '1.33.1' }}
# tag: ${{ inputs.tag || '0.8' }}
# secrets: inherit


Cleanup?

Comment on lines 3 to 7
on:
push:
branches:
- main
# pull_request:


Cleanup?

Comment on lines +19 to +29
- name: get tag
id: get-tag
run: |
if [ "${{ github.event_name }}" == "pull_request" ]; then
PR_TITLE="${{ github.event.pull_request.title }}"
PR_TAG=$(echo "${PR_TITLE}" | sed -e 's/ /-/g')
echo "TAG=${PR_TAG}-${{ github.sha }}" >> $GITHUB_OUTPUT
else
echo "TAG=${GITHUB_REF##*/}-${{ github.sha }}" >> $GITHUB_OUTPUT
fi
shell: bash


This feels brittle. It assumes that everyone will be good and name their PR correctly, and that we never change that policy.

Is it not possible to only trigger this on tags being pushed? Then you can also just rely on ${{ github.ref_name }}.
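A tag-only trigger would look something like this (a sketch; the `v*` pattern is an assumption about the tagging scheme):

```yaml
on:
  push:
    tags:
      - "v*"

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # github.ref_name is the bare tag name, e.g. "v1.2.3".
      - run: echo "Building image for tag ${{ github.ref_name }}"
```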

Author


That part actually should be removed, as the image is not intended to be built from anything but main.
The file will be cleaned up accordingly.

Comment on lines 35 to 36
username: ${{github.actor}}
password: ${{secrets.GITHUB_TOKEN}}


nit

Suggested change
username: ${{github.actor}}
password: ${{secrets.GITHUB_TOKEN}}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

Comment on lines +53 to +58
sed -r \
-e "s/\\\$KUBERNETES_SERIES/${series}/" \
-e "s/\\\$KUBERNETES_VERSION/${version}/" \
-e "s/\\\$KUBERNETES_DEB_VERSION/${package}/" \
-e "s/\\\$IMAGE_TAG/${tag}/" \
<"template.json" >"kubernetes.json"


Should use envsubst here as well


Empty file?

Author


Yes, it seems to be needed for SSH capability. I copied it from https://github.com/elastisys/ck8s-cluster-api/blob/main/scripts/image/roles/sshca/README.md


@elastisys-staffan elastisys-staffan left a comment


I'm a little confused. The store jobs actually push the images straight to the cloud providers, right? When we had the discussion meeting, didn't we agree that the images should be built here and stored as artifacts, and that distribution/consumption should be handled separately for now? Here are the meeting notes for reference: https://docs.google.com/document/d/18inrhGuT2yyaHINowxFbBGj7tYSwafYC2p1vrPy-i3Y/edit?tab=t.736zxsrcd694

Great work on the massive effort you've put into this so far, and maybe I'm misunderstanding something, but I think we need to sort this out before moving forward.

@vomba
Author

vomba commented Nov 28, 2025

> I'm a little confused. The store jobs actually push the images straight to the cloud providers, right? When we had the discussion meeting, didn't we agree that the images should be built here and stored as artifacts, and that distribution/consumption should be handled separately for now? Here are the meeting notes for reference: https://docs.google.com/document/d/18inrhGuT2yyaHINowxFbBGj7tYSwafYC2p1vrPy-i3Y/edit?tab=t.736zxsrcd694
>
> Great work on the massive effort you've put into this so far, and maybe I'm misunderstanding something, but I think we need to sort this out before moving forward.

That is true, I will update it accordingly. I got side-tracked by other things and forgot about it.


@elastisys-staffan elastisys-staffan left a comment


Even if it is work in progress I'm cool with adding this to main for testing purposes, as long as we keep upstream files unaltered. Regardless, I'm really stoked about this! 😄

COPY --chown=imagebuilder:imagebuilder packer packer/
COPY --chown=imagebuilder:imagebuilder Makefile Makefile
COPY --chown=imagebuilder:imagebuilder azure_targets.sh azure_targets.sh
COPY --chown=imagebuilder:imagebuilder template.json template.json


I'm a little hesitant to merge this because Dockerfile and .dockerignore exist upstream and I want to avoid future conflicts if possible. If the files need altering, maybe we can do it with a git patch or something like that, and apply it as part of the action?

Author


I am not very familiar with the suggested method 😅


Here's how you do it:

  1. Restore Dockerfile and .dockerignore
  2. Add the changes but don't commit
  3. git diff > patches/dockerfile.patch (let's use a dedicated dir that can be used for future patches too)
  4. To apply the changes when needed: git apply patches/dockerfile.patch

The idea is to commit the patch file which can then be applied in the GH action prior to every build. WDYT?
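The steps above can be sketched end to end like this (illustrative only; the repo setup and file contents are made up to show the git mechanics):

```shell
# Illustrative walk-through of the patch workflow.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email "ci@example.com" && git config user.name "ci"

# 1. The upstream file, committed untouched.
echo "FROM ubuntu:24.04" > Dockerfile
git add Dockerfile && git commit -qm "upstream Dockerfile"

# 2. Make the local change but do not commit it.
echo "COPY template.json template.json" >> Dockerfile

# 3. Capture the change as a patch in a dedicated dir.
mkdir -p patches
git diff > patches/dockerfile.patch

# 4. Restore the upstream file, then apply the patch on demand
#    (e.g. in the GH action prior to every build).
git checkout -- Dockerfile
git apply patches/dockerfile.patch
grep "COPY" Dockerfile
```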
