
Conversation


@vomba vomba commented Oct 21, 2025

Change description

  • Is this change including a new Provider or a new OS? (y/n) ____
  • If yes, has the Provider/OS matrix been updated in the readme? (y/n) ____
  • If adding a new provider, are you a representative of that provider? (y/n) ____

Related issues

  • Fixes #

Additional context

@vomba vomba changed the title add workflow Automate production of CAPI VM images Oct 27, 2025

@elastisys-staffan elastisys-staffan left a comment


Nice work! 🎉 This doesn't actually push or publish the artifact anywhere right?

@vomba
Author

vomba commented Oct 28, 2025

> Nice work! 🎉 This doesn't actually push or publish the artifact anywhere right?

The OpenStack part doesn't; Azure does, since it creates a VM there as part of the build (it's the default behaviour).


@HaoruiPeng HaoruiPeng left a comment


Good job!
But I have a question concerning OpenStack.
What would the follow-up steps be after the image is created? The workflow stops there; I assume that means the image is gone when the runner is destroyed if you don't upload it anywhere.

Does it make sense to upload the image to the dev projects of all our OpenStack infra providers? Then we can share the images to production projects from there during maintenance.

Another idea is to have a self-hosted runner that does not get destroyed after the actions finish.

@vomba
Author

vomba commented Oct 29, 2025

> Good job! But I have a question concerning OpenStack. What would the follow-up steps be after the image is created? The workflow stops there; I assume that means the image is gone when the runner is destroyed if you don't upload it anywhere.
>
> Does it make sense to upload the image to the dev projects of all our OpenStack infra providers? Then we can share the images to production projects from there during maintenance.
>
> Another idea is to have a self-hosted runner that does not get destroyed after the actions finish.

We will need credentials to upload and share the OpenStack images; that part is on hold until a decision is made about it.

@elastisys-staffan

Apologies for not being able to look at this sooner. Do you think we could break the jobs up so that building is one job and publishing is another? The logs are quite long so that would make it easier to follow the process. Also, I think it would add some flexibility.

@vomba
Author

vomba commented Oct 30, 2025

> Apologies for not being able to look at this sooner. Do you think we could break the jobs up so that building is one job and publishing is another? The logs are quite long so that would make it easier to follow the process. Also, I think it would add some flexibility.

That can be done for OpenStack once we decide how to handle the credentials.
For Azure, there is no publishing job, as it builds on the platform and stores the image there by default.

@elastisys-staffan

> > Apologies for not being able to look at this sooner. Do you think we could break the jobs up so that building is one job and publishing is another? The logs are quite long so that would make it easier to follow the process. Also, I think it would add some flexibility.
>
> That can be done for OpenStack once we decide how to handle the credentials. For Azure, there is no publishing job, as it builds on the platform and stores the image there by default.

Would it make sense to use qemu or similar to build the Azure VHD locally and then publish it in a separate job? As I understand it, that is how the OpenStack flow works, so if we can make it work for Azure too they would harmonize nicely.

@vomba
Author

vomba commented Oct 30, 2025

> > Apologies for not being able to look at this sooner. Do you think we could break the jobs up so that building is one job and publishing is another? The logs are quite long so that would make it easier to follow the process. Also, I think it would add some flexibility.
> >
> > That can be done for OpenStack once we decide how to handle the credentials. For Azure, there is no publishing job, as it builds on the platform and stores the image there by default.
>
> Would it make sense to use qemu or similar to build the Azure VHD locally and then publish it in a separate job? As I understand it, that is how the OpenStack flow works, so if we can make it work for Azure too they would harmonize nicely.

I don't know actually, but it feels like adding more work to us.

@elastisys-staffan

> I don't know actually, but it feels like adding more work to us.

It might be worth the effort actually.

I see several benefits:

  • increased consistency across provider implementations
  • breaking up build and push/publish = separation of concerns
  • separate push step gives us more flexibility to work out the authentication details
  • if the azure image is identical to the openstack image, we might only need to build a single image (that's a big IF, but would be neat)

Imagine we build the images here and push them to object storage or similar. Then, pushing and publishing them can be tied to some other process, like creating a capi release. We could even keep the step manual if we want to.

@Xartos

Xartos commented Nov 3, 2025

> > I don't know actually, but it feels like adding more work to us.
>
> It might be worth the effort actually.
>
> I see several benefits:
>
> * increased consistency across provider implementations
> * breaking up build and push/publish = separation of concerns
> * separate push step gives us more flexibility to work out the authentication details
> * if the azure image is identical to the openstack image, we might only need to build a single image (that's a big IF, but would be neat)
>
> Imagine we build the images here and push them to object storage or similar. Then, pushing and publishing them can be tied to some other process, like creating a capi release. We could even keep the step manual if we want to.

I like this, and we could add the images as artifacts of the action. Then some other job could just download them as the output of this action and push them wherever they need to go.
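The artifact handoff being described could look roughly like this (a sketch only; the job names, artifact name, and `output/*.qcow2` path are illustrative, not taken from this PR):

```yaml
jobs:
  build-openstack-image:
    runs-on: ubuntu-latest
    steps:
      # ... build the image into ./output ...
      - uses: actions/upload-artifact@v4
        with:
          name: capi-openstack-image
          path: output/*.qcow2

  store-openstack-image:
    needs: build-openstack-image
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: capi-openstack-image
      # ... push the image wherever it needs to go ...
```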

sudo udevadm control --reload-rules
sudo udevadm trigger --name-match=kvm
- name: install qemu-kvn

NIT

Suggested change
- name: install qemu-kvn
- name: install qemu-kvm

Comment on lines 43 to 44
path: |
~/.config/packer/plugins

Question: Any reason this is multiline? Wouldn't this work?

Suggested change
path: |
~/.config/packer/plugins
path: ~/.config/packer/plugins

Author


I don't remember exactly; it should be changed. Will look at it.

Comment on lines 54 to 57
run: |
sudo apt update && \
sudo apt upgrade -y && \
sudo apt install -y qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils qemu-utils

Question: Could we not build an image that has these things already and use that? Would save some time and be less error-prone

Author


Will check that.

@vomba vomba force-pushed the hani/add-image-build-workflow branch 3 times, most recently from 8c019db to c791d3a Compare November 11, 2025 15:13
Comment on lines +45 to +46
sudo udevadm control --reload-rules
sudo udevadm trigger --name-match=kvm

Question: Could this also be incorporated into the image? I assume that this is running on the docker_image image now?

Same with the other "apt install / pip install" steps in other workflows.

Author


The build runs in the Docker image, but it still needs the host machine to have KVM enabled, as it is disabled by default on the runners.

The pip install was me trying to get the image-storing step to work on Safespring, before I realised it's not an openstackclient version problem.

I also avoided modifying the docker_image to keep it aligned with what is expected upstream.
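For context, the udev rule commonly used to enable KVM on hosted runners looks something like this (a sketch; the rules file name is arbitrary, and this is the widely shared workaround rather than code verified from this PR):

```yaml
- name: Enable KVM on the runner
  run: |
    # /dev/kvm exists on the hosted runners but is not accessible to the
    # runner user by default; this udev rule relaxes its permissions.
    echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' \
      | sudo tee /etc/udev/rules.d/99-kvm4all.rules
    sudo udevadm control --reload-rules
    sudo udevadm trigger --name-match=kvm
```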


Right, so this image is not running on another image. It's the builder that runs on the docker_image image?
If so, shouldn't you be able to build a custom image that already has this set (like in this example)?

For this particular step it's not as critical since it's fast, I guess, but this step, for example, should be possible to speed up and make more stable if we have a pre-created image with everything already installed.

Author


I opted out of using that example, as it caused so much headache when I tried it for the building part.
But I guess for that step it can be done, so I can give it a try.


@simonklb simonklb left a comment


Please rebase and request a re-review! 😄

@vomba vomba force-pushed the hani/add-image-build-workflow branch from ce5b032 to caa78fa Compare November 26, 2025 10:28
@vomba vomba force-pushed the hani/add-image-build-workflow branch from caa78fa to 2dc7ef9 Compare November 26, 2025 10:55
@vomba vomba requested a review from simonklb November 26, 2025 10:55

@simonklb simonklb left a comment


The rebase got some wonky changes. Both a failed conflict resolution and some extra whitespace issues. Please sort them out and re-request a review!

@vomba vomba force-pushed the hani/add-image-build-workflow branch from 2dc7ef9 to ba6d047 Compare November 26, 2025 13:32
@vomba vomba requested a review from simonklb November 26, 2025 13:33
@vomba
Author

vomba commented Nov 26, 2025

> The rebase got some wonky changes. Both a failed conflict resolution and some extra whitespace issues. Please sort them out and re-request a review!

Should be good now.


@simonklb simonklb left a comment


Might be worth inviting to a walkthrough meeting for this. There are a lot of moving parts. 😅

At least update the PR description explaining how all of this should work and what the flow looks like!

Great job doing all of this! 👍

env:
version: ${{ inputs.version }}
tag: ${{ inputs.tag }}
docker_image: "ghcr.io/elastisys/image-builder-amd64:Automate-production-of-CAPI-VM-images-09c9dac9dc61dc069b72ac55e654cbe1a9190911"


Is this image never rebuilt or can we really have it hardcoded like this?

Author


It does not. The plan is to make it a variable so we can pick up the last one built from main, which is what the workflow will do on push.
I just didn't get to that yet.
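One possible shape for that (a sketch; `IMAGE_BUILDER_IMAGE` is an assumed repository-variable name, not something defined in this PR):

```yaml
env:
  version: ${{ inputs.version }}
  tag: ${{ inputs.tag }}
  # Resolved from a repository variable instead of being hardcoded;
  # a push-triggered workflow could update the variable after each build.
  docker_image: ${{ vars.IMAGE_BUILDER_IMAGE }}
```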

Comment on lines +48 to +53
sed -r \
-e "s/\\\$KUBERNETES_SERIES/${series}/" \
-e "s/\\\$KUBERNETES_VERSION/${version}/" \
-e "s/\\\$KUBERNETES_DEB_VERSION/${package}/" \
-e "s/\\\$IMAGE_TAG/${tag}/" \
<"template.json" >"kubernetes.json"


Is envsubst not available?

Author


I just copied what we had done in other places.

Comment on lines +64 to +70
docker run -i --rm \
-e PACKER_VAR_FILES -e PACKER_GITHUB_API_TOKEN=${{ secrets.GITHUB_TOKEN }} \
-e SIG_IMAGE_DEFINITION -e SIG_PUBLISHER -e SIG_OFFER -e SIG_SKU \
-e AZURE_SUBSCRIPTION_ID -e AZURE_CLIENT_ID -e AZURE_CLIENT_SECRET -e AZURE_TENANT_ID -e AZURE_LOCATION \
-e RESOURCE_GROUP_NAME -e GALLERY_NAME -e BUILD_RESOURCE_GROUP_NAME \
-v ${{ github.workspace }}/images/capi:/tmp/host \
${{ env.docker_image }} build-azure-sig-ubuntu-2404-gen2


This looks odd, why do you have to start the container yourself instead of running it as a workflow job task?

Edit: I found multiple manual docker run executions and I don't understand why they couldn't be normal tasks. Please enlighten me! 😄

Author


There is an option to run the job inside a container by default.
I tried it multiple times but I couldn't figure out why it did not work, so I opted for the surefire method.
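For reference, the job-level container option being discussed looks roughly like this (a sketch; the image tag, step, and the `--device=/dev/kvm` option are illustrative — this is the approach that reportedly did not work here, not a verified fix):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/elastisys/image-builder-amd64:latest
      options: --device=/dev/kvm
    steps:
      - name: Build image
        run: make build-azure-sig-ubuntu-2404-gen2
```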

Comment on lines 3 to 7
on:
# push:

workflow_dispatch:
inputs:


Cleanup?

Comment on lines 41 to 47
# store-openstack-image-safespring:
# uses: ./.github/workflows/store-openstack-capi-image-safespring.yml
# needs: build-openstack-image
# with:
# version: ${{ inputs.version || '1.33.1' }}
# tag: ${{ inputs.tag || '0.8' }}
# secrets: inherit


Cleanup?

Comment on lines 3 to 7
on:
push:
branches:
- main
# pull_request:


Cleanup?

Comment on lines +19 to +29
- name: get tag
id: get-tag
run: |
if [ "${{ github.event_name }}" == "pull_request" ]; then
PR_TITLE="${{ github.event.pull_request.title }}"
PR_TAG=$(echo "${PR_TITLE}" | sed -e 's/ /-/g')
echo "TAG=${PR_TAG}-${{ github.sha }}" >> $GITHUB_OUTPUT
else
echo "TAG=${GITHUB_REF##*/}-${{ github.sha }}" >> $GITHUB_OUTPUT
fi
shell: bash


This feels brittle. It assumes that everyone will be good and name their PR correctly, and that we never change that policy.

Is it not possible to only trigger this on tags being pushed? Then you can also just rely on ${{ github.ref_name }}.
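A tag-only trigger would look something like this (a sketch; the `v*` pattern is an assumption about the tagging scheme):

```yaml
on:
  push:
    tags:
      - "v*"

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # github.ref_name is the bare tag name, e.g. "v1.2.3".
      - run: echo "Building image for tag ${{ github.ref_name }}"
```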

Author


That part actually should be removed, as the image is not intended to be built from anything but main.
The file will be cleaned up accordingly.

Comment on lines 35 to 36
username: ${{github.actor}}
password: ${{secrets.GITHUB_TOKEN}}


nit

Suggested change
username: ${{github.actor}}
password: ${{secrets.GITHUB_TOKEN}}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

Comment on lines +53 to +58
sed -r \
-e "s/\\\$KUBERNETES_SERIES/${series}/" \
-e "s/\\\$KUBERNETES_VERSION/${version}/" \
-e "s/\\\$KUBERNETES_DEB_VERSION/${package}/" \
-e "s/\\\$IMAGE_TAG/${tag}/" \
<"template.json" >"kubernetes.json"


Should use envsubst here as well


Empty file?

Author


Yes, it seems to be needed for SSH capability. I copied it from https://github.com/elastisys/ck8s-cluster-api/blob/main/scripts/image/roles/sshca/README.md


@elastisys-staffan elastisys-staffan left a comment


I'm a little confused. The store jobs actually push the images straight to the cloud providers, right? When we had the discussion meeting, didn't we agree that the images should be built here and stored as artifacts, and that distribution/consumption should be handled separately for now? Here are the meeting notes for reference: https://docs.google.com/document/d/18inrhGuT2yyaHINowxFbBGj7tYSwafYC2p1vrPy-i3Y/edit?tab=t.736zxsrcd694

Great work on the massive effort you've put into this so far, and maybe I'm misunderstanding something, but I think we need to sort this out before moving forward.

@vomba
Author

vomba commented Nov 28, 2025

> I'm a little confused. The store jobs actually push the images straight to the cloud providers, right? When we had the discussion meeting, didn't we agree that the images should be built here and stored as artifacts, and that distribution/consumption should be handled separately for now? Here are the meeting notes for reference: https://docs.google.com/document/d/18inrhGuT2yyaHINowxFbBGj7tYSwafYC2p1vrPy-i3Y/edit?tab=t.736zxsrcd694
>
> Great work on the massive effort you've put into this so far, and maybe I'm misunderstanding something, but I think we need to sort this out before moving forward.

That is true, I will update it accordingly. I got side-tracked by other things and forgot about it.


@elastisys-staffan elastisys-staffan left a comment


Even if it is work in progress I'm cool with adding this to main for testing purposes, as long as we keep upstream files unaltered. Regardless, I'm really stoked about this! 😄

COPY --chown=imagebuilder:imagebuilder packer packer/
COPY --chown=imagebuilder:imagebuilder Makefile Makefile
COPY --chown=imagebuilder:imagebuilder azure_targets.sh azure_targets.sh
COPY --chown=imagebuilder:imagebuilder template.json template.json


I'm a little hesitant to merge this because Dockerfile and .dockerignore exist upstream and I want to avoid future conflicts if possible. If the files need altering, maybe we can do it with a git patch or something like that, and apply it as part of the action?

Author


I am not very familiar with the suggested method 😅


Here's how you do it:

  1. Restore Dockerfile and .dockerignore
  2. Add the changes but don't commit
  3. git diff > patches/dockerfile.patch (let's use a dedicated dir that can be used for future patches too)
  4. To apply the changes when needed: git apply patches/dockerfile.patch

The idea is to commit the patch file which can then be applied in the GH action prior to every build. WDYT?
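The steps above can be sketched end to end like this (illustrative only; the repo setup and file contents are made up to show the git mechanics):

```shell
# Illustrative walk-through of the patch workflow.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email "ci@example.com" && git config user.name "ci"

# 1. The upstream file, committed untouched.
echo "FROM ubuntu:24.04" > Dockerfile
git add Dockerfile && git commit -qm "upstream Dockerfile"

# 2. Make the local change but do not commit it.
echo "COPY template.json template.json" >> Dockerfile

# 3. Capture the change as a patch in a dedicated dir.
mkdir -p patches
git diff > patches/dockerfile.patch

# 4. Restore the upstream file, then apply the patch on demand
#    (e.g. in the GH action prior to every build).
git checkout -- Dockerfile
git apply patches/dockerfile.patch
grep "COPY" Dockerfile
```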
