Make docker images fully reproducible [DI-741]#1236
ldziedziul wants to merge 36 commits into master
hazelcast-enterprise/Dockerfile
Outdated
| @@ -1,3 +1,5 @@ | |||
| # syntax=docker/dockerfile:1.7 | |||
There was a problem hiding this comment.
Does this require buildx? Will we run into the same issues as in #1162?
There was a problem hiding this comment.
it seems it requires buildkit
There was a problem hiding this comment.
it seems it requires buildkit
That's annoying, but not unexpected. I think this time around we're in a better situation to progress though.
Now we've nailed down that external users extend our images rather than building their own, which means buildx is an exclusively internal requirement - so ensuring our test environments are up to spec can be in scope as part of this.
And we have convincing justification where it adds real customer value, vs last time where it didn't and wasn't worth pushing.
hazelcast-enterprise/Dockerfile
Outdated
| RUN find /build_root -exec touch -h -d "@$SOURCE_DATE_EPOCH" {} + | ||
| FROM redhat/ubi9-minimal:9.7 | ||
| FROM eclipse-temurin:${JDK_VERSION}-jre-ubi9-minimal |
There was a problem hiding this comment.
This uses a non-constant ubi minor version - i.e. during a scheduled rebuild the version might "jump" from 9.6->9.7. I don't think we have a choice with this implementation, but we've previously approached such changes with trepidation and now it's automatic and non-obvious.
There was a problem hiding this comment.
Yes, we were careful indeed, we should agree on that, but also we should take into consideration that:
- for alpine we simply use alpine:3 and it works fine
- I've never seen an issue with minor ubi upgrade for 4 years
There was a problem hiding this comment.
Have you considered writing this in something other than bash?
There was a problem hiding this comment.
yes but not here ;)
| SOURCE_REF: | ||
| description: 'The hazelcast-docker branch to verify' | ||
| required: true | ||
| type: string |
There was a problem hiding this comment.
Why do we need to pass this? Isn't it implicit via the github context?
There was a problem hiding this comment.
What do you mean? SOURCE_REF not needed?
There was a problem hiding this comment.
SOURCE_REF allows us to run on one branch (master) using the Dockerfile etc from another branch (e.g. v5.5.9).
But this is only for PR builds - so the SOURCE_REF will always be the ${{ github.ref }} - i.e. PR branch.
So rather than passing it in as a value, we can just skip it and use the implicit/default branch (which will be the PR branch).
There was a problem hiding this comment.
for other workflows we pass it, I want to have a unified approach
| - name: Set up Docker Buildx | ||
| uses: docker/setup-buildx-action@v3 |
There was a problem hiding this comment.
Why not use .github/actions/setup-docker/action.yml? We may have the same disk space issues in this workflow.
There was a problem hiding this comment.
because we build only a single-arch image for verification (runner arch only). Not sure if we need all architectures to be tested, which would make it even more complex
There was a problem hiding this comment.
We could make that input optional?
By default buildx-action installs all architectures. Filtering to what we used was just an attempt to improve CI runtime / reduce disk usage.
There was a problem hiding this comment.
The primary bit we should re-use is:
hazelcast-docker/.github/actions/setup-docker/action.yml
Lines 9 to 20 in 2df9055
| - name: Create test distribution | ||
| run: | | ||
| dist_dir=$(mktemp -d) | ||
| mkdir -p "$dist_dir/hazelcast-0.0.0/bin" "$dist_dir/hazelcast-0.0.0/lib" | ||
| printf '#!/bin/bash\n' > "$dist_dir/hazelcast-0.0.0/bin/hz" | ||
| printf '#!/bin/bash\n' > "$dist_dir/hazelcast-0.0.0/bin/hz-healthcheck" | ||
| chmod +x "$dist_dir/hazelcast-0.0.0/bin/"* | ||
| touch "$dist_dir/hazelcast-0.0.0/lib/placeholder" | ||
| (cd "$dist_dir" && zip -qr "$GITHUB_WORKSPACE/${{ matrix.context }}/${{ matrix.dist-file }}" hazelcast-0.0.0/) | ||
| rm -rf "$dist_dir" |
There was a problem hiding this comment.
Could we share this functionality with:
hazelcast-docker/.github/workflows/vulnerability_scan_subworkflow.yml
Lines 37 to 44 in 2df9055
There was a problem hiding this comment.
Should this be abstracted to docker-actions?
| set -o errexit -o nounset -o pipefail ${RUNNER_DEBUG:+-x} | ||
| # Verifies Docker image build reproducibility by building twice and comparing layer digests. |
There was a problem hiding this comment.
nit - standard is to have Usage() ?
There was a problem hiding this comment.
What do you mean?
There was a problem hiding this comment.
something like this
then show usage() if the inputs are incorrect - so rather than a static comment, move it into usage() so it can be printed
may be overkill. feel free to skip!
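A minimal sketch of what this suggestion might look like. The `-f` option and the `--` separator are taken from the local invocation shown in the PR description; the function name `parse_args`, the exit code 2, and the usage wording are assumptions for illustration, not the script's actual contents.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: usage text, parse_args, and exit code 2 are assumptions.

usage() {
  cat <<'EOF'
Usage: verify-layer-reproducibility.sh -f <dockerfile> -- <docker build args...>

Verifies Docker image build reproducibility by building twice and comparing layer digests.
EOF
}

# Parses options into the globals `dockerfile` and `build_args`.
# Prints usage to stderr and returns 2 when the inputs are incorrect.
parse_args() {
  local opt OPTIND=1
  dockerfile=""
  build_args=()
  while getopts ":f:" opt; do
    case "$opt" in
      f) dockerfile="$OPTARG" ;;
      *) usage >&2; return 2 ;;   # unknown option or missing argument
    esac
  done
  shift $((OPTIND - 1))           # drop parsed options and the `--` separator
  if [ -z "$dockerfile" ]; then
    usage >&2
    return 2
  fi
  build_args=("$@")               # everything after `--` goes to docker build
}
```

In the script itself this would be called as `parse_args "$@" || exit $?`, so the description that currently lives in a comment is printed whenever the inputs are wrong.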
There was a problem hiding this comment.
General comments:
- nicely done - although it feels a little fragile, but then we have a test for this so we are covered. Hopefully it won't break in the future only to be forced to revert back!
- is it possible to test in sandbox with a release?
  - what I am after is to see "Layer already exists" when rebuilding again. In the PR description you have shown that, but is that local or via GH? It would be good to see links.
- presume Dockerfiles have been tested locally (Linux/MacOS)?
Co-authored-by: Jack Green <jack.green@hazelcast.com>
Move the inline script that creates a fake Hazelcast distribution ZIP into .github/scripts/fake-zip.functions.sh and source it from both verify-layer-reproducibility and vulnerability_scan_subworkflow workflows.
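For illustration, the shared function could look roughly like the sketch below. The body follows the inline "Create test distribution" step quoted earlier in the review; the function name `create_fake_distribution_zip` and its single argument are assumptions, since the actual contents of fake-zip.functions.sh are not shown in this conversation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a sourced helper; name and signature are assumptions.

# Creates a placeholder Hazelcast distribution ZIP at the given absolute path.
create_fake_distribution_zip() {
  local output_zip=$1
  local dist_dir status=0
  dist_dir=$(mktemp -d)
  mkdir -p "$dist_dir/hazelcast-0.0.0/bin" "$dist_dir/hazelcast-0.0.0/lib"
  printf '#!/bin/bash\n' > "$dist_dir/hazelcast-0.0.0/bin/hz"
  printf '#!/bin/bash\n' > "$dist_dir/hazelcast-0.0.0/bin/hz-healthcheck"
  chmod +x "$dist_dir/hazelcast-0.0.0/bin/"*
  touch "$dist_dir/hazelcast-0.0.0/lib/placeholder"
  # Zip from inside the temp dir so the archive has hazelcast-0.0.0/ at its root.
  (cd "$dist_dir" && zip -qr "$output_zip" hazelcast-0.0.0/) || status=$?
  rm -rf "$dist_dir"
  return "$status"
}
```

A workflow step would then source the file and call something like `create_fake_distribution_zip "$GITHUB_WORKSPACE/${{ matrix.context }}/${{ matrix.dist-file }}"`.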
@ldziedziul
Yes, but with this approach you have to remember to set the SOURCE_DATE_EPOCH env on the building host and add SOURCE_DATE_EPOCH=1234567890 docker buildx build ... --output rewrite-timestamp=true. With my approach you have a self-contained solution: you just build an image as usual.
Bear in mind we only build production images in two places in this repo, and the If there's a Docker-native solution, I would prefer that over something homemade.
It would simplify the Dockerfile, and it makes sense to let Docker handle it.
might be the case
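For comparison, the Docker-native alternative discussed in this thread could be wrapped roughly as follows. This is a sketch, assuming BuildKit's `rewrite-timestamp` exporter option and that buildx picks up `SOURCE_DATE_EPOCH` from the environment; the function name `reproducible_build` and the `DOCKER` override (used only to dry-run the command) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: reproducible_build and the DOCKER override are assumptions.

# Usage: reproducible_build <epoch> <docker build args...>
reproducible_build() {
  local epoch=$1
  shift
  # SOURCE_DATE_EPOCH is set only for this one command; rewrite-timestamp asks
  # the image exporter to clamp file timestamps inside the layers to that epoch.
  SOURCE_DATE_EPOCH="$epoch" "${DOCKER:-docker}" buildx build \
    --output "type=image,rewrite-timestamp=true" "$@"
}

# Dry run: print the command instead of executing it.
DOCKER=echo reproducible_build 1234567890 hazelcast-oss/
```

The trade-off raised above still applies: this keeps the Dockerfile simpler, but every build host has to remember the environment variable and the output flag.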



Both OSS and Enterprise Dockerfiles now produce bit-for-bit identical images across clean builds. Every layer hash is deterministic regardless of when or where the build runs.
Fixes https://hazelcast.atlassian.net/browse/DI-741
Benefits
Changes
Reproducible distribution layer (OSS + EE)
- /build_root/ in a staging stage and COPY --link into the final image, producing a platform-independent layer that is shared across architectures
- SOURCE_DATE_EPOCH and find -depth -newer marker pattern
- syntax=docker/dockerfile:1.7 - enables BuildKit's COPY --link support
Enterprise
- eclipse-temurin base image for EE, removing explicit JDK/package installs
- truncate -s 0 instead of rm -f for base-image files (aux-cache, history.sqlite*) to avoid non-deterministic whiteout entry timestamps
OSS
- apk upgrade into a separate layer with its own timestamp normalization
- SOURCE_DATE_EPOCH for reproducibility
- / /etc /tmp) that find -newer misses
How it works
Each RUN step that modifies files follows this pattern:
- find / -newer marker -exec touch -d @SOURCE_DATE_EPOCH to clamp all modified file timestamps
This ensures the layer diff contains only deterministic timestamps. The COPY --link flag makes the distribution layer independent of the base image, allowing it to be shared across platforms.
Verification
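As a quick sanity check that needs no Docker, the marker-and-clamp pattern from "How it works" can be reproduced on a temporary directory. This is a sketch under assumptions: it uses GNU `touch` and `stat`, a temp directory stands in for the image filesystem, the epoch value is the example one from the discussion, and the `sleep 1` only exists to make the demo independent of filesystem timestamp granularity.

```shell
#!/usr/bin/env bash
# Sketch of the clamp pattern on a temp dir (assumes GNU coreutils/findutils).
SOURCE_DATE_EPOCH=1234567890

root=$(mktemp -d)     # stands in for the image filesystem
marker=$(mktemp)      # timestamp reference taken before the "RUN step" work

touch -d @1000 "$root/untouched"   # a file that already existed before the step
touch "$marker"
sleep 1                            # ensure later writes are strictly newer

echo 'new content' > "$root/modified"   # the work done by the step

# Clamp everything modified since the marker, children before parents.
find "$root" -depth -newer "$marker" -exec touch -h -d "@$SOURCE_DATE_EPOCH" {} +
rm -f "$marker"

stat -c '%Y %n' "$root/untouched" "$root/modified" "$root"
```

Only the file written after the marker and its parent directory are reset to the fixed epoch; the pre-existing file keeps its original timestamp, so the clamp touches only what the step actually changed.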
EE
Build image (requires local hazelcast-enterprise/hazelcast-enterprise-distribution-5.6.0.zip)
Initial push image to JFrog - all layers need push
Clean local layers
Rebuild image from scratch
Push image to JFrog - all layers already exist
OSS
Build image (requires local hazelcast-oss/hazelcast-distribution-5.6.0.zip)
Initial push image to JFrog - all layers need push
Clean local layers
Rebuild image from scratch
Push image to JFrog - all layers already exist
Automated CI verification
A new reusable workflow (.github/workflows/verify-layer-reproducibility.yml) runs on every PR via build-pr.yml. It builds each Dockerfile twice with --no-cache and compares all layer digests using docker inspect. If any layer differs between builds, the job fails.
The verification script (.github/scripts/verify-layer-reproducibility.sh) can also be run locally:
.github/scripts/verify-layer-reproducibility.sh -f hazelcast-oss/Dockerfile -- --build-arg HAZELCAST_ZIP_FILE_NAME=hazelcast-distribution-5.6.0.zip hazelcast-oss/
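The comparison step can be pictured with the sketch below. It is not the contents of verify-layer-reproducibility.sh; the function names and the image tags in the usage comment are hypothetical. It assumes the layer digests are read from the `RootFS.Layers` field of `docker image inspect`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: function names and image tags are assumptions.

# Prints the layer digests of an image, one per line.
layer_digests() {
  docker image inspect \
    --format '{{range .RootFS.Layers}}{{println .}}{{end}}' "$1"
}

# Compares two newline-separated digest lists; returns 1 if they differ.
compare_layers() {
  if [ "$1" = "$2" ]; then
    echo "Reproducible: all layer digests match"
  else
    echo "Not reproducible: layer digests differ" >&2
    printf 'first build:\n%s\nsecond build:\n%s\n' "$1" "$2" >&2
    return 1
  fi
}

# Usage, after building the same Dockerfile twice with --no-cache:
#   first=$(layer_digests repro-check:first)
#   second=$(layer_digests repro-check:second)
#   compare_layers "$first" "$second"
```

Comparing the whole list as one string also catches a change in the number or order of layers, not just a changed digest.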