
Make docker images fully reproducible [DI-741]#1236

Open
ldziedziul wants to merge 36 commits into master from reproducible-images-v2

Conversation


@ldziedziul ldziedziul commented Feb 18, 2026

Both OSS and Enterprise Dockerfiles now produce bit-for-bit identical images across clean builds. Every layer hash is deterministic regardless of when or where the build runs.

Fixes https://hazelcast.atlassian.net/browse/DI-741

Benefits

  • Reproducible builds - identical image hashes across clean builds, regardless of when or where the build runs
  • Reduced storage - JFrog deduplicates identical layers, significantly reducing storage footprint
  • Faster pushes - rebuilds push only changed layers, all unchanged layers show "already exists"
  • Independent layer updates - updating hazelcast binaries, base system/JDK, or system libraries each only affects its own layer; the rest are reused without redundant storage or transfer
  • Cross-platform layer sharing - hazelcast binaries layer is shared across all architectures (amd64, arm64, s390x)

Changes

Reproducible distribution layer (OSS + EE)

  • Extract distribution into /build_root/ in a staging stage and COPY --link into the final image, producing a platform-independent layer that is shared across architectures
  • Normalize timestamps via SOURCE_DATE_EPOCH and find -depth -newer marker pattern
  • syntax=docker/dockerfile:1.7 - enables BuildKit's COPY --link support
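Put together, the staging pattern looks roughly like the following. This is a minimal illustrative sketch, not the actual Dockerfile from the PR: the stage name, ZIP filename, target paths, and the temurin tag are assumptions.

```dockerfile
# syntax=docker/dockerfile:1.7
ARG SOURCE_DATE_EPOCH=1700000000

# Staging stage: unpack the distribution into a platform-independent root
FROM alpine:3 AS staging
ARG SOURCE_DATE_EPOCH
COPY hazelcast-distribution.zip /tmp/
RUN mkdir -p /build_root/opt/hazelcast \
    && unzip -q /tmp/hazelcast-distribution.zip -d /build_root/opt/hazelcast \
    # Clamp every timestamp so the resulting layer is byte-identical across builds
    && find /build_root -exec touch -h -d "@$SOURCE_DATE_EPOCH" {} +

FROM eclipse-temurin:21-jre-ubi9-minimal
# COPY --link produces a layer that does not depend on the base image,
# so the same blob can be shared across amd64/arm64/s390x manifests
COPY --link --from=staging /build_root/ /
```

Because the `COPY --link` layer is built from the staging stage's file tree alone, its digest is independent of the base image chosen in the final stage.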

Enterprise

  • Migrate to eclipse-temurin base image for EE, removing explicit JDK/package installs
  • Consolidate user creation, package upgrade, and cleanup into a single RUN with timestamp normalization
  • Use truncate -s 0 instead of rm -f for base-image files (aux-cache, history.sqlite*) to avoid non-deterministic whiteout entry timestamps
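The truncate trick can be seen in isolation with a quick local experiment (a sketch with an illustrative temp file, not the real base-image files such as aux-cache):

```shell
set -eu
# In a layered image, `rm` records a whiteout entry whose timestamp is set at
# build time; `truncate -s 0` keeps the inode, so the layer diff records an
# ordinary file whose mtime we can clamp deterministically.
f=$(mktemp)                       # stand-in for a base-image cache file
echo "non-deterministic cache contents" > "$f"
truncate -s 0 "$f"                # file still exists, now zero bytes
touch -d "@0" "$f"                # clamp mtime to a fixed epoch
size=$(stat -c %s "$f")
mtime=$(stat -c %Y "$f")
echo "size=$size mtime=$mtime"
rm -f "$f"
```

Assumes GNU coreutils (`stat -c`, `touch -d`), as on the Linux build hosts.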

OSS

  • Add timestamp normalization to the package install step
  • Extract apk upgrade into a separate layer with its own timestamp normalization
  • Regenerate Java cacerts with SOURCE_DATE_EPOCH for reproducibility
  • Explicitly touch overlay2 copy-up directories (/ /etc /tmp) that find -newer misses

How it works

Each RUN step that modifies files follows this pattern:

  1. Create a timestamp marker before making changes
  2. Perform the work (install packages, create users, etc.)
  3. Run find / -newer marker -exec touch -h -d "@$SOURCE_DATE_EPOCH" {} + to clamp all modified file timestamps
  4. Clean up the marker

This ensures the layer diff contains only deterministic timestamps. The COPY --link flag makes the distribution layer independent of the base image, allowing it to be shared across platforms.
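The four steps above can be sketched as a plain shell experiment, run here against a scratch directory rather than / inside an image build; the SOURCE_DATE_EPOCH value and file names are illustrative:

```shell
set -eu
export SOURCE_DATE_EPOCH=1700000000
root=$(mktemp -d)

# 1. Create a timestamp marker before making changes (backdated so every
#    file touched afterwards compares as strictly newer)
touch -d "2000-01-01" "$root/.marker"

# 2. Perform the work (stand-in for package installs, user creation, etc.)
mkdir -p "$root/etc"
echo "hazelcast:x:10001:" > "$root/etc/group"

# 3. Clamp the timestamps of everything modified since the marker
find "$root" -newer "$root/.marker" -exec touch -h -d "@$SOURCE_DATE_EPOCH" {} +

# 4. Clean up the marker
rm "$root/.marker"

mtime=$(stat -c %Y "$root/etc/group")
echo "clamped mtime: $mtime"
```

Running the same steps twice yields identical mtimes everywhere, which is exactly what makes the layer tarball, and hence its digest, deterministic.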

Verification

EE

Build image (requires local hazelcast-enterprise/hazelcast-enterprise-distribution-5.6.0.zip)

docker buildx build --progress=plain -f hazelcast-enterprise/Dockerfile hazelcast-enterprise --build-arg HAZELCAST_ZIP_FILE_NAME=hazelcast-enterprise-distribution-5.6.0.zip -t hazelcast.jfrog.io/sandbox-docker-preprod/lukasz-test-ee:5.6.0-1

Initial push to JFrog - all layers need to be pushed

docker push hazelcast.jfrog.io/sandbox-docker-preprod/lukasz-test-ee:5.6.0-1
The push refers to repository [hazelcast.jfrog.io/sandbox-docker-preprod/lukasz-test-ee]
5f70bf18a086: Pushed
36a111f6d998: Pushed
989832e1d274: Pushed
b2366c6e0bab: Pushed
d77edb45729d: Pushed
eeeef98956cd: Pushed
b68de875dd7e: Pushed
3c3acb3a7af5: Pushed
4929137c7f9d: Pushed
5.6.0-1: digest: sha256:59fbeb2f8cbe4f3f1781df93d0e1b4e6c6426c63a05f6c7ecb23ed4314df90fb size: 220

Clean local layers

docker system prune -a 

Rebuild image from scratch

docker buildx build --progress=plain -f hazelcast-enterprise/Dockerfile hazelcast-enterprise --build-arg HAZELCAST_ZIP_FILE_NAME=hazelcast-enterprise-distribution-5.6.0.zip -t hazelcast.jfrog.io/sandbox-docker-preprod/lukasz-test-ee:5.6.0-2

Push image to JFrog - all layers already exist


docker push hazelcast.jfrog.io/sandbox-docker-preprod/lukasz-test-ee:5.6.0-2
The push refers to repository [hazelcast.jfrog.io/sandbox-docker-preprod/lukasz-test-ee]
5f70bf18a086: Layer already exists
36a111f6d998: Layer already exists
989832e1d274: Layer already exists
b2366c6e0bab: Layer already exists
d77edb45729d: Layer already exists
eeeef98956cd: Layer already exists
b68de875dd7e: Layer already exists
3c3acb3a7af5: Layer already exists
4929137c7f9d: Layer already exists
5.6.0-2: digest: sha256:c18647b6f0e8e385cd2e9766ca36c69a259d27e7589c36d2fd709065728e11d4 size: 2206

OSS

Build image (requires local hazelcast-oss/hazelcast-distribution-5.6.0.zip)

docker buildx build --progress=plain -f hazelcast-oss/Dockerfile hazelcast-oss --build-arg HAZELCAST_ZIP_FILE_NAME=hazelcast-distribution-5.6.0.zip -t hazelcast.jfrog.io/sandbox-docker-preprod/lukasz-test-oss:5.6.0-1

Initial push to JFrog - all layers need to be pushed

docker push hazelcast.jfrog.io/sandbox-docker-preprod/lukasz-test-oss:5.6.0-1
The push refers to repository [hazelcast.jfrog.io/sandbox-docker-preprod/lukasz-test-oss]
5f70bf18a086: Pushed
c99808c7ec85: Pushed
86895bee41d6: Pushed
d1356c4de94c: Pushed
45f3ea5848e8: Pushed
5.6.0: digest: sha256:44c0c2625f1d6417cf290169ce375bb6ef5a0421cc6dd128731c9adc434abd82 size: 1369

Clean local layers

docker system prune -a 

Rebuild image from scratch

docker buildx build --progress=plain -f hazelcast-oss/Dockerfile hazelcast-oss --build-arg HAZELCAST_ZIP_FILE_NAME=hazelcast-distribution-5.6.0.zip -t hazelcast.jfrog.io/sandbox-docker-preprod/lukasz-test-oss:5.6.0-2

Push image to JFrog - all layers already exist

docker push hazelcast.jfrog.io/sandbox-docker-preprod/lukasz-test-oss:5.6.0-2
The push refers to repository [hazelcast.jfrog.io/sandbox-docker-preprod/lukasz-test-oss]
5f70bf18a086: Layer already exists
c99808c7ec85: Layer already exists
86895bee41d6: Layer already exists
d1356c4de94c: Layer already exists
45f3ea5848e8: Layer already exists
5.6.0-2: digest: sha256:14fd1ab8e1bad71d990d41f87bb2c7195f339199e3eabdd211fc55386bf3b26a size: 1369

Automated CI verification

A new reusable workflow (.github/workflows/verify-layer-reproducibility.yml) runs on every PR via build-pr.yml. It builds each Dockerfile twice with --no-cache and compares all layer digests using docker inspect. If any layer differs between builds, the job fails.

The verification script (.github/scripts/verify-layer-reproducibility.sh) can also be run locally:

.github/scripts/verify-layer-reproducibility.sh -f hazelcast-oss/Dockerfile -- \
  --build-arg HAZELCAST_ZIP_FILE_NAME=hazelcast-distribution-5.6.0.zip hazelcast-oss/
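Under the hood the check amounts to comparing the two builds' RootFS layer digest lists, normally obtained with `docker inspect -f '{{json .RootFS.Layers}}' <image>`. A minimal sketch of that comparison, using simulated digests so it runs without Docker:

```shell
set -eu
# Simulated layer digests for the two --no-cache builds; in the real script
# these strings would come from `docker inspect` on each tagged image.
build1_layers="sha256:aaa
sha256:bbb
sha256:ccc"
build2_layers="sha256:aaa
sha256:bbb
sha256:ccc"

if [ "$build1_layers" = "$build2_layers" ]; then
  verdict="reproducible"
else
  verdict="layer mismatch"
fi
echo "$verdict"
```

Any single differing digest is enough to fail the job, which is what makes the CI gate effective against accidental non-determinism.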

@ldziedziul ldziedziul self-assigned this Feb 18, 2026

@JackPGreen JackPGreen left a comment


I love the idea.
I have some initial comments. I'd like to test the reproducibility locally but haven't yet.

@@ -1,3 +1,5 @@
# syntax=docker/dockerfile:1.7
Collaborator

Does this require buildx? Will we run into the same issues as in #1162?

Contributor Author

it seems it requires buildkit

Collaborator

it seems it requires buildkit

That's annoying, but not unexpected. I think this time around we're in a better situation to progress, though.

Now we've nailed down that external users extend our images rather than building their own, which means buildx is an exclusively internal requirement - so ensuring our test environments are up to spec can be in scope as part of this.

And we have convincing justification where it adds real customer value, vs last time where it didn't and wasn't worth pushing.

RUN find /build_root -exec touch -h -d "@$SOURCE_DATE_EPOCH" {} +

FROM redhat/ubi9-minimal:9.7
FROM eclipse-temurin:${JDK_VERSION}-jre-ubi9-minimal
Collaborator

This uses a non-constant ubi minor version - i.e. during a scheduled rebuild the version might "jump" from 9.6->9.7. I don't think we have a choice with this implementation, but we've previously approached such changes with trepidation and now it's automatic and non-obvious.

Contributor Author

Yes, we were indeed careful, and we should agree on that, but we should also take into consideration that:

  • for alpine we simply use alpine:3 and it works fine
  • I've never seen an issue with a minor ubi upgrade in 4 years

Collaborator

Have you considered writing this in something other than bash?

Contributor Author

yes but not here ;)

Comment on lines +6 to +9
SOURCE_REF:
  description: 'The hazelcast-docker branch to verify'
  required: true
  type: string
Collaborator

Why do we need to pass this? Isn't it implicit via the github context?

Contributor Author

What do you mean? SOURCE_REF not needed?

Collaborator

SOURCE_REF allows us to run on one branch (master) using the Dockerfile etc from another branch (e.g. v5.5.9).
But this is only for PR builds - so the SOURCE_REF will always be the ${{ github.ref }} - i.e. PR branch.
So rather than passing it in as a value, we can just skip it and use the implicit/default branch (which will be the PR branch).

Contributor Author

for other workflows we pass it; I want to have a unified approach

Comment on lines +28 to +29
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
Collaborator

Why not use .github/actions/setup-docker/action.yml? We may have the same disk space issues in this workflow.

Contributor Author

because we build only a single-arch image for verification (runner arch only). I'm not sure we need to test all architectures and make this even more complex

Collaborator

We could make that input optional?
By default buildx-action installs all architectures. Filtering to what we used was just an attempt to improve CI runtime / reduce disk usage.

Collaborator

The primary bit we should re-use is:

# Default GitHub runners have very little free disk space causing build failures
# This operation can be intensive, so remove as little as possible to gain useful space
- name: Free Disk Space
  uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
  with:
    tool-cache: false
    android: false
    dotnet: true
    haskell: true
    large-packages: false
    docker-images: false
    swap-storage: true

Comment on lines +31 to +40
- name: Create test distribution
  run: |
    dist_dir=$(mktemp -d)
    mkdir -p "$dist_dir/hazelcast-0.0.0/bin" "$dist_dir/hazelcast-0.0.0/lib"
    printf '#!/bin/bash\n' > "$dist_dir/hazelcast-0.0.0/bin/hz"
    printf '#!/bin/bash\n' > "$dist_dir/hazelcast-0.0.0/bin/hz-healthcheck"
    chmod +x "$dist_dir/hazelcast-0.0.0/bin/"*
    touch "$dist_dir/hazelcast-0.0.0/lib/placeholder"
    (cd "$dist_dir" && zip -qr "$GITHUB_WORKSPACE/${{ matrix.context }}/${{ matrix.dist-file }}" hazelcast-0.0.0/)
    rm -rf "$dist_dir"
Collaborator

Could we share this functionality with:

- name: Generate ${{ matrix.image.label }} dist ZIP
  run: |
    # Make a dummy empty ZIP file to avoid scanning Java dependencies, as managed downstream
    # DI-50 - Remove java artifacts scanning from hazelcast-docker
    working_directory=hazelcast-distribution
    mkdir -p "${working_directory}/lib"
    mkdir -p "${working_directory}/bin"
    touch "${working_directory}/bin/empty"

Contributor Author

Fixed: f8dd562

Collaborator

Should this be abstracted to docker-actions?


set -o errexit -o nounset -o pipefail ${RUNNER_DEBUG:+-x}

# Verifies Docker image build reproducibility by building twice and comparing layer digests.
Contributor

nit - standard is to have Usage() ?

Contributor Author

What do you mean?

Contributor

something like this:
then show usage() if inputs are incorrect. So rather than a static comment, move it to usage() so it can be printed.
May be overkill, feel free to skip!

Contributor

General comments:

  1. nicely done - although it feels a little fragile, but then we have a test for this so we are covered. Hopefully it won't break in the future only to be forced to revert back!
  2. is it possible to test in sandbox with a release?
    • what I am after is to see Layer already exists when rebuilding again. In the PR description you have shown that, but is that local or via GH? Would be good to see links.
  3. presume the Dockerfiles have been tested locally (Linux/macOS)?

ldziedziul and others added 11 commits February 23, 2026 19:10
Co-authored-by: Jack Green <jack.green@hazelcast.com>
Move the inline script that creates a fake Hazelcast distribution ZIP
into .github/scripts/fake-zip.functions.sh and source it from both
verify-layer-reproducibility and vulnerability_scan_subworkflow workflows.

nishaatr commented Feb 26, 2026

@ldziedziul
I stumbled upon this Docker option rewrite-timestamp=true
looks like it will update the timestamp for all files inside the image to SOURCE_DATE_EPOCH just before image is created
not sure here but I think using this means no need to update DT manually in layers. Docker will do it layer by layer
may be you have seen this but thought I mention


@ldziedziul
Contributor Author

@ldziedziul I stumbled upon this Docker option rewrite-timestamp=true looks like it will update the timestamp for all files inside the image to SOURCE_DATE_EPOCH just before image is created not sure here but I think using this means no need to update DT manually in layers. Docker will do it layer by layer may be you have seen this but thought I mention

Yes, but with that approach you have to remember to set the SOURCE_DATE_EPOCH env var on the building host and add --output rewrite-timestamp=true before each build:

SOURCE_DATE_EPOCH=1234567890 docker buildx build ...  --output rewrite-timestamp=true 

With my approach you have a self-contained solution; you just build the image as usual

@JackPGreen
Collaborator

@ldziedziul I stumbled upon this Docker option rewrite-timestamp=true looks like it will update the timestamp for all files inside the image to SOURCE_DATE_EPOCH just before image is created not sure here but I think using this means no need to update DT manually in layers. Docker will do it layer by layer may be you have seen this but thought I mention

Yes, but with this approach you have to remember to set SOURCE_DATE_EPOCH env on the building host and add --output rewrite-timestamp=true before each build:

SOURCE_DATE_EPOCH=1234567890 docker buildx build ...  --output rewrite-timestamp=true 

With my approach you have a self-contained solution, you just build an image as usual

Bear in mind we only build production images in two places in this repo, and the build-push-action has support for SOURCE_DATE_EPOCH.

If there's a Docker-native solution, I would prefer that over something homemade.


nishaatr commented Mar 3, 2026

If there's a Docker-native solution, I would preference that over something homemade.

it will simplify the Dockerfile and it makes sense to let Docker handle it

SOURCE_DATE_EPOCH=1234567890 docker buildx build ... --output rewrite-timestamp=true

It might be the case that SOURCE_DATE_EPOCH could be set in the Dockerfile and only passed to SOURCE_DATE_EPOCH when calling Docker. But I don't see a problem with passing SOURCE_DATE_EPOCH, as it's unlikely to change.
