Tracking issue: non-deterministic java apps #278518

Open
44 of 53 tasks
TomaSajt opened this issue Jan 3, 2024 · 6 comments
Labels
5. scope: tracking Long-lived issue tracking long-term fixes or multiple sub-problems 6.topic: java Including JDK, tooling, other languages, other VMs 6.topic: reproducible builds

Comments

@TomaSajt
Contributor

TomaSajt commented Jan 3, 2024

There are several ways to package a Java app inside Nixpkgs, and most of those package the final app into a .jar file. However, .jar files are not reproducible/deterministic by default.

How to check for reproducibility?

Why are .jar files not deterministic?

When creating a .jar file Java will place a META-INF/MANIFEST.MF file inside the archive which will have the current date as its creation date, so it's non-deterministic.
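The effect of embedded timestamps can be sketched without a JDK, since a .jar is a zip archive. The following is a minimal illustration using Python's zipfile module, not what the jar tool does internally; the entry name and the timestamps are just example values.

```python
import hashlib
import io
import zipfile


def build_archive(date_time):
    """Build a tiny in-memory zip with one entry stamped with date_time
    and return the SHA-256 digest of the resulting bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("META-INF/MANIFEST.MF", date_time=date_time)
        info.compress_type = zipfile.ZIP_DEFLATED
        zf.writestr(info, "Manifest-Version: 1.0\r\n")
    return hashlib.sha256(buf.getvalue()).hexdigest()


# Same content, two different "build times": the archives differ.
first = build_archive((2024, 1, 3, 12, 0, 0))
second = build_archive((2024, 1, 3, 12, 0, 2))

# Same content, same fixed time: the archives are byte-identical.
fixed_a = build_archive((1980, 1, 1, 0, 0, 2))
fixed_b = build_archive((1980, 1, 1, 0, 0, 2))

print(first != second, fixed_a == fixed_b)
```

The only input that changes between the first two calls is the entry timestamp, which is enough to change the hash of the whole archive.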

Another possible source of non-determinism is .properties files: when one is written during the build process, the current timestamp is put inside the file as a comment. These can usually be patched out without too much hassle.
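As a rough sketch of what "patching out" such a comment could look like: java.util.Properties.store() writes a comment line containing the output of Date.toString(), for example "#Wed Jan 03 12:34:56 UTC 2024". The helper below (strip_date_comments is a hypothetical name, and the sample content is made up) drops lines of that shape. In an actual Nixpkgs derivation this would more likely be a sed call in a build phase.

```python
import re

# Shape of the date comment written by java.util.Properties.store(),
# i.e. "#" followed by java.util.Date.toString():
# "#Wed Jan 03 12:34:56 UTC 2024"
DATE_COMMENT = re.compile(
    r"^#(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"\d{2} \d{2}:\d{2}:\d{2} \S+ \d{4}$"
)


def strip_date_comments(text):
    """Return text without lines that look like a Properties.store() date comment."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not DATE_COMMENT.match(line.rstrip("\r\n"))
    ]
    return "".join(kept)


# Made-up sample content for illustration.
sample = "#Generated by build\n#Wed Jan 03 12:34:56 UTC 2024\nversion=1.0\n"
print(strip_date_comments(sample), end="")
```

Other comments and all key/value lines are kept as they are; only lines matching the date pattern are removed.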

One could just download the pre-built .jar-s and not worry about this, but I think it's important to build stuff from source.


Here are some ways java packages/apps are built inside Nixpkgs:

  • using javac+jar by themselves (not too common)
  • using ant (usually using bundled jars for dependencies)
  • using maven (there's a generic builder: maven.buildMavenPackage)
  • using gradle (there's no generic builder for it yet, but some packages have found a way to patch it properly)

Possible solutions to achieve determinism

There was a setup-hook called canonicalize-jars-hook which aimed to solve this problem by unwrapping and rewrapping jars with the creation dates set to a fixed time. However, this process had some flaws:

  • it didn't preserve compression-state (whether an entry is just stored or is compressed)
  • it needed weird workarounds for messed up jar files
  • it messed up MANIFEST.MF
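To illustrate the first flaw, here is a minimal sketch of rewrapping an archive with fixed timestamps while keeping each entry's compression state and the original entry order. It is not how canonicalize-jars-hook or strip-nondeterminism are implemented; rewrite_jar and FIXED_TIME are hypothetical names, and the timestamp value is an assumption.

```python
import zipfile

# Assumption: a fixed timestamp chosen for illustration.
FIXED_TIME = (1980, 1, 1, 0, 0, 2)


def rewrite_jar(src, dst, fixed_time=FIXED_TIME):
    """Copy every entry from src to dst (paths or file-like objects),
    replacing the timestamp but keeping order and compression method."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for old in zin.infolist():  # infolist() keeps the archive's entry order
            new = zipfile.ZipInfo(old.filename, date_time=fixed_time)
            # Stored entries stay stored, deflated entries stay deflated.
            new.compress_type = old.compress_type
            # Keep permission bits and the directory flag.
            new.external_attr = old.external_attr
            zout.writestr(new, zin.read(old))
```

Because the entries are written in their original order, a META-INF/MANIFEST.MF that comes first in the input stays first in the output. Compressed entries are recompressed, so the compressed bytes may differ from the input even though the method is the same.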

After #296549, canonicalize-jars-hook lives on as stripJavaArchivesHook, using strip-nondeterminism as the backend.

My previous attempts at fixing this:

If we don't want to or can't use stripJavaArchivesHook for some reason, we could use the tools provided by the build systems:

Apache Ant

You can set modificationtime on the <jar> (and possibly <war>) tasks, which are usually found in the build.xml files.
Like this: <jar modificationtime="0" ...>...</jar>
To automate this, you could use a script like this inside postPatch:

# Fix jar timestamps for reproducibility
substituteInPlace build.xml \
    --replace-fail '<jar ' '<jar modificationtime="0" '

(Note that the pattern includes the space character after the word jar, so that it doesn't accidentally match other tasks whose names start with jar, though there's not likely to be one anyway.)

You could also use a dedicated XML modification tool like xmlstarlet:

# Fix jar timestamps for reproducibility
xmlstarlet ed -L -a "//jar" -t attr -n "modificationtime" -v "0" build.xml

I tried to create a setup-hook that does this automatically: #294516
However, this would only work for ant projects.

Maven

You can set the project.build.outputTimestamp property

You can do this by adding -Dproject.build.outputTimestamp=1980-01-01T00:00:02Z to the args given to the mvn command

or by patching the pom.xml file to have this:

<project>
  ...
  <properties>
    ...
    <project.build.outputTimestamp>1980-01-01T00:00:02Z</project.build.outputTimestamp>
  </properties>
  ...
</project>

Gradle

If the project uses the Groovy DSL, you can add the following lines to build.gradle:

tasks.withType(AbstractArchiveTask) {
    preserveFileTimestamps = false
    reproducibleFileOrder = true
}

or, if the project uses the Kotlin DSL, you can add the following to build.gradle.kts:

tasks.withType<AbstractArchiveTask> {
    isPreserveFileTimestamps = false
    isReproducibleFileOrder = true
}

Without a build system

If the packaging script of an app uses the jar command directly, you could use the --date flag to specify a build date. Note that older JDKs lack this flag; within Nixpkgs, jdk17 is the first version that has it.

Example:

jar --date="1980-01-01T00:00:02Z" --create Program.class > test.jar

Still, I'd say that using stripJavaArchivesHook is your best bet, because it should work with any build tool.


Progress with making everything deterministic:

was already deterministic before opening this issue

  • jogl (ant)
  • prismlauncher (cmake's UseJava (javac+jar backend))
  • rstudio (ant)
    • uses ant but doesn't create any jar files
  • swt (javac+jar)

Fixed

attempted, seems difficult

  • dbeaver (maven)
    • many failed attempts to update
    • uses tycho to fetch dependencies
      • puts timestamps into files and mirror-site names into the directory names
      • probably impossible ATM
  • mindustry (gradle)
  • i2p (ant)
    • not deterministic (unstable <servlet> order inside .xml files in the .war files)

could be done

  • gephi (maven)
    • needs default plugin version fix
    • needs fixed outputTimestamp

not attempted

  • buck (ant)
  • more jetbrains stuff? (ant)
  • libreoffice? (ant)
  • polymake (ant)
  • rabbitmq-java-client (ant)
    • uses python2
  • There might be other packages that build jars which I didn't list here
@fgaz
Member

fgaz commented Jan 4, 2024

Maybe this hook should be added to the java packaging docs

yes please!

@de11n

de11n commented Jan 8, 2024

Excellent work.

TomaSajt added a commit to TomaSajt/nixpkgs that referenced this issue Jan 8, 2024
Changes:
- use `finalAttrs` instead of `rec`
- use `canonicalize-jars-hook` (related issue: NixOS#278518)
- patch `.desktop` files to find icons
- add `meta.mainProgram`
fgaz pushed a commit that referenced this issue Jan 8, 2024
Changes:
- use `finalAttrs` instead of `rec`
- use `canonicalize-jars-hook` (related issue: #278518)
- patch `.desktop` files to find icons
- add `meta.mainProgram`
fgaz pushed a commit to TomaSajt/nixpkgs that referenced this issue Jan 9, 2024
* use finalAttrs instead of rec
* shorten url
* use hash instead of sha256
* use canonicalize-jars-hook (related issue: NixOS#278518)
* add meta.mainProgram
fgaz pushed a commit that referenced this issue Jan 9, 2024
* use finalAttrs instead of rec
* shorten url
* use hash instead of sha256
* use canonicalize-jars-hook (related issue: #278518)
* add meta.mainProgram
@TomaSajt
Contributor Author

TomaSajt commented Feb 8, 2024

I was looking around on nixpkgs and it looks like ant actually does have a way to set the modification-time for the created jars. Here's the only example that's inside nixpkgs:

xmlstarlet ed --inplace \
--append //jar --type attr -n modificationtime --value 1980-01-01T00:00Z \
build.xml gluegen-cpptasks-base.xml

It's adding a fixed modificationtime attribute to the jar task.
Docs of the jar ant task: https://ant.apache.org/manual/Tasks/jar.html

After some more looking, it turns out that even the jar program itself has the --date argument.

jar --date="1980-01-01T00:00:02Z" --create Program.class > test.jar

This will create a jar file in which every entry has the given timestamp. Older versions of Java don't have this flag, though: the first version inside nixpkgs that has it is jdk17. I'm not sure exactly which version added it, because it is heavily underdocumented, but I found the commit that added it: openjdk/jdk@db68a0c

After some more searching, it looks like gradle also has something like this built in:

tasks.withType(AbstractArchiveTask) {
    preserveFileTimestamps = false
    reproducibleFileOrder = true
}

So, in conclusion, it looks like every major way of packaging has a built-in solution for fixed timestamps. I am a bit sad that I did not look into this more thoroughly earlier.
Still, I believe that having a hook that works in all cases has its own merits, as it will allow us to avoid having to patch every java app.

@TomaSajt
Contributor Author

TomaSajt commented Mar 9, 2024

I decided that using the built-in method is a better solution, so I opened #294516, which moves away from canonicalize-jars-hook

Update: I went with the general solution instead anyway.

@de11n

de11n commented Mar 11, 2024

It might also be helpful to have a tool to detect common pitfalls in making Java packages deterministic. For example, we could find all jar files in the outputs and inspect them for timestamps that suggest non-determinism. We could then cause that build to fail unless some sort of "allowNondetermism = true" flag is set. This could be a setuphook but at some point we may want a buildJavaPackage builder that just applies common-sense things like this and can be easily documented.
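A detection pass along these lines could be sketched as follows. This is only an illustration of the idea, not an existing hook: find_suspicious_entries is a hypothetical name, and treating 1980-01-01 as the only "clean" date is an assumption that would have to match whatever timestamp the build actually normalizes to.

```python
import os
import zipfile

# Assumption: entries dated 1980-01-01 are considered normalized.
ALLOWED_DATE = (1980, 1, 1)


def find_suspicious_entries(root, allowed_date=ALLOWED_DATE):
    """Walk root and return (archive path, entry name, timestamp) for every
    .jar/.war entry whose date differs from allowed_date."""
    findings = []
    for dirpath, dirs, files in os.walk(root):
        dirs.sort()  # make the traversal order stable
        for name in sorted(files):
            if not name.endswith((".jar", ".war")):
                continue
            path = os.path.join(dirpath, name)
            with zipfile.ZipFile(path) as zf:
                for info in zf.infolist():
                    if info.date_time[:3] != allowed_date:
                        findings.append((path, info.filename, info.date_time))
    return findings
```

A build could then fail whenever the returned list is non-empty, unless an opt-out flag like the one suggested above is set.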

@TomaSajt
Contributor Author

TomaSajt commented Mar 21, 2024

A new thing I discovered is the existence of .jmod files, which can also contain some non-determinism. They are not just plain .jar files; they are a different format.
strip-nondeterminism supports patching these files.
AFAICT they are present inside every jdk since jdk9, but those seem to be mostly deterministic (though I only checked the latest jdk).
