Clean up file hash usage #10102
base: main
Conversation
Quick self-review; will resolve these and the merge conflicts.
Another pass at self-review complete; I'm removing the "draft". As far as I can tell, the three changes mentioned risk breaking the compiler if an individual user makes a local change that causes a hash collision. The likelihood of this is very low even for a non-cryptographically secure hash like murmur3 - if we were distributing partly compiled files between developers (like git commits, etc.) I might be more conservative with this change.
That said, I'm definitely open to feedback on reverting to md5 for at least these three changes, and possibly on having a single class that picks the hash function for the rest of the compiler, so that a project/team can decide for themselves to pay a little more for compilation in exchange for longer/better hashes.
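For illustration, that single class might look something like this - a minimal sketch, with a made-up class name and opt-in property, not code from this patch:

```java
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;

// Hypothetical single chooser for the compiler's hash function, so a
// project/team could opt into a longer hash at some compile-time cost.
final class CompilerHashing {
  private static final HashFunction HASH =
      "sha256".equals(System.getProperty("gwt.hashFunction"))
          ? Hashing.sha256()       // opt-in: longer/stronger, slower
          : Hashing.murmur3_128(); // default: fast, 128-bit, non-cryptographic

  static HashFunction function() {
    return HASH;
  }

  private CompilerHashing() {}
}
```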
Besides those three, the full build still seems to pass with collisions in all other calls that produce hashes - that is, I replaced the result in all other places with "1234567890ABCDEF". This might point to insufficient tests; more research here would be a good idea long term.
Regardless of which hashing algorithm is used, I think there is value in this patch in making sure we don't make multiple copies of data just to hash it, but instead hash as we read/write.
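As a sketch of the read side of that idea (using Guava's stream wrappers; not code from this patch), the hash can be computed while the bytes flow past instead of buffering the payload first:

```java
import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;
import com.google.common.hash.HashingInputStream;
import com.google.common.io.ByteStreams;
import java.io.IOException;
import java.io.InputStream;

// Wrap the source stream so every byte read also feeds the hasher;
// no second copy of the payload is ever materialized.
static HashCode hashWhileReading(InputStream in) throws IOException {
  try (HashingInputStream hashingIn =
      new HashingInputStream(Hashing.murmur3_128(), in)) {
    ByteStreams.exhaust(hashingIn); // or hand hashingIn to the real consumer
    return hashingIn.hash();
  }
}
```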
```java
        + permutationDescriptionString
        + optionsDescriptionString)
        .getBytes()));
String consistentHash = Hashing.murmur3_128().newHasher()
```
Collisions from this are especially bad for the compiler, and should be revisited (so that collisions don't result in broken code).
```diff
@@ -60,7 +61,7 @@ public CompileDependencyVisitor() {
   }

   public String getSignature() {
-    return Util.computeStrongName(Util.getBytes(getRawString()));
+    return Hashing.murmur3_128().hashString(getRawString(), StandardCharsets.UTF_8).toString();
```
Collisions in this result in the compiler failing to generate code correctly in a variety of ways - this should be revisited to find a safer way to handle collisions.
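One hypothetical shape for safer handling (illustrative only, not part of this patch): remember which input produced each signature and fail fast on a mismatch instead of silently generating broken code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard: track which input produced each signature and
// throw on a collision rather than emitting incorrect output.
final class CollisionGuard {
  private final Map<String, String> seen = new ConcurrentHashMap<>();

  String checked(String signature, String input) {
    String previous = seen.putIfAbsent(signature, input);
    if (previous != null && !previous.equals(input)) {
      throw new IllegalStateException("Hash collision on signature " + signature);
    }
    return signature;
  }
}
```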
```java
byte[] sourceBytes = source.getBytes(StandardCharsets.UTF_8);
strongHash = Hashing.murmur3_128().hashBytes(sourceBytes).toString().toUpperCase(Locale.ROOT);
sourceToken = diskCache.writeByteArray(sourceBytes);
```
Collisions from this are especially bad for the correct output, and should be revisited (so that collisions don't result in broken code).
GWT currently uses a mix of md5 and sha1, though all use cases are non-cryptographic (and both hash functions have been discouraged since before they were added to GWT). This patch proposes replacing them with Guava's murmur3 128 implementation, as it is slightly faster than md5 (and much faster than sha1), and can still be used to produce the same output format (32 upper-case hex digits).
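As a concrete illustration (mirroring the hashString/hashBytes calls in the snippets above, not new API), a murmur3-based strong name in the existing format:

```java
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// murmur3_128 is 128 bits wide, the same as md5, so the familiar
// 32-character upper-case strong-name format is unchanged.
static String strongName(String source) {
  return Hashing.murmur3_128()
      .hashString(source, StandardCharsets.UTF_8)
      .toString()                 // 32 lower-case hex digits
      .toUpperCase(Locale.ROOT);  // match the existing strong-name casing
}
```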
As part of this refactor, some structured content now has size or field delimiters hashed in to avoid collisions, and some streamed content is hashed as it is read to avoid holding the entire payload in memory at once. The Util method handled the size delimiters in some cases but not uniformly, and it also ignored exceptions - this patch cleans those paths up to throw meaningful errors.
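A minimal sketch of the delimiter idea (illustrative, not the patch's exact code): without a size prefix, the field lists ("ab", "c") and ("a", "bc") concatenate to the same bytes and therefore hash identically; length-prefixing each field removes the ambiguity.

```java
import com.google.common.hash.Hasher;
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;

// Prefix each field with its byte length so different field splits
// can never produce the same hash input.
static String hashFields(String... fields) {
  Hasher hasher = Hashing.murmur3_128().newHasher();
  for (String field : fields) {
    byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
    hasher.putInt(bytes.length); // size delimiter
    hasher.putBytes(bytes);
  }
  return hasher.hash().toString();
}
```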
Fixes #10090
Before merging: