-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Only hash host-independent host-dependency metadata when cross-compiling #15477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
r? @weihanglo rustbot has assigned @weihanglo. Use |
Thanks for working on solving this! As a heads up, I'm going to mark this as a Draft to remind participants that we prefer to keep discussions on Problems and Solutions in one place, like #8140, while this can serve as an example of a proposed Solution. Once thats done, we can shift to discussing the implementation here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wanted to record this while I had it on my mind
@@ -88,9 +89,15 @@ impl fmt::Debug for UnitHash { | |||
/// mangled symbols need to be stable between different builds with different | |||
/// settings. For example, profile-guided optimizations need to swap | |||
/// `RUSTFLAGS` between runs, but needs to keep the same symbol names. | |||
/// | |||
/// [`Metadata::host_independent_hash`] is a simple hash that excludes any metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to evaluate how the information at https://doc.rust-lang.org/nightly/nightly-rustc/cargo/core/compiler/fingerprint/index.html#fingerprints-and-unithashs will need to be updated based on this
👏🏻 Independent of #8140 this improves reproducibility so it seems like a net benefit to me, but I'm also not familiar with the entirety of the problem to know if the solution will cut across anything more holistic that might be done.
Is the intent that other hosts such as windows will also be consistent and are just not tested out of inconvenience, or are the fixes unique to linux and mac hosts? |
I expect windows and other hosts to produce the same results as linux and mac. It was just more convenient to test linux/mac here. I can also run the tests on windows if desired. |
What does this PR try to resolve?
This is an attempt to improve reproducibility of wasm artifacts (and other cross-targets) when built on different host platforms.
re #8140 (comment)
The problem: when compiling wasm, the
-C metadata
hash contains metadata from host dependencies, particularly build scripts and proc macros. This metadata is different on Linux and Mac OS, etc, causing symbols in the wasm to differ.This patch solves this by maintaining a separate
host_independent_hash
that contains only the most basic info about a unit: name, version, etc. When hashing a "target" unit, if a dependency is a host dependency, it only hashes this simple information, not the complete metadata.In this way, any host-specific metadata is excluded, while still maintaining some information about host dependencies. So if a proc_macro crate version changes it will result in a different wasm, but if the proc_macro's build profile, etc. changes it will not.
This may resolve #8140, though I am not familiar with the entirety of that issue.
How should we test and review this PR?
A way to think about this change: the exact configuration of build scripts and proc macros shouldn't change how they influence the symbols or filenames of wasm binaries; so it's ok to exclude that metadata from the wasm.
Whether this is true or not IDK.
This patch is very hard to unit test as it requires building on two different hosts and comparing.
I have done a manual test of building all the wasms in the https://github.com/stellar/soroban-examples repo, on both linux and mac, before and after this patch:
https://gist.github.com/brson/3fbfe7ae70c6ffa88f94b546d18bcb93
There are four files there: before-linux.txt, after-linux.txt, before-mac.txt, after-mac.txt. The relevant thing to notice is that both the "after" wasms have the same hashes across platforms, while that is not true for the "before" wasms.
The test was done on soroban-examples 75f1cc676bd80153a75725f80a4181dedf79a092
cc @leighmcculloch @graydon