Skip to content

Only hash host-independent host-dependency metadata when cross-compiling #15477

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

brson
Copy link
Contributor

@brson brson commented May 1, 2025

What does this PR try to resolve?

This is an attempt to improve reproducibility of wasm artifacts (and other cross-targets) when built on different host platforms.

re #8140 (comment)

The problem: when compiling wasm, the -C metadata hash contains metadata from host dependencies, particularly build scripts and proc macros. This metadata is different on Linux and Mac OS, etc, causing symbols in the wasm to differ.

This patch solves this by maintaining a separate host_independent_hash that contains only the most basic info about a unit: name, version, etc. When hashing a "target" unit, if a dependency is a host dependency, it only hashes this simple information, not the complete metadata.

In this way, any host-specific metadata is excluded, while still maintaining some information about host dependencies. So if a proc_macro crate version changes it will result in a different wasm, but if the proc_macro's build profile, etc. changes it will not.

This may resolve #8140, though I am not familiar with the entirety of that issue.

How should we test and review this PR?

A way to think about this change: the exact configuration of build scripts and proc macros shouldn't change how they influence the symbols or filenames of wasm binaries; so it's ok to exclude that metadata from the wasm.

Whether this is true or not IDK.

This patch is very hard to unit test as it requires building on two different hosts and comparing.

I have done a manual test of building all the wasms in the https://github.com/stellar/soroban-examples repo, on both linux and mac, before and after this patch:

https://gist.github.com/brson/3fbfe7ae70c6ffa88f94b546d18bcb93

There are four files there: before-linux.txt, after-linux.txt, before-mac.txt, after-mac.txt. The relevant thing to notice is that both the "after" wasms have the same hashes across platforms, while that is not true for the "before" wasms.

The test was done on soroban-examples 75f1cc676bd80153a75725f80a4181dedf79a092

cc @leighmcculloch @graydon

@rustbot
Copy link
Collaborator

rustbot commented May 1, 2025

r? @weihanglo

rustbot has assigned @weihanglo.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added A-layout Area: target output directory layout, naming, and organization S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 1, 2025
@epage
Copy link
Contributor

epage commented May 1, 2025

Thanks for working on solving this!

As a heads up, I'm going to mark this as a Draft to remind participants that we prefer to keep discussions on Problems and Solutions in one place, like #8140, while this can serve as an example of a proposed Solution. Once thats done, we can shift to discussing the implementation here.

@epage epage marked this pull request as draft May 1, 2025 20:31
Copy link
Contributor

@epage epage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to record this while I had it on my mind

@@ -88,9 +89,15 @@ impl fmt::Debug for UnitHash {
/// mangled symbols need to be stable between different builds with different
/// settings. For example, profile-guided optimizations need to swap
/// `RUSTFLAGS` between runs, but needs to keep the same symbol names.
///
/// [`Metadata::host_independent_hash`] is a simple hash that excludes any metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to evaluate how the information at https://doc.rust-lang.org/nightly/nightly-rustc/cargo/core/compiler/fingerprint/index.html#fingerprints-and-unithashs will need to be updated based on this

@leighmcculloch
Copy link

👏🏻 Independent of #8140 this improves reproducibility so it seems like a net benefit to me, but I'm also not familiar with the entirety of the problem to know if the solution will cut across anything more holistic that might be done.

before-linux.txt, after-linux.txt, before-mac.txt, after-mac.txt

Is the intent that other hosts such as windows will also be consistent and are just not tested out of inconvenience, or are the fixes unique to linux and mac hosts?

@brson
Copy link
Contributor Author

brson commented May 1, 2025

Is the intent that other hosts such as windows will also be consistent and are just not tested out of inconvenience, or are the fixes unique to linux and mac hosts?

I expect windows and other hosts to produce the same results as linux and mac. It was just more convenient to test linux/mac here. I can also run the tests on windows if desired.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-layout Area: target output directory layout, naming, and organization S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Non-reproducible -C metadata=hash passed to rustc depending on the compiling OS
5 participants