-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Much faster symlinking, safer interactions with rustc
#386
Conversation
This MR represents a large performance improvement in practical terms in simple Crane situations. For some unscientific numbers, I tested in the following fashion. For each of the two versions of crane (master, this branch), and my simple closed source software project (it's 1kloc, about 170 crate deps, mostly just hyper): 1. I ran `nix flake check` to ensure that any crate downloads were done, any supporting derivations complete. 1. I then added a new crate (anyhow) and ran `nix flake check` again. This times the whole flow after a dependency is added. 1. I then changed a constant and re-ran, simulating the usual flow. Because this project is relatively small, I would expect that this represents a 'worst case' scenario. For example, when uncompressed, it contains 500MB of dependencies, whereas another project I work on represents 3.6GB. The results were as follows: Before this PR: - build with new dep: 2m15s - build with new code: 1m19s After this PR: - build with new dep: 1m22s - build with new code: 32s In addition, it is more robust to crate rebuilds. How it works/why it's better: 1. Drop the diffing behaviour when doing symlinking. This is an explicit tradeoff - if one is doing symlinking on inheritance, we would expect any duplicate data to be in the form of symlinks, for which diffing file content is unhelpful. Given that this only helps the case where we are not symlinking on inheritance, are not archiving on install, it seems reasonable for it to be potentially slower in this case. I say potentially slower since if we have target dirs of 1GB, we are trading 2GB of reads for up to 1GB fewer writes. I'd note here that Nix store optimisation will cover for space savings. But, main argument: common case should be archival or symlinking, and we can boost the performance of the common case by removing this behaviour. 1. Instead, we build a `symlinks.tar` containing symlinks to the outputs of this derivation. 1. When inheriting, instead of traversing the tree and creating symlinks, we just extract this tar. This is great beacuse it means that on both derivation end and derivation start, we avoid forking O(num files produced by cargo build) processes. Since even small projects have thousands of files emitted (my own has 2033 output files). Effectively, GNU tar is much more optimised than the pre-existing bash script. 1. At this point, we still have the problem where rustc may try to write to a file. We use a `RUSTC_WRAPPER` to instead write to a temporary directory, and after the command is finished we copy the artifacts from the out dir back to the target location. There is a potential (small) slowdown caused by this - I observe cargo to use rustc's stderr to kick off new builds as soon as it can, and so had to capture rustc's stdout. However, this effect is most likely very minor.
in | ||
chosenStdenv.mkDerivation (cleanedArgs // { | ||
inherit cargoArtifacts; | ||
|
||
pname = "${args.pname or crateName.pname}${args.pnameSuffix or ""}"; | ||
version = args.version or crateName.version; | ||
|
||
RUSTC_WRAPPER="${rustcWrapper}/bin/rustc-wrapper"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't even know crane has functionality like this and I was using my wrapper myself anyway. It seems to me like an unnecessary functionality to expose to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While OK for PoC use, I think a proper wrapper should be handled by crane internally:
- check where the
rustc
is (withwhich rustc
) - set some
CRANE_RUSTC_ORIG_BIN
to it - prepend
PATH
with a path to a wrapper binary - inside the wrapper binary
exec $CRANE_RUSTC_ORIG_BIN $@ ...
done | ||
|
||
if [[ -v current_out_dir ]]; then | ||
stderr="$temp_dir/crane_stderr" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why bother with capturing stderr and stdout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. Cargo uses it to know when a process finished, instead of waiting for it to fully terminate?
:/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought: rm -Rf out-dir
before starting original, would take care of this, get rid of an EXIT hook, temp dir altogether. If it worked.
But it just occurred to me that it's out-DIR
and probably oftentimes the same dir is used to write many files, so that probably doesn't fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could use LD_PRELOAD
to intercept open
, write
syscalls in a rustc-wrapper and basically always follow the link if file is being read, and delete the symlink if being written to.
https://osterlund.xyz/posts/2018-03-12-interceptiong-functions-c.html
A bit involved, but could probably work perfectly and be 100% invisible. Though I don't know if it works on MacOS, and it will only work as long as rustc
is using dynamically linked libc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no good ld_preload equivalent on macos which is why i didn't go down that path - and that seems extremely brittle to recompiles. Ideally Nix would let us mount fuse filesystems inside a build - that'd be the 'reliable' solution to this.
I felt this traded off nicely between being robust to change but still getting the job done - like - it depends on public apis of rustc rather than private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't see a better way than the current one.
set -e | ||
stderr_text=$(cat "$stderr") | ||
rm "$stderr" | ||
cp -r "$temp_dir/." "$current_out_dir" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cp -a
?
stdout=$("''${args[@]}" 2>"$stderr") | ||
exit_code=$? | ||
set -e | ||
stderr_text=$(cat "$stderr") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought - if you know out dir ... wouldn't it be faster & easier to delete it beforehand instead of changing it and copying from it? Technically cargo might want to read something from it first... but something tells me it never actually happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also be a bit worries about these temp paths being written somewhere and then causing rebuilds etc. Instead of making them random, maybe they should be fixed. Eg. always $TMPDIR/rustc-wrapper/${orig-outdir-absolute-path}
. Probably not, but who knows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I considered this. The problem is that what rustc outputs is dependent on what rustc has been asked to do (tell it to make a dylib? it'll make one!), so it's much more tightly coupled. Whereas the output dir is loosely coupled.
Regarding the temp paths being written somewhere - I logicked myself that this should never happen. In particular, compilers put a lot of energy into being reproducible - it's dodgy if the output is dependent on the path the output is to be written to.
|
||
symlinks="${cargoArtifacts}/symlinks.tar" | ||
if [[ -f "${symlinks}" ]]; then | ||
tar -xf "${symlinks}" -C "${cargoTargetDir}" --strip-components=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would still compress. Probably lots of redundancy in these paths and that point CPU use will pay for itself in saved IO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Now that everything is a symlink, but the symlink is invisible to Nix because it's in a tar archive, what is going to prevent Nix from GCing the step being linked to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dpc's observation here is correct. Nix will perform a naive scan of the results for any store paths, but will not notice them if they are behind a tarball, meaning a store GC would end up invalidating references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, when compressed, I'd expect it's hard for nix to find what it's looking for, hence the explicit choice to not compress. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might work, but I wonder: would compressing + putting 1 symlink somewhere in the output (outside of what is copied to ./target
), keep the GC happy? Aren't all these paths pointing to the same artifact in the nix store (or the thing it points to in a daisy chain)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it'd definitely work - mostly was going for simplicity vs optimality. A 345MB deps dir on my machine ends up with a 900kB symlinks.tar. Might be more clean to explicitly add a symlink though from the perspective of making the purpose clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as there is a single line somewhere in the output (which can be a separate text file and not necessarily the tarball) which includes /nix/store/...-blah
then Nix will consider that path a dependency of the derivation. So it's okay (as far as the GC is concerned) to create a tarball of symlinks, so long as there is some other symlink or text entry which points to the same "parent" directory that the tarball's symlinks will end up pointing to
I think this approach can work. It's not entirely invisible, and I raised some concerns, but probably it will, at least most of the time.
Would love to see how zst mode fares for comparison. |
@j-baker Ok, so I tried it in fedimint/fedimint#3222 and it fails with:
Test suite here also fails, so something is off somewhere. I'm happy to try again future iterations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @j-baker thanks for the PR!
My main hesitation with this approach is forcing our own RUSTC_WRAPPER
definition. I'm not sure it's even feasible to try to respect a project's existing RUSTC_WRAPPER
(it can be set both by env var or in the cargo configuration, and I would like to avoid having to to implement cargo's lookups). Even making this opt-in is still tricky having to maintain two different implementations...
It seems like the main performance issue is doing a (single-threaded) file comparison to figure out what to symlink, but this has given me some inspiration that it may be possible to do much more cheaply with a find
invocation which checks for updated timestamps which would be way faster than actually reading out all files!
I'll try to explore that direction sometime soon, but feel free to beat me to it! My hope is that it would give us similar performance benefits with a simpler implementation 😄
@@ -19,5 +19,7 @@ mkCargoDerivation (args // { | |||
|
|||
buildPhaseCargoCommand = "cargoWithProfile clippy ${cargoExtraArgs} ${cargoClippyExtraArgs}"; | |||
|
|||
doInstallCargoArtifacts = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this change needed?
I don't agree with changing the defaults here because cargoClippy
is intended to be allow chaining with other builds by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I observed this to be a large slowdown in practice - specifically around copying and patching interpreter paths in the output. My intuition was that folks are using cargoClippy as a linting task, and so like cargoDoc in the last version, it was an oversight which would primarily affect new users.
What would cargoClippy chaining look like for you? Is there an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree that cargo clippy should be a a leaf node 99% of the time, if someone has a good reason to run it other way it should work. It technically can produce artifacts (same as cargo check
iirc).
If I read the code here right this change doesn't just change the default, but hardcodes false
.
Probably best as a separate PR with changelog info about change in behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep if you'd like to propose a different default value here feel free to submit a separate PR for it (then we can at least have a more focused discussion on it!)
in | ||
chosenStdenv.mkDerivation (cleanedArgs // { | ||
inherit cargoArtifacts; | ||
|
||
pname = "${args.pname or crateName.pname}${args.pnameSuffix or ""}"; | ||
version = args.version or crateName.version; | ||
|
||
RUSTC_WRAPPER="${rustcWrapper}/bin/rustc-wrapper"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm strongly opposed to the forced usage of RUSTC_WRAPPER
here. Just because we believe it to be rarely used is not sufficient to outright preclude projects from using their own RUSTC_WRAPPER
.
(not to mention that plenty of builds work just fine without needing this workaround, so it's the exceptional builds which should opt into different behavior than forcing it on everyone)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack - I can handle the wrapper internally (and then also wrap with the user's wrapper). Like, it should be easily possible to wrap twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still concerned that this would be too brittle to maintain at the library level (you can set a rustc-wrapper in .cargo/config.toml
, and those can be littered in different places in the project).
|
||
symlinks="${cargoArtifacts}/symlinks.tar" | ||
if [[ -f "${symlinks}" ]]; then | ||
tar -xf "${symlinks}" -C "${cargoTargetDir}" --strip-components=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dpc's observation here is correct. Nix will perform a naive scan of the results for any store paths, but will not notice them if they are behind a tarball, meaning a store GC would end up invalidating references.
@ipetkov for the overall rustc_wrapper problem - my problem is that right now, any time that rustc tries to overwrite a pre-existing symlink, it crashes the build (see the linked issue) - and this problem can easily be triggered by a poorly written crate (e.g. emitting a 'rerun-if-changed' declaration pointing to a path that doesn't exist). The effect is that on approx 1/2 of the projects I've migrated to Crane, I have had to patch an upstream. My sense is that the only fixes that really help this are changes in rustc (now i know more, not cargo) (which will face an uphill battle due to the difficulties of specifying user intentions on the target dir and nicheness of the workflow), a rustc wrapper, or somehow making the symlinks not be symlinks. @dpc's focus is on the latter, but this was an attempt at ensuring that symlinks degrade nicely in the case of bad crates, rather than exploding in your face with a weird error. I can certainly write something which wraps up the user's RUSTC_WRAPPER if provided rather than removing their ability to specify one. Regarding the config.toml setting of this - doesn't crane replace the user's config.toml in the vendorCargoDeps step? That's why I didn't consider that case. Apologies for the failing build - it's failing on Linux tests and while I did run CI locally, only on MacOS. Do you have any ideas? |
No, we never blow that away outright, we simply put our own configuration "higher-up" the hierarchy (basically at the equivalent of
Do you have any (ideally minimal) reproductions I can look at? Ultimately for whatever reason, cargo/rustc is becoming convinced that a file needs to be rebuilt and trying to update it in place, but to me this smells more like a cache invalidation bug and maybe fixing that side steps the issue entirely... |
As for wrapping. It is possible to make a fully invisible and reliable wrapper for any binary, as long you have the path for the original. Example of the wrapper from our codebase: https://github.com/fedimint/fedimint/blob/6c15b0ea133b5239085884c14c334623287d555b/flake.toolchain.nix#L130 . As crane is always in control of the commands that are being executed it can make something even simpler: export RUSTC_ORIGINAL_BINARY="$(which rustc)"
export PATH="/path/to/the/wrapper-dir/:$PATH" where |
@dpc that's not 100% reliable as one could set |
@ipetkov Oh, so it's a rust (mis-)feature? Terrible. I think it would be fine to just not support it. Especially if there are other crane modes which are fine with it. "If you use this feature, you need use a different mode". |
I wouldn't call it a misfeature, it's just not designed for "transparent" pass through wrappers The philosophy of crane is to be a shim between cargo and Nix but make as few impacts on project decisions |
It's a one-off complication for something that can be elegantly solved by a generic solution of PATH lookup order. Probably it's there because without Nix transparently wrapping a binary is actually a bad UX. It's a OK feature, but for someone already using Nix it should be easy to not need it.
I get it, I think it's a worthy goal, but if supporting Alternatively this PR should just implement another mode, with previous mode retained. As I commented in `incremental zstd" PR, I think just maintaining a handful of modes is the way to go. You can default to the "most compatible" one, and let users switch in and out of more specialized ones, depending on their situation (which might be changing as their project evolves). As an existing crane user, I myself want to be able to just switch a mode temporarily if I need pull in a dependency that breaks something, which will require investigation, PRs to crate etc. Obviously too many modes is undesireable, but a handful should be OK, especially if they're different in some minor ways (like optimizations that can break in certain conditions). |
@ipetkov an example of a crate with which I had trouble was microsoft/onnxruntime#17018. Here, because the buildscript unnecessarily did a rerun-if-changed, rustc attempts to recompile. The crate is then unusable with the symlinks strategy, because rustc cannot overwrite a symlink which points inside the Nix store. I have more details on #385. My present situation I haven't bottomed out, but so far as I can tell, adding clippy to the nativeBuildInputs changes the CMAKE_PREFIX_PATH and this causes the As you can see - I eventually got it fixed - but this took 25 days on one crate and is unreleased - in no way unreasonable for them as an OSS project, but quite a bit of time to have much slower builds. Crates can be fixed on a case-by-case basis - but this relies on the crate being maintained, and otherwise is a full blocker. Worse, the error message is inscrutable. I tend to agree with @dpc here. The present strategy is problematic with complex buildscripts. If we have a solution which uses RUSTC_WRAPPER, then it seems relatively straightforward to instruct the user they need to change strategy if they override it. |
I'm symathetic to being stuck on upstreams incorporating and releasing fixes, but ultimately buggy software is buggy and workarounds should be workarounds (IMO) instead of defaults in these situations
I'd still prefer to have this be something projects can opt-into rather than having to opt-out of it. If you can rework this PR so that it's opt-in (preferably by having the user override |
How about a
I'm not sure what do you mean? Would you be fine to include it as a special opt-in mode? |
Motivation
Fixes #385.
This MR represents a large performance improvement in practical terms in simple Crane situations.
For some unscientific numbers, I tested in the following fashion.
For each of the two versions of crane (master, this branch), and my simple closed source software project (it's 1kloc, about 170 crate deps, mostly just hyper):
nix flake check
to ensure that any crate downloads were done, any supporting derivations complete.nix flake check
again. This times the whole flow after a dependency is added.Because this project is relatively small, I would expect that this represents a 'worst case' scenario. For example, when uncompressed, it contains 500MB of dependencies, whereas another project I work on represents 3.6GB.
The results were as follows:
Before this PR:
In addition, it is more robust to crate rebuilds.
How it works/why it's better:
symlinks.tar
containing symlinks to the outputs of this derivation.RUSTC_WRAPPER
to instead write to a temporary directory, and after the command is finished we copy the artifacts from the out dir back to the target location. There is a potential (small) slowdown caused by this - I observe cargo to use rustc's stderr to kick off new builds as soon as it can, and so had to capture rustc's stdout. However, this effect is most likely very minor.Checklist
docs/API.md
(or general documentation) with changesCHANGELOG.md