Re-organize build-dir by package + hash, rather than artifact type #15010

Open
epage opened this issue Jan 3, 2025 · 8 comments
Labels
A-layout (Area: target output directory layout, naming, and organization)
C-cleanup (Category: cleanup within the codebase)
S-needs-design (Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.)

Comments

@epage
Contributor

epage commented Jan 3, 2025

note: I specify build-dir to clarify which half of #14125 I'm referring to. The files and layout of build-dir do not have compatibility guarantees (source).

Currently, build-dir is laid out like

  • target/
    • <target-platform>/?
      • <profile>/
        • incremental/
        • build/
          • <package>-<hash>/
            • build*script-*
        • deps/
          • <package>-<hash>*

Currently,

  • cargo clean -p <package> will operate on everything for <package>

In the future, we could have

These could be aided by re-arranging the build-dir to be organized around <package>-<hash>, like

  • target/
    • <target-platform>/?
      • <profile>/
        • incremental/
        • build/
          • <package>-<hash>/
            • build*script-*
            • *.d
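
To make the difference concrete, here is a minimal sketch of where a unit's dep-info (`*.d`) file would live under each layout. The `Layout` enum and `dep_info_dir` function are hypothetical names for illustration, not Cargo internals, and the proposed branch simply follows the tree above.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical layout selector; not a Cargo type.
#[derive(Clone, Copy)]
pub enum Layout {
    /// Current: intermediate artifacts grouped by type (`deps/`, `build/`, ...).
    ByArtifactType,
    /// Proposed: intermediate artifacts grouped under `build/<package>-<hash>/`.
    ByPackageHash,
}

/// Returns the directory that would hold the dep-info (`*.d`) file for one unit.
/// `profile_dir` is something like `target/debug`.
pub fn dep_info_dir(layout: Layout, profile_dir: &Path, package: &str, hash: &str) -> PathBuf {
    match layout {
        // Today every unit's `<package>-<hash>.d` shares the same `deps/` directory.
        Layout::ByArtifactType => profile_dir.join("deps"),
        // In the proposal each unit gets its own directory.
        Layout::ByPackageHash => profile_dir.join("build").join(format!("{package}-{hash}")),
    }
}
```

Under the proposed branch, everything belonging to one `<package>-<hash>` unit sits in a single directory, which is what would make per-unit operations simpler.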

Side effects

  • We'll have to change how we invoke rustc which will increase the length of the command-line
    • Currently, we blindly point rustc at deps/ and rustc finds the files it needs. We'll instead need to point to each individual artifact rustc may need.
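
As a rough illustration of why the command line grows, the sketch below builds the search-path arguments both ways. It uses rustc's `-L dependency=<path>` form; the helper names are hypothetical, and whether Cargo would pass one `-L` per unit directory or point at each artifact some other way is not settled here.

```rust
use std::path::{Path, PathBuf};

/// Current approach: a single search path covers every dependency artifact.
pub fn search_args_current(deps_dir: &Path) -> Vec<String> {
    vec!["-L".to_string(), format!("dependency={}", deps_dir.display())]
}

/// Sketch of the reorganized approach (an assumption): one search path per
/// `<package>-<hash>` directory that the unit being compiled may need.
pub fn search_args_per_unit(unit_dirs: &[PathBuf]) -> Vec<String> {
    let mut args = Vec::with_capacity(unit_dirs.len() * 2);
    for dir in unit_dirs {
        args.push("-L".to_string());
        args.push(format!("dependency={}", dir.display()));
    }
    args
}
```

The first function always yields two arguments, while the second yields two per dependency directory, so the argument count scales with the size of the dependency graph.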

Open questions

  • Transition plan: while build-dir isn't stable, enough tools rely on the layout that we'd want to set up a transition plan so they can have time to test against the new layout and work to support both
  • What do we call the directory? I said build/ as it's all-encompassing
  • Can the old build/ and deps/ content live in the same place?
  • How should we handle incremental/?
  • Can we share across <profile> at least?
    • <hash> is the -C extra-filename hash and doesn't encompass all of fingerprinting, so we'd need to audit whether there are cases that don't change the hash but that we'd still need per-profile
    • Changing of local source is one example, so at least local packages still need to be scoped by profile
@ranger-ross
Contributor

These could be aided by re-arranging the build-dir to be organized around <package>-<hash>, like

  • target/
    • <target-platform>/?
      • <profile>/
        • incremental/
        • build/
          • <package>-<hash>/
            • build*script-*
            • *.d

I am assuming that the final binary would still be located at target/<target-platform>/<profile>/<bin-name> (i.e. target/debug/foo).
Is that correct?

Transition plan: while build-dir isn't stable, enough tools rely on the layout that we'd want to setup a transition plan so they can have time to test against the new layout and work to support both

I think we should check how many of the most popular tools rely on the layout. Depending on the impact, we can adjust how aggressive we want to be with the migration.

Regarding strategies, I have 2 ideas.

  1. Simply write the files to both the new and old layout directories. This of course comes with the additional overhead of doubling the disk writes and storage, plus the complexity of cleaning up multiple directories. But it provides the best backwards compatibility story.
  2. Only write the files to the new layout and create symlinks in the previous layout. This would help mitigate the overhead of the disk writes/storage.
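
A minimal sketch of the second strategy, assuming a Unix system: the real file lives in the new layout and a symlink with the same file name is placed in the old `deps/` directory. The function name is hypothetical and this is not how Cargo writes artifacts today.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Exposes `new_file` (stored in the new layout) under `old_deps_dir` via a
/// symlink, returning the path of the link. Unix-only in this sketch.
#[cfg(unix)]
pub fn link_into_old_layout(new_file: &Path, old_deps_dir: &Path) -> io::Result<PathBuf> {
    fs::create_dir_all(old_deps_dir)?;
    let file_name = new_file
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    let link = old_deps_dir.join(file_name);
    // Use an absolute target so the link does not depend on where it is placed.
    let target = fs::canonicalize(new_file)?;
    // Replace a stale link left over from a previous build, if any.
    if link.symlink_metadata().is_ok() {
        fs::remove_file(&link)?;
    }
    std::os::unix::fs::symlink(&target, &link)?;
    Ok(link)
}
```

Only one copy of the data is written, but tools that inspect file types or do not follow symlinks would still see a difference from the old layout.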

What do we call the directory? I said build/ as its all encompassing

I do not have a strong opinion on this. I was thinking perhaps packages or crates since it's a list of the built packages. But I think build also makes sense here too 😄 Maybe build would be better, to leave the door open for adding other artifacts to this directory in the future?

@epage
Contributor Author

epage commented Jan 3, 2025

I am assuming that final binary would still be located at target/<target-platform>/<profile>/<bin-name> (ie. target/debug/foo).
Is that correct?

This does not touch final artifacts. I'd recommend reading up on the following note

note: I specify build-dir to clarify which half of #14125 I'm referring to.

I think we should check how many of the most popular tools rely on the layout. Depending the impact we can adjust how aggressive we want to be with the migration.

I know there is at least

There might be some other tools that do weirder stuff, like inspecting debug files or rlibs.

Regarding strategies, I have 2 ideas.

Writing to both or symlinks won't work for the above two tools.

A common approach we take is to have a feature be opt-in and then transition it to opt-out. A question in this is if we'd want to still support the old layout, for which we'd do this through a config, or if we'll only support the new layout, for which we use an env variable and after a sufficient time we remove the opt-out.
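
The opt-in then opt-out progression can be sketched as a small decision function. Everything here is hypothetical: the `Phase` and `Layout` enums, and `override_value`, which stands in for whatever env variable or config value would carry the user's choice (`"1"` to opt in, `"0"` to opt out in this sketch).

```rust
/// Hypothetical rollout phases for a layout change.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Phase {
    /// New layout only when explicitly requested.
    OptIn,
    /// New layout by default; the old one only when explicitly requested.
    OptOut,
    /// The opt-out has been removed; only the new layout exists.
    Removed,
}

#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Layout {
    Old,
    New,
}

/// Picks a layout from the rollout phase and an optional user override.
pub fn select_layout(phase: Phase, override_value: Option<&str>) -> Layout {
    match (phase, override_value) {
        (Phase::Removed, _) => Layout::New,
        (Phase::OptIn, Some("1")) => Layout::New,
        (Phase::OptIn, _) => Layout::Old,
        (Phase::OptOut, Some("0")) => Layout::Old,
        (Phase::OptOut, _) => Layout::New,
    }
}
```

If the old layout were to stay supported long-term, the `Removed` phase would simply never be reached and the override would live in config rather than an env variable.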

@ranger-ross
Contributor

ranger-ross commented Jan 18, 2025

Okay, I finally had some time to read up on #14125 and some other related threads.

Would it be better to focus on making progress on separating target-build-dir/target-artifact-dir out as proposed here before attempting to re-organize the layout? Doing it all at once would lead to less fragmentation of "build layouts". But adding to the scope would be more work and make it harder to land #14125.

I am leaning towards doing this re-organization after separating the build/artifact dirs.

@epage
Contributor Author

epage commented Jan 18, 2025

I've wondered about doing the build-dir change first, like you said. It would make the scope of the change clear and it would help to communicate out what has compatibility guarantees.

github-merge-queue bot pushed a commit that referenced this issue Apr 29, 2025
### What does this PR try to resolve?

While doing some investigation on the theoretical performance
implications of #4282 (and #15010 by extension) I was profiling cargo
with some experimental changes. (Still a work in progress)

But in the meantime, I noticed that we do not have spans for rustc
invocations. I think these would be useful when profiling `cargo build`.
(`cargo build --timings` exists but is more geared towards debugging a
slow building project, not cargo itself)

For reference below is an example before/after of a profile run of a
dummy crate with a few random dependencies.

#### Before

![image](https://github.com/user-attachments/assets/710d1b93-133d-4826-9e7a-2deed876dbfa)

#### After

![image](https://github.com/user-attachments/assets/0f0ccad4-82b5-42ad-8762-6bd1dacecab4)
@ranger-ross
Contributor

I did some investigating on this. I created a small (and incomplete) prototype on my fork (6a644ff) where the dep-info files are stored in target/<target-platform>/<profile>/build/<package>-<hash>.

One notable side effect of doing this is that the rustc command starts to get very large for projects with many dependencies.
This is because, as mentioned in the issue description, the current implementation only adds target/<profile>/deps to the library search path (-L).
If we reorganize the deps dir, we need to add each library to the rustc lib search path, which makes the process command very large if you have many dependencies.

This does not seem like an immediate issue, but I am not sure if there are limits on some operating systems that might limit the size of a process command. (I created a synthetic test on my Linux machine and was able to create a rustc process with a 60MB command line with no issues. I didn't bother trying anything larger.)

@epage
Contributor Author

epage commented Apr 30, 2025

This does not seem like an immediate issue, but I am not sure if there are limits on some operating systems that might limit the size of a process command. (I created a synthetic test on my Linux machine and was able to create a rustc process with a 60MB command line with no issues. I didn't bother trying anything larger.)

Further down in the layers of calls to rustc, we automatically roll over from CLI args to an argfile.

Some potentially relevant questions

  • How much more frequently are we using argfiles?
  • What is the cost of switching to argfiles?
  • What is the cost of making the argfiles bigger?

Unsure how much we need to answer this in depth.
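
For readers unfamiliar with argfiles: rustc accepts `@path` and reads its arguments from that file, one per line. The sketch below shows the general shape of a rollover decision under a simplifying assumption, namely that the command-line size is estimated up front and compared with a caller-supplied limit. The function names are hypothetical and the actual rollover logic may differ.

```rust
/// Rough estimate of the command-line size: each argument plus one separator byte.
pub fn estimated_len(args: &[String]) -> usize {
    args.iter().map(|a| a.len() + 1).sum()
}

/// Returns `None` if the arguments fit within `limit` and can be passed
/// directly. Otherwise returns the contents of an argfile (one argument per
/// line) that would be written to disk and passed to rustc as `@<path>`.
/// Arguments containing newlines are not handled in this sketch.
pub fn argfile_contents(args: &[String], limit: usize) -> Option<String> {
    if estimated_len(args) <= limit {
        None
    } else {
        Some(args.join("\n"))
    }
}
```

The extra cost in the rollover case is at least one file write whose size grows with the argument list, which is what the questions above are probing.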

@weihanglo
Member

On my Linux machine the limit is around 2.6MiB

$ getconf ARG_MAX
2621440

Looking at how the thresholds are calculated: https://github.com/rust-lang/rust/blob/d2eadb7a94ef8c9deb5137695df33cd1fc5aee92/compiler/rustc_codegen_ssa/src/back/command.rs#L145-L205. I feel like on Windows it is way easier to hit the limit.

What is the cost of switching to argfiles?

One failed rustc invocation + writing big argfiles + only allowing UTF-8 encoding (IIRC).
The good news is that only the last few rustc calls would have such long command line arguments.

@ranger-ross
Contributor

The good news is that only the last few rustc calls would have such long command line arguments.

Yes, this is what I observed while testing with my changes.
The closer we get to the root of the dependency graph, the larger the command grows.

How much more frequently are we using argfiles?

I think it will depend on the system settings. My machine has an ARG_MAX of 2097152 (2MB).
I did some testing in a dummy project with about 360 total dependencies, and the final rustc invocation clocked in at about 190KB.

With some basic extrapolation, I would start hitting the arg max on my machine for a project with ~4,000 dependencies.

I think a project of this size is pretty large, and the overhead of a few argfiles on the last few rustc calls will probably not be noticeable.

However, I am not so sure about Windows. I don't have a Windows machine handy, but this Stack Overflow post seems to suggest that it's much lower, at 2^15 chars (~32K). But this post is from 13 years ago, so I am not sure if it has increased since then.
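
The extrapolation above is a straight linear scaling from one data point, which can be written out as follows. The function name is hypothetical, and the inputs are assumptions: 190KB is taken as 190 × 1024 bytes, and the roughly 32K Windows figure is taken as 32,767 single-byte characters.

```rust
/// Linear extrapolation from one observation: how many dependencies fit
/// before the final rustc command line reaches `limit_bytes`, assuming the
/// command grows proportionally with the dependency count.
pub fn deps_at_limit(limit_bytes: u64, observed_bytes: u64, observed_deps: u64) -> u64 {
    limit_bytes * observed_deps / observed_bytes
}
```

With these inputs, `deps_at_limit(2_097_152, 190 * 1024, 360)` gives 3880, in line with the ~4,000 estimate, while `deps_at_limit(32_767, 190 * 1024, 360)` gives 60. If the lower Windows figure applies, argfiles would come into play for far smaller projects there.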
