Garbage collect whole target/ #13136

Open
3 tasks
epage opened this issue Dec 8, 2023 · 7 comments · May be fixed by #13846
Labels
C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-accepted Status: Issue or feature is accepted, and has a team member available to help mentor or review Z-gc Nightly: garbage collection Z-script Nightly: cargo script

Comments

@epage
Contributor

epage commented Dec 8, 2023

Problem

With "cargo script", the target directory is "hidden" from the user, making it easy to leak when you delete your script.

If we move forward with rust-lang/rfcs#3371, a similar situation will happen for regular packages.

If I haven't touched a project in a long while but have run rustup update, there might be nothing of use left in target/, wasting space.

Sometimes I want to cargo clean all projects on my system (see #11305).

Proposed Solution

We should track in the GC database a list of

  • Root manifests (i.e. the Cargo.toml / cargo script associated with the target directory)
  • Target directory
  • (maybe) The path to the Cargo.lock, for potential future work like "Pin cache entries still in use" (#13137), so that we don't have to infer the Cargo.lock location (special logic is needed for cargo-script, and feature requests exist for even weirder situations)

Note that neither of the two fields can serve as a unique / primary key. If people use CARGO_TARGET_DIR=/tmp/cargo then multiple workspaces may point to the same target dir. Likewise, people may end up with multiple target dirs for one workspace.

We need to track the Cargo.toml / cargo script because the workspace root is ambiguous when it comes to cargo scripts.

Example entries for CARGO_TARGET_DIR=/tmp/cargo :

| id | workspace-manifest | target dir | timestamp |
|----|--------------------|------------|-----------|
| ?  | /foo/Cargo.toml    | /tmp/cargo | ?         |
| ?  | /bar/Cargo.toml    | /tmp/cargo | ?         |
| ?  | /baz/script.rs     | /tmp/cargo | ?         |

Example entries for rust-analyzer target dir

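| id | workspace-manifest | target dir            | timestamp |
|----|--------------------|-----------------------|-----------|
| ?  | /foo/Cargo.toml    | /foo/target           | ?         |
| ?  | /foo/Cargo.toml    | /foo/target-ra        | ?         |
| ?  | /bar/Cargo.toml    | /bar/target           | ?         |
| ?  | /bar/Cargo.toml    | /bar/target-ra        | ?         |
| ?  | /baz/script.rs     | ~/.cargo/target/...   | ?         |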
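
To make the "no unique key" point concrete, here is a minimal sketch of the rows above as an in-memory structure. The `TargetDirEntry` type, its field names, and the helper functions are hypothetical and are not part of Cargo's actual GC database; the sketch only shows that lookups in either direction can return several results.

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

/// One hypothetical row of the tracking table (not Cargo's real schema).
#[derive(Debug, Clone)]
struct TargetDirEntry {
    workspace_manifest: PathBuf,
    target_dir: PathBuf,
}

/// All distinct target dirs recorded for one manifest.
fn target_dirs_for<'a>(entries: &'a [TargetDirEntry], manifest: &Path) -> BTreeSet<&'a Path> {
    entries
        .iter()
        .filter(|e| e.workspace_manifest.as_path() == manifest)
        .map(|e| e.target_dir.as_path())
        .collect()
}

/// All distinct manifests recorded for one target dir.
fn manifests_for<'a>(entries: &'a [TargetDirEntry], target_dir: &Path) -> BTreeSet<&'a Path> {
    entries
        .iter()
        .filter(|e| e.target_dir.as_path() == target_dir)
        .map(|e| e.workspace_manifest.as_path())
        .collect()
}

/// Sample rows mixing the two example tables from the issue.
fn sample_entries() -> Vec<TargetDirEntry> {
    [
        ("/foo/Cargo.toml", "/tmp/cargo"),
        ("/bar/Cargo.toml", "/tmp/cargo"),
        ("/baz/script.rs", "/tmp/cargo"),
        ("/foo/Cargo.toml", "/foo/target"),
        ("/foo/Cargo.toml", "/foo/target-ra"),
    ]
    .iter()
    .map(|(manifest, target)| TargetDirEntry {
        workspace_manifest: PathBuf::from(manifest),
        target_dir: PathBuf::from(target),
    })
    .collect()
}

fn main() {
    let entries = sample_entries();
    // One target dir shared by several workspaces.
    println!("{:?}", manifests_for(&entries, Path::new("/tmp/cargo")));
    // One workspace with several target dirs.
    println!("{:?}", target_dirs_for(&entries, Path::new("/foo/Cargo.toml")));
}
```

Because both lookups can return more than one row, a real table would presumably need either a synthetic `id` or a composite key over both columns.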

Forms of cleanup

  • Delete target/ if unused for X time (this is in the "locally recreatable" category)
  • Delete all target/ (I just upgraded Rust, maybe rustup could suggest this)
  • Delete leaked target/ (workspace doesn't exist)
    • However, it might be transient (on a thumb drive). Should we make this time based?
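
The three forms of cleanup can be sketched as a selection over tracked entries. Everything here is an assumption for illustration: `TrackedTarget`, `Cleanup`, and `select_for_cleanup` are hypothetical names, and the manifest-existence check is passed in as a closure so the sketch does not touch the filesystem.

```rust
use std::path::{Path, PathBuf};
use std::time::{Duration, SystemTime};

/// Hypothetical tracked entry: which manifest used which target dir, and when.
struct TrackedTarget {
    manifest: PathBuf,
    target_dir: PathBuf,
    last_use: SystemTime,
}

/// The three forms of cleanup listed in the issue.
enum Cleanup {
    /// Delete target dirs unused for at least this long.
    UnusedFor(Duration),
    /// Delete every tracked target dir.
    All,
    /// Delete target dirs whose workspace manifest no longer exists.
    Leaked,
}

/// Returns the target dirs that the given cleanup mode would delete.
fn select_for_cleanup<'a>(
    entries: &'a [TrackedTarget],
    mode: &Cleanup,
    now: SystemTime,
    manifest_exists: impl Fn(&Path) -> bool,
) -> Vec<&'a Path> {
    entries
        .iter()
        .filter(|e| match mode {
            // A last-use time in the future (clock skew) is treated as "in use".
            Cleanup::UnusedFor(max_age) => now
                .duration_since(e.last_use)
                .map(|age| age >= *max_age)
                .unwrap_or(false),
            Cleanup::All => true,
            Cleanup::Leaked => !manifest_exists(e.manifest.as_path()),
        })
        .map(|e| e.target_dir.as_path())
        .collect()
}

fn main() {
    let day = Duration::from_secs(24 * 60 * 60);
    let now = SystemTime::UNIX_EPOCH + day * 100;
    let entries = vec![TrackedTarget {
        manifest: PathBuf::from("/foo/Cargo.toml"),
        target_dir: PathBuf::from("/foo/target"),
        last_use: now - day * 40,
    }];
    let stale = select_for_cleanup(&entries, &Cleanup::UnusedFor(day * 30), now, |_| true);
    println!("{:?}", stale);
}
```

This sketch ignores two complications raised above: a target dir shared by several workspaces should probably only be deleted when every row pointing at it qualifies, and a missing manifest may be transient (for example, on a removable drive), which is why a time-based check for leaked directories is suggested.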

Notes

No response

@epage epage added C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-accepted Status: Issue or feature is accepted, and has a team member available to help mentor or review Z-script Nightly: cargo script Z-gc Nightly: garbage collection labels Dec 8, 2023
@baby230211
Contributor

@rustbot claim

@baby230211
Contributor

Three ways to clean target dirs:

  1. Clean target dirs that their workspace hasn't used for a specific time.
  2. Clean target dirs that their workspace no longer uses.
  3. Clean target dirs in all workspaces.

@epage
Contributor Author

epage commented Mar 20, 2024

To clarify, those are use cases for why tracking of whole target/ could be useful.

I'd do a small tweak of wording in case this leads to confusion

  • clean target dir if it and/or its workspace hasn't been used for a specific time
  • clean target dir for workspace that is no longer present
  • clean all target dirs

@weihanglo
Member

weihanglo commented Mar 27, 2024

@epage, is this garbage collection specific to target directories under ~/.cargo/target that are generated by -Zscript? I cannot tell from the issue title.

It's for every target directory.

@epage
Contributor Author

epage commented Apr 24, 2024

Currently, we do the batch-save in PackageSet::get_many, so we likely want mark-workspace-used to live in a place before the get_many call

pub fn get_many(&self, ids: impl IntoIterator<Item = PackageId>) -> CargoResult<Vec<&Package>> {

get_many gets called as part of PackageSet::download_accessible

self.get_many(to_download.into_iter())?;

which gets called as part of resolving:

pkg_set.download_accessible(

which gets called as part of create_bcx:

let resolve = ops::resolve_ws_with_opts(

So our options are

  • Put mark-workspace-used in resolve (before download_accessible) so every operation gets it recorded
  • Put mark-workspace-used in create_bcx before the resolve so only compiles have mark-workspace-used recorded

@epage
Contributor Author

epage commented Apr 24, 2024

A code path to model off of is

    fn mark_used(&self, size: Option<u64>) -> CargoResult<()> {
        self.gctx
            .deferred_global_last_use()?
            .mark_git_checkout_used(global_cache_tracker::GitCheckout {
                encoded_git_name: self.ident,
                short_name: self.short_id.expect("update before download"),
                size,
            });
        Ok(())
    }
}
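
As a rough illustration of how a mark-workspace-used call could follow the same deferred pattern, here is a self-contained sketch. It does not use Cargo's real types: `WorkspaceUse`, `DeferredLastUse`, `mark_workspace_used`, and `save` are hypothetical stand-ins, and a `HashMap` plays the role of the database. The idea shown is only that marks are cheap in-memory updates which are written out later in one batch.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical record of one workspace using one target dir.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct WorkspaceUse {
    manifest: PathBuf,
    target_dir: PathBuf,
}

/// Hypothetical stand-in for a deferred last-use tracker.
#[derive(Default)]
struct DeferredLastUse {
    /// Pending timestamps (seconds) not yet written to the database.
    pending: HashMap<WorkspaceUse, u64>,
}

impl DeferredLastUse {
    /// Records a use in memory only; the latest timestamp wins.
    fn mark_workspace_used(&mut self, entry: WorkspaceUse, now_secs: u64) {
        self.pending.insert(entry, now_secs);
    }

    /// Writes all pending marks in one batch and returns how many rows were written.
    fn save(&mut self, db: &mut HashMap<WorkspaceUse, u64>) -> usize {
        let written = self.pending.len();
        db.extend(self.pending.drain());
        written
    }
}

fn main() {
    let mut deferred = DeferredLastUse::default();
    let mut db: HashMap<WorkspaceUse, u64> = HashMap::new();
    let ws = WorkspaceUse {
        manifest: PathBuf::from("/foo/Cargo.toml"),
        target_dir: PathBuf::from("/foo/target"),
    };
    deferred.mark_workspace_used(ws, 1_700_000_000);
    let written = deferred.save(&mut db);
    println!("wrote {written} row(s)");
}
```

Under this model, the placement question from the previous comment amounts to making sure the mark happens before whichever call triggers the batch save.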

JeillZhang pushed a commit to JeillZhang/cargo that referenced this issue Mar 17, 2025
### What does this PR try to resolve?

This PR is a follow-up to rust-lang#15104 and adds support for the path
templating in `build.build-dir` as defined in rust-lang#14125.

Supported templates:
* `{workspace-root}`
* `{cargo-cache}` (pointing to `CARGO_HOME` for now)
* `{workspace-manifest-path-hash}`

#### Unresolved questions

What should we name `{workspace-manifest-path-hash}` and what should it
include? Should we shorten to `{workspace-hash}` or even just `{hash}`?
Should we include the Cargo version so we get unique whole-target
directories for easier cleanup (rust-lang#13136)

How should this handle unknown variables (error) or unclosed `{` / `}`
(ignored), see
rust-lang#15236 (comment)

When using `{workspace-manifest-path-hash}`, the hash changes based on
the project path. If cargo is executed in a symlinked project, the hash
will differ from the one for the real path.

For example, given the following directory
```
/Users/
└─ user1/
    └─ projects/
        ├─ actual-crate/
        │  └─ Cargo.toml
        └─ symlink-to-crate/ -> actual-crate/
```

the hash will be different when running cargo from each of the following
directories.
* `/Users/user1/projects/actual-crate`
* `/Users/user1/projects/symlink-to-crate`

Figuring out whether to handle this is deferred, see
- rust-lang#15236 (comment)
- https://github.com/poliorcetics/rfcs/blob/cargo-target-directories/text/3371-cargo-target-dir-templates.md#symbolic-links
- rust-lang#12207 (comment)

### How should we test and review this PR?

This PR is fairly small. I included tests for each template variable.

You can also clone my branch and test it locally with
```console
CARGO_BUILD_BUILD_DIR='{workspace-root}/foo' cargo -Z build-dir build
```

### Additional information

While searching Cargo for any prior art for path templating, I found
[`sources/registry/download.rs`](https://github.com/rust-lang/cargo/blob/master/src/cargo/sources/registry/download.rs#L84)
doing a simple string replace. Thus I followed the same behavior.
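
A minimal sketch of template expansion by plain string replacement, under stated assumptions: `expand_build_dir` and `path_hash` are hypothetical names, and the hash uses the standard library's `DefaultHasher` rather than whatever hashing Cargo actually applies. Unknown variables and unclosed braces are simply left in place here, which is one of the open questions above rather than settled behavior.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::Path;

/// Hashes the manifest path exactly as given (no symlink resolution).
/// Assumption: this is NOT Cargo's real hashing scheme.
fn path_hash(manifest: &Path) -> String {
    let mut hasher = DefaultHasher::new();
    manifest.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Expands the three supported template variables with simple string replacement.
fn expand_build_dir(
    template: &str,
    workspace_root: &Path,
    cargo_home: &Path,
    manifest: &Path,
) -> String {
    template
        .replace("{workspace-root}", &workspace_root.to_string_lossy())
        .replace("{cargo-cache}", &cargo_home.to_string_lossy())
        .replace("{workspace-manifest-path-hash}", &path_hash(manifest))
}

fn main() {
    let root = Path::new("/Users/user1/projects/actual-crate");
    let manifest = root.join("Cargo.toml");
    let home = Path::new("/Users/user1/.cargo");
    let dir = expand_build_dir(
        "{cargo-cache}/build/{workspace-manifest-path-hash}",
        root,
        home,
        &manifest,
    );
    println!("{dir}");
}
```

Because the path is hashed as written, invoking this with the symlinked path yields a different directory than with the real path, which matches the symlink behavior described earlier.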

r? @epage
@epage
Contributor Author

epage commented Apr 11, 2025

#14125 adds a build-dir which we will also want to track and clean up.

Once both are stabilized, we can consider changing the workspace-path-hash to also include the Cargo version so that people can clean up caches for old versions of Rust on upgrade.

github-merge-queue bot pushed a commit that referenced this issue Apr 27, 2025
This proposes to stabilize automatic garbage collection of Cargo's
global cache data in the cargo home directory.

### What is being stabilized?

This PR stabilizes automatic garbage collection, which is triggered at
most once per day by default. This automatic gc will delete old, unused
files in cargo's home directory.

It will delete files that need to be downloaded from the network after 3
months, and files that can be generated without network access after 1
month. These thresholds are intended to balance the intent of reducing
cargo's disk usage versus deleting too often forcing cargo to do extra
work when files are missing.

Tracking of the last-use data is stored in a sqlite database in the
cargo home directory. Cargo updates timestamps in that database whenever
it accesses a file in the cache. This part is already stabilized.
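
The thresholds described above can be sketched as a small decision function. This is only an illustration: `CacheKind`, `max_age`, `should_delete`, and `gc_due` are hypothetical names, and a month is approximated as 30 days, which may not match how cargo computes it.

```rust
use std::time::Duration;

const DAY: Duration = Duration::from_secs(24 * 60 * 60);

/// The two categories of cached files described in the PR text.
#[derive(Clone, Copy)]
enum CacheKind {
    /// Needs network access to restore (3-month threshold).
    Downloaded,
    /// Can be regenerated without network access (1-month threshold).
    LocallyRecreatable,
}

/// Maximum age before deletion; a month is approximated as 30 days here.
fn max_age(kind: CacheKind) -> Duration {
    match kind {
        CacheKind::Downloaded => DAY * 90,
        CacheKind::LocallyRecreatable => DAY * 30,
    }
}

/// True if a file last used `age` ago is past its threshold.
fn should_delete(kind: CacheKind, age: Duration) -> bool {
    age > max_age(kind)
}

/// True if automatic gc should run, given the time since the last run
/// (`None` means it has never run). Models "at most once per day".
fn gc_due(since_last_gc: Option<Duration>) -> bool {
    since_last_gc.map_or(true, |elapsed| elapsed >= DAY)
}

fn main() {
    let age = DAY * 45;
    println!("downloaded: {}", should_delete(CacheKind::Downloaded, age));
    println!("local: {}", should_delete(CacheKind::LocallyRecreatable, age));
    println!("gc due: {}", gc_due(Some(DAY / 2)));
}
```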

This PR also stabilizes the `gc.auto.frequency` configuration option.
The primary use case for when a user may want to set that is to set it
to "never" to disable gc should the need arise to avoid it.

When gc is initiated, and there are files to delete, there will be a
progress bar while it is deleting them. The progress bar will disappear
when it finishes. If the user runs with `-v` verbose option, then cargo
will also display which files it deletes.

If there is an error while cleaning, cargo will only display a warning,
and otherwise continue.

### What is not being stabilized?

The manual garbage collection option (via `cargo clean gc`) is not
proposed to be stabilized at this time. That still needs some design
work. This is tracked in
#13060.

Additionally, there are several low-level config options currently
implemented which define the thresholds for when it will delete files. I
think these options are probably too low-level and specific. This is
tracked in #13061.

Garbage collection of build artifacts is not yet implemented, and
tracked in #13136.

### Background

This feature is tracked in
#12633 and was implemented in a
variety of PRs, primarily #12634.

The tests for this feature are located in
https://github.com/rust-lang/cargo/blob/master/tests/testsuite/global_cache_tracker.rs.

Cargo started tracking the last-use data on stable via
#13492 in 1.78 which was released
2024-05-02. This PR is proposing to stabilize automatic deletion in 1.82
which will be released on 2024-10-17.

### Risks

Users who frequently use versions of Rust older than 1.78 will not have
the last-use data tracking updated. If they infrequently use 1.78 or
newer, and use the same cache files, then the last-use tracking will
only be updated by the newer versions. If that time frame is more than 1
month (or 3 months for downloaded data), then cargo will delete files
that the older versions are still using. This means the next time they
run the older version, it will have to re-download or re-extract the
files.

The effects of deleting cache data in environments where cargo's cache
is modified by external tools are not fully known. For example, CI
caching systems may save and restore cargo's cache. Similarly, things
like Docker images that try to save the cache in a layer, or mount the
cache in a read-only filesystem may have undesirable interactions.

The once-a-day performance hit might be noticeable to some people. I've
been using this for several months, and almost never notice it. However,
slower systems, or situations where there is a lot of data to delete
might take a while (on the order of seconds hopefully).