
Commit 9a1b092

Auto merge of #12634 - ehuss:last-use, r=epage
Add cache garbage collection

### What does this PR try to resolve?

This introduces a new garbage collection system which can track the last time files were used in cargo's global cache, and delete old, unused files either automatically or manually.

### How should we test and review this PR?

This is broken up into a large number of commits, and each commit should have a short overview of what it does. I am breaking some of these out into separate PRs as well (unfortunately GitHub doesn't really support stacked pull requests). I expect to reduce the size of this PR if those other PRs are accepted.

I would first review `unstable.md` to give you an idea of what the user side of this looks like. I would then skim over each commit message to get an overview of all the changes. The core change is the introduction of the `GlobalCacheTracker`, which is an interface to a sqlite database used for tracking the timestamps.

### Additional information

I think the interface for this will almost certainly change over time. This is just a stab at creating a starting point where we can start testing and discussing what actual user flags should be exposed. This is also intended to start the process of getting experience using sqlite, and getting some testing in real-world environments to see how things might fail.

I'd like to ask for the review to not focus too much on bikeshedding flag names and options. I expect them to change, so this is by no means a concrete proposal for where it will end up. For example, the options are very granular, and I would like to have fewer options. However, it isn't clear how that might best work. The size-tracking options almost certainly need to change, but I do not know exactly what the use cases for size-tracking are, so that will need some discussion with people who are interested in that.
I decided to place the gc commands in the `cargo clean` command because I would like to have a single place for users to go for deleting cache artifacts. They may get moved to another command; however, introducing new subcommands is quite difficult (due to shadowing existing third-party commands). Other options might be `cargo gc`, `cargo maintenance`, `cargo cache`, etc., but there are existing extensions that those would interfere with.

There are also more directions to go in the future. For example, we could add a `cargo clean info` subcommand which could be used for querying cache information (like the sizes and such). There are also the remaining steps in the original proposal at https://hackmd.io/U_k79wk7SkCQ8_dJgIXwJg for rolling out sqlite support.

See #12633 for the tracking issue.
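The description states the core idea only in prose: record when each cache entry was last used, then delete entries whose last use is older than some threshold. A minimal sketch of that bookkeeping follows. It assumes an in-memory `HashMap` in place of the sqlite database that `GlobalCacheTracker` actually wraps, and `LastUseTracker`, `record_use`, `stale_entries`, and `collect_garbage` are hypothetical names, not cargo's API.

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

/// Convenience helper: a duration of `n` whole days.
fn days(n: u64) -> Duration {
    Duration::from_secs(n * 24 * 60 * 60)
}

/// Hypothetical in-memory stand-in for a last-use tracker. The PR's
/// `GlobalCacheTracker` keeps these timestamps in sqlite instead.
struct LastUseTracker {
    /// Cache entry name (e.g. a `.crate` file) -> time it was last used.
    last_use: HashMap<String, SystemTime>,
}

impl LastUseTracker {
    fn new() -> Self {
        LastUseTracker { last_use: HashMap::new() }
    }

    /// Record that `entry` was used at `now`, replacing any older timestamp.
    fn record_use(&mut self, entry: &str, now: SystemTime) {
        self.last_use.insert(entry.to_string(), now);
    }

    /// Entries not used within `max_age` of `now`, sorted for stable output.
    fn stale_entries(&self, now: SystemTime, max_age: Duration) -> Vec<String> {
        let mut stale: Vec<String> = self
            .last_use
            .iter()
            // A timestamp in the future makes `duration_since` fail; treat it as fresh.
            .filter(|&(_, &used)| now.duration_since(used).map_or(false, |age| age > max_age))
            .map(|(name, _)| name.clone())
            .collect();
        stale.sort();
        stale
    }

    /// Drop stale entries from the tracker and return them; a real collector
    /// would also delete the corresponding files from disk.
    fn collect_garbage(&mut self, now: SystemTime, max_age: Duration) -> Vec<String> {
        let stale = self.stale_entries(now, max_age);
        for name in &stale {
            self.last_use.remove(name);
        }
        stale
    }
}

fn main() {
    let now = SystemTime::UNIX_EPOCH + days(100);
    let mut tracker = LastUseTracker::new();
    tracker.record_use("serde-1.0.188.crate", now - days(2));
    tracker.record_use("regex-1.9.3.crate", now - days(40));
    tracker.record_use("rand-0.8.5.crate", now - days(90));

    // Collect anything not used in the last 30 days.
    let removed = tracker.collect_garbage(now, days(30));
    println!("{:?}", removed); // ["rand-0.8.5.crate", "regex-1.9.3.crate"]
}
```

This sketch ignores persistence and concurrent access from multiple cargo processes, both of which a real on-disk tracker has to handle.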
2 parents 6ef771d + 0cd970b commit 9a1b092

37 files changed (+5704 -38 lines)

Cargo.lock (+85 -6; generated file, diff not shown)

Cargo.toml (+4)

@@ -73,6 +73,8 @@ pretty_assertions = "1.4.0"
 proptest = "1.3.1"
 pulldown-cmark = { version = "0.9.3", default-features = false }
 rand = "0.8.5"
+regex = "1.9.3"
+rusqlite = { version = "0.29.0", features = ["bundled"] }
 rustfix = "0.6.1"
 same-file = "1.0.6"
 security-framework = "2.9.2"
@@ -162,6 +164,8 @@ pasetors.workspace = true
 pathdiff.workspace = true
 pulldown-cmark.workspace = true
 rand.workspace = true
+regex.workspace = true
+rusqlite.workspace = true
 rustfix.workspace = true
 semver.workspace = true
 serde = { workspace = true, features = ["derive"] }

benches/README.md (+36 -3)

@@ -9,7 +9,23 @@ cd benches/benchsuite
 cargo bench
 ```
 
-The tests involve downloading the index and benchmarking against some
+However, running all benchmarks would take many minutes, so in most cases it
+is recommended to just run the benchmarks relevant to whatever section of code
+you are working on.
+
+## Benchmarks
+
+There are several different kinds of benchmarks in the `benchsuite/benches` directory:
+
+* `global_cache_tracker` — Benchmarks saving data to the global cache tracker
+database using samples of real-world data.
+* `resolve` — Benchmarks the resolver against simulations of real-world workspaces.
+* `workspace_initialization` — Benchmarks initialization of a workspace
+against simulations of real-world workspaces.
+
+### Resolve benchmarks
+
+The resolve benchmarks involve downloading the index and benchmarking against some
 real-world and artificial workspaces located in the [`workspaces`](workspaces)
 directory.
 
@@ -21,15 +37,32 @@ faster. You can (and probably should) specify individual benchmarks to run to
 narrow it down to a more reasonable set, for example:
 
 ```sh
-cargo bench -- resolve_ws/rust
+cargo bench -p benchsuite --bench resolve -- resolve_ws/rust
 ```
 
 This will only download what's necessary for the rust-lang/rust workspace
 (which is about 330MB) and run the benchmarks against it (which should take
 about a minute). To get a list of all the benchmarks, run:
 
 ```sh
-cargo bench -- --list
+cargo bench -p benchsuite --bench resolve -- --list
+```
+
+### Global cache tracker
+
+The `global_cache_tracker` benchmark tests saving data to the global cache
+tracker database using samples of real-world data. This benchmark should run
+relatively quickly.
+
+The real-world data is based on a capture of my personal development
+environment which has accumulated a large cache. So it is somewhat arbitrary,
+but hopefully representative of a challenging environment. Capturing of the
+data is done with the `capture-last-use` binary, which you can run if you need
+to rebuild the database. Just try to run on a system with a relatively full
+cache in your cargo home directory.
+
+```sh
+cargo bench -p benchsuite --bench global_cache_tracker
 ```
 
 ## Viewing reports
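The README says the benchmark's sample data is captured from a well-filled cache by the `capture-last-use` binary. That tool's source is not part of this excerpt, so the following is only a hypothetical sketch of the inventory step such a tool needs: walk `$CARGO_HOME/registry/cache` (falling back to `$HOME/.cargo`, Cargo's default location on Unix-like systems) and record each file's path, size, and modification time. `CacheSample` and `capture` are made-up names, and the record format is an assumption.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Hypothetical record for one cache file (not `capture-last-use`'s format).
struct CacheSample {
    path: PathBuf,
    size: u64,
    /// File modification time: roughly when it was written, not when last used.
    modified: SystemTime,
}

/// Recursively walk `dir`, appending one sample per regular file.
fn capture(dir: &Path, out: &mut Vec<CacheSample>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // `DirEntry::metadata` does not follow symlinks, so links are skipped.
        let meta = entry.metadata()?;
        if meta.is_dir() {
            capture(&entry.path(), out)?;
        } else if meta.is_file() {
            out.push(CacheSample {
                path: entry.path(),
                size: meta.len(),
                modified: meta.modified()?,
            });
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Assumption: use $CARGO_HOME if set, otherwise $HOME/.cargo.
    let cargo_home = match std::env::var_os("CARGO_HOME") {
        Some(dir) => PathBuf::from(dir),
        None => match std::env::var_os("HOME") {
            Some(home) => PathBuf::from(home).join(".cargo"),
            None => {
                eprintln!("cannot locate cargo home");
                return Ok(());
            }
        },
    };
    let cache_dir = cargo_home.join("registry").join("cache");
    let mut samples = Vec::new();
    if cache_dir.is_dir() {
        capture(&cache_dir, &mut samples)?;
    }
    let total: u64 = samples.iter().map(|s| s.size).sum();
    println!("{} files, {} bytes", samples.len(), total);
    if let Some(oldest) = samples.iter().min_by_key(|s| s.modified) {
        println!("oldest: {}", oldest.path.display());
    }
    Ok(())
}
```

Modification times generally reflect when a crate file was downloaded rather than when it was last used, which is the gap that recording last-use timestamps separately is meant to fill.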

benches/benchsuite/Cargo.toml (+6)

@@ -11,8 +11,10 @@ publish = false
 
 [dependencies]
 cargo.workspace = true
+cargo-util.workspace = true
 criterion.workspace = true
 flate2.workspace = true
+rand.workspace = true
 tar.workspace = true
 url.workspace = true
 
@@ -26,3 +28,7 @@ harness = false
 [[bench]]
 name = "workspace_initialization"
 harness = false
+
+[[bench]]
+name = "global_cache_tracker"
+harness = false
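Both `[[bench]]` entries set `harness = false`, which tells Cargo not to build the target with the libtest bench harness, so the bench source file must supply its own `main`. In this crate that is presumably done by Criterion (listed under `[dependencies]`) through its `criterion_main!` macro. The std-only sketch below illustrates just that harness-less contract; `workload` and `time_it` are hypothetical names, and a naive mean is no substitute for Criterion's statistical analysis.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical workload standing in for "save a batch of samples to the tracker".
fn workload(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, x| acc.wrapping_add(x * x))
}

/// Run `f` `iters` times and return the mean wall-clock time per iteration.
fn time_it<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

// With `harness = false`, this `main` is the entire benchmark binary:
// `cargo bench` runs it directly instead of going through libtest.
fn main() {
    let per_iter = time_it(1_000, || {
        // `black_box` keeps the optimizer from discarding the unused result.
        black_box(workload(black_box(10_000)));
    });
    println!("workload: {:?} per iteration", per_iter);
}
```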
