Cancel tree diffing early when matching path is found #1747

Merged
5 commits merged on Jan 7, 2025

cruessler
Contributor

This PR is ready for review, but it will likely need a couple of iterations before it is ready to be merged.

It introduces a custom Recorder that is essentially a slimmed-down version of gix_diff::tree::Recorder. Recorder cuts the diff short by cancelling it as soon as a change to interesting_path is found. This relies on the assumption that a single tree diff contains at most one change per file path.
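
To illustrate the idea without depending on gix, here is a minimal, self-contained sketch. It is a toy model, not the code from this PR: `Action`, `Change`, the `diff` driver, and the `visited` counter are all hypothetical stand-ins, and trees are flattened to path-sorted lists. Only the core behaviour is the same: the recorder asks for cancellation as soon as it sees a change to `interesting_path`.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the action a visitor returns to the diff driver.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Action {
    Continue,
    Cancel,
}

/// Simplified change between two flat, path-sorted trees (assumption: no nesting).
#[derive(Debug, PartialEq, Eq, Clone)]
enum Change {
    Addition { path: String },
    Deletion { path: String },
    Modification { path: String },
}

impl Change {
    fn path(&self) -> &str {
        match self {
            Change::Addition { path } | Change::Deletion { path } | Change::Modification { path } => path,
        }
    }
}

/// Keeps only the first change that touches `interesting_path`, then cancels.
struct Recorder {
    interesting_path: String,
    change: Option<Change>,
    /// Number of changes seen, only here to make the early exit observable.
    visited: usize,
}

impl Recorder {
    fn new(interesting_path: &str) -> Self {
        Recorder { interesting_path: interesting_path.to_string(), change: None, visited: 0 }
    }

    fn visit(&mut self, change: Change) -> Action {
        self.visited += 1;
        if change.path() == self.interesting_path {
            self.change = Some(change);
            Action::Cancel
        } else {
            Action::Continue
        }
    }
}

/// Toy diff over two path-sorted lists of (path, object id) pairs.
/// Returns `false` if the recorder cancelled the traversal.
fn diff(old: &[(&str, u32)], new: &[(&str, u32)], recorder: &mut Recorder) -> bool {
    let (mut i, mut j) = (0, 0);
    while i < old.len() || j < new.len() {
        let change = match (old.get(i), new.get(j)) {
            (Some(o), Some(n)) => match o.0.cmp(n.0) {
                Ordering::Less => { i += 1; Some(Change::Deletion { path: o.0.to_string() }) }
                Ordering::Greater => { j += 1; Some(Change::Addition { path: n.0.to_string() }) }
                Ordering::Equal => {
                    i += 1;
                    j += 1;
                    // Same path on both sides: only a differing id counts as a change.
                    (o.1 != n.1).then(|| Change::Modification { path: o.0.to_string() })
                }
            },
            (Some(o), None) => { i += 1; Some(Change::Deletion { path: o.0.to_string() }) }
            (None, Some(n)) => { j += 1; Some(Change::Addition { path: n.0.to_string() }) }
            (None, None) => unreachable!("loop condition guarantees one side has entries"),
        };
        if let Some(change) = change {
            if recorder.visit(change) == Action::Cancel {
                return false;
            }
        }
    }
    true
}

fn main() {
    let old = [("Cargo.lock", 1), ("README.md", 2), ("STABILITY.md", 3), ("src/lib.rs", 4)];
    let new = [("Cargo.lock", 10), ("README.md", 2), ("STABILITY.md", 30), ("src/lib.rs", 40)];
    let mut recorder = Recorder::new("STABILITY.md");
    let completed = diff(&old, &new, &mut recorder);
    println!("completed: {completed}, change: {:?}, visited: {}", recorder.change, recorder.visited);
}
```

In this toy run the entry for `src/lib.rs` is never compared, because the traversal stops at `STABILITY.md`. The saving grows with the number of entries that sort after the path of interest.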

The results seem promising. Compared to the existing implementation, I was able to observe a speedup of about 5 % for gix blame STABILITY.md.

gitoxide on  main [$?] is 📦 v0.40.0 via 🦀 v1.83.0 took 9s
❯ hyperfine "$HOME/github/Byron/gitoxide/target/release/gix blame STABILITY.md" "$HOME/worktrees/gitoxide/branch-3/target/release/gix blame STABILITY.md"
Benchmark 1: /home/christoph/github/Byron/gitoxide/target/release/gix blame STABILITY.md
  Time (mean ± σ):     507.0 ms ±   7.6 ms    [User: 485.4 ms, System: 19.5 ms]
  Range (min … max):   498.6 ms … 520.3 ms    10 runs

Benchmark 2: /home/christoph/worktrees/gitoxide/branch-3/target/release/gix blame STABILITY.md
  Time (mean ± σ):     483.0 ms ±   4.7 ms    [User: 458.3 ms, System: 22.4 ms]
  Range (min … max):   475.7 ms … 491.5 ms    10 runs

Summary
  '/home/christoph/worktrees/gitoxide/branch-3/target/release/gix blame STABILITY.md' ran
    1.05 ± 0.02 times faster than '/home/christoph/github/Byron/gitoxide/target/release/gix blame STABILITY.md'

For gix blame Cargo.lock, the speedup was in the range of 25 % to 30 %.

gitoxide on  main [$?] is 📦 v0.40.0 via 🦀 v1.83.0 took 9s
❯ hyperfine "$HOME/github/Byron/gitoxide/target/release/gix blame Cargo.lock" "$HOME/worktrees/gitoxide/branch-3/target/release/gix blame Cargo.lock"
Benchmark 1: /home/christoph/github/Byron/gitoxide/target/release/gix blame Cargo.lock
  Time (mean ± σ):      2.579 s ±  0.009 s    [User: 2.374 s, System: 0.197 s]
  Range (min … max):    2.568 s …  2.593 s    10 runs

Benchmark 2: /home/christoph/worktrees/gitoxide/branch-3/target/release/gix blame Cargo.lock
  Time (mean ± σ):      2.022 s ±  0.007 s    [User: 1.821 s, System: 0.192 s]
  Range (min … max):    2.010 s …  2.034 s    10 runs

Summary
  '/home/christoph/worktrees/gitoxide/branch-3/target/release/gix blame Cargo.lock' ran
    1.28 ± 0.01 times faster than '/home/christoph/github/Byron/gitoxide/target/release/gix blame Cargo.lock'
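
For reference, the ratios in the two summaries can be recomputed from the reported mean times. The sketch below is plain arithmetic on the numbers above; the helper names `speedup` and `time_saved_percent` are made up for this example. It also shows that a 1.28x ratio corresponds to roughly 21.6 % less wall-clock time, since the two figures measure different things.

```rust
/// Ratio of baseline time to candidate time (what hyperfine reports as "times faster").
fn speedup(baseline: f64, candidate: f64) -> f64 {
    baseline / candidate
}

/// Share of the baseline run time that the candidate saves, in percent.
fn time_saved_percent(baseline: f64, candidate: f64) -> f64 {
    (1.0 - candidate / baseline) * 100.0
}

fn main() {
    // Mean times taken from the hyperfine output above.
    let runs = [
        ("STABILITY.md", 0.5070, 0.4830),
        ("Cargo.lock", 2.579, 2.022),
    ];
    for (file, baseline, candidate) in runs {
        println!(
            "{file}: {:.2}x faster, {:.1} % less time",
            speedup(baseline, candidate),
            time_saved_percent(baseline, candidate),
        );
    }
}
```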

I also ran gix blame on all files in gitoxide to make sure the changed version’s output exactly matched the current version’s output.

Open questions

  • Can we ignore gix_diff::tree’s return value or is this a bad idea?
  • Where does Recorder go and can we find a better name for this struct?
  • Would it make sense to rename stats.trees_diffed to stats.trees_partially_diffed?

Member

@Byron Byron left a comment

Thanks a lot, that's great!

Who would have thought that such a small change could have such a large impact! It's also something I didn't even think of, so I am really glad you did!

Would it make sense to rename stats.trees_diffed to stats.trees_partially_diffed?

I'd keep the name simple as it's typically Debug printed anyway. But it's probably fair to be specific in its doc-string to indicate the diff is usually partial.

// `recorder` cancels the traversal by returning `Cancel` when a change to `file_path` is
// found. `gix_diff::tree` converts `Cancel` into `Err(...)` which is why we ignore its return
// value here. I don’t know whether this has the potential to hide bugs.
let _ = gix_diff::tree(parent_tree_iter, tree_iter, state, &odb, &mut recorder);
Member

Indeed, this is where one would have to ignore the Cancelled variant of the error, but fail on all others.
The reason this is an error is… that it's intended to be used for cancellation, which is when one wants this to error out easily.
Here this isn't too useful though.
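
A minimal sketch of that pattern, under stated assumptions: `DiffError` below is a hypothetical stand-in with a `Cancelled` variant and one made-up failure variant, not the actual error type returned by gix_diff::tree, and `ignore_cancellation` is an illustrative helper name. The point is only that cancellation is treated as success while every other error is passed on.

```rust
use std::fmt;

/// Hypothetical stand-in for the diff error type; only `Cancelled` is named in this thread.
#[derive(Debug)]
enum DiffError {
    /// The visitor asked to stop early. Expected here, so not a failure.
    Cancelled,
    /// Made-up example of a genuine failure that must not be swallowed.
    ObjectNotFound(String),
}

impl fmt::Display for DiffError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DiffError::Cancelled => write!(f, "diff was cancelled"),
            DiffError::ObjectNotFound(id) => write!(f, "object {id} not found"),
        }
    }
}

impl std::error::Error for DiffError {}

/// Treat early cancellation as success, but propagate every other error.
fn ignore_cancellation(result: Result<(), DiffError>) -> Result<(), DiffError> {
    match result {
        Ok(()) | Err(DiffError::Cancelled) => Ok(()),
        Err(err) => Err(err),
    }
}

fn main() {
    // Cancellation is what the early-exit recorder triggers on purpose.
    println!("{:?}", ignore_cancellation(Err(DiffError::Cancelled)));
    // Any other error still surfaces to the caller.
    match ignore_cancellation(Err(DiffError::ObjectNotFound("abc123".into()))) {
        Ok(()) => println!("ok"),
        Err(err) => println!("error: {err}"),
    }
}
```

Compared with `let _ = ...`, this keeps real failures visible, which addresses the concern about hiding bugs.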

Contributor Author

Thanks for the context, I’ve adapted this part.

// TODO
// The name is preliminary and can potentially include more context. Also, this should probably be
// moved to its own location.
struct Recorder {
Member

I'd probably name it after what it does, like CancelDiffOncePathTouched (or better :)).
I'd also put the implementation inside the function that uses it.

Contributor Author

I’ve renamed and inlined Recorder.

gix-blame/src/file/function.rs (outdated, resolved)
Member

@Byron Byron left a comment

Thanks a lot, this will work!

gix-blame/src/file/function.rs (two resolved threads)
@Byron Byron merged commit 59bd978 into GitoxideLabs:main Jan 7, 2025
20 checks passed

2 participants