Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip uninteresting commits for blame #1743

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

cruessler
Copy link
Contributor

@cruessler cruessler commented Dec 27, 2024

This PR is a PoC. It is a first attempt at implementing custom graph traversal for blame::file. This allows us to skip commits that we know don’t contain changes to the file being blamed. The PR is rough around the edges and contains a lot of TODOs and todo!()s.

A first round of tests showed a significant reduction in the number of commits traversed for gix blame STABILITY.md:

# before
Statistics {
    commits_traversed: 9261,
    commits_to_tree: 10710,
    trees_decoded: 10710,
    trees_diffed: 37,
    blobs_diffed: 36,
}

# after
Statistics {
    commits_traversed: 3657,
    commits_to_tree: 3749,
    trees_decoded: 3749,
    trees_diffed: 37,
    blobs_diffed: 36,
}

Also, profiling gix blame STABILITY.md on my machine revealed that the command took about 20 % less time to run.

@cruessler cruessler force-pushed the skip-uninteresting-commits-for-blame branch from 50e9f97 to 20a90a2 Compare January 17, 2025 16:19
Pass blame to parent in `process_change`. `git`’s algorithm only seems
to keep the current suspect for unblamed hunks that were the direct
result of splitting an existing unblamed hunk because it matched with a
change. All other hunks appear to be blamed on the parent without
further checks.
@cruessler cruessler force-pushed the skip-uninteresting-commits-for-blame branch from 20a90a2 to efa0057 Compare January 21, 2025 08:25
@cruessler
Copy link
Contributor Author

I think (and hope :-)) that this PR is now ready for the first round of reviews. There’s still two or three todo!() left, but I’ve addressed all of the TODOs as far as I can see and I’m really satisfied with the numbers so far as there seems to be a considerable speedup.

Also, the traversal is now much closer to what git does, and I hope that it most of the time is even identical unless there are differences in diffing results that can make traversals diverge. I haven’t found ways to validate this claim at a large scale, though, so I want to be cautious at this point.

gitoxide on  main [!?⇣] is 📦 v0.40.0 via 🦀 v1.84.0 took 5s
❯ hyperfine "$HOME/github/Byron/gitoxide/target/release/gix blame STABILITY.md" "$HOME/worktrees/gitoxide/branch-1/target/release/gix blame STABILITY.md"
Benchmark 1: /home/christoph/github/Byron/gitoxide/target/release/gix blame STABILITY.md
  Time (mean ± σ):     194.6 ms ±   2.3 ms    [User: 174.0 ms, System: 19.8 ms]
  Range (min … max):   190.8 ms … 198.5 ms    15 runs

Benchmark 2: /home/christoph/worktrees/gitoxide/branch-1/target/release/gix blame STABILITY.md
  Time (mean ± σ):      99.2 ms ±   1.5 ms    [User: 87.3 ms, System: 11.3 ms]
  Range (min … max):    97.1 ms … 104.1 ms    29 runs

Summary
  '/home/christoph/worktrees/gitoxide/branch-1/target/release/gix blame STABILITY.md' ran
    1.96 ± 0.04 times faster than '/home/christoph/github/Byron/gitoxide/target/release/gix blame STABILITY.md'

@cruessler cruessler marked this pull request as ready for review January 21, 2025 08:41
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a massive speedup, fantastic work! And sorry for the late (first) review, I am now totally behind on emails each, catching up only during the weekend.

When looking at the commits I saw that they are a bit all over the place, often touching multiple crates at the same time. This is OK when not using <prefix>: <subject> notation, but it's not otherwise.

Can we practice this by adjusting the <prefix>: <subject> commits so that they only touch a single crate? This can be followed by an adapt to changes in <crate> commit to make fixes that make it compile (yes, unfortunately not every commit would pass CI, but it's for getting proper changelogs).

In that moment, please feel free to squash other commits that might not be so valuable all by themselves.

I am also more than happy to do this as part of my review, just let me know.

@cruessler
Copy link
Contributor Author

Sure, no problem, thanks a lot for the detailed explanation! (I wasn’t aware that this means accepting intermediate commits that don’t pass CI.)

I will rearrange the commits, but I will only get to it on Monday or Tuesday, so feel free to take over if you get to it earlier!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants