Start blame from cache #1852

Open
wants to merge 3 commits into
base: main

holodorum

As discussed in #1848, @jtwaleson and I have been working on a way to speed up git blame by introducing a caching mechanism. This allows us to start a blame operation from a checkpoint instead of computing it from scratch, significantly reducing computation time.

Proposed Changes

  1. Introduce BlameCacheObject
    function::file now accepts a BlameCacheObject, which stores:
      • The commit ID at which the blame was previously computed.
      • The blame entries corresponding to that commit.
  2. Detect and Process Changes
    Using the cached data, we compute the differences between the cached blob and the new target blob at the suspect commit.
    If the file has been rewritten, this will probably fail with an error, so the BlameCacheObject might need to store the file path as well.
  3. Efficiently Update Blame Entries
    Cached blame entries are updated based on the detected changes.
    Only UnblamedHunks (caused by AddedOrReplaced changes) are recomputed using the standard blame algorithm.
    Previously, the entire file or a range was marked as an UnblamedHunk; now this only happens when necessary.
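
To make the update step concrete, here is a minimal sketch of how cached entries could be carried over across a diff. It is not the code in this PR: the types `BlameEntry`, `BlameCacheObject` and `Change`, their fields, and the function `update_blame_with_changes` are simplified, hypothetical stand-ins chosen for illustration, and the sketch assumes the changes are ordered and cover the whole new blob.

```rust
use std::ops::Range;

/// Hypothetical, simplified blame entry: `len` lines starting at `start`
/// in the blamed file are attributed to `commit_id`.
#[derive(Debug, Clone, PartialEq)]
struct BlameEntry {
    start: u32,
    len: u32,
    commit_id: String,
}

/// Hypothetical cache object: the commit the blame was computed at,
/// plus the blame entries for the file at that commit.
#[derive(Debug, Clone)]
struct BlameCacheObject {
    cache_commit_id: String,
    entries: Vec<BlameEntry>,
}

/// Hypothetical diff result between the cached blob (old) and the target blob (new).
#[derive(Debug, Clone)]
enum Change {
    /// Lines in the new blob that are identical to the next lines of the old blob.
    Unchanged(Range<u32>),
    /// Lines in the new blob that are new, plus the number of old lines they replace.
    AddedOrReplaced(Range<u32>, u32),
    /// Position in the new blob and the number of old lines that were deleted there.
    Deleted(u32, u32),
}

/// Returns the cached entries shifted to their new positions, and the
/// line ranges of the new blob that still need a regular blame.
fn update_blame_with_changes(
    cache: &BlameCacheObject,
    changes: &[Change],
) -> (Vec<BlameEntry>, Vec<Range<u32>>) {
    let mut updated = Vec::new();
    let mut unblamed = Vec::new();
    // Cursor into the old (cached) blob, advanced as changes are consumed.
    let mut old_pos = 0u32;

    for change in changes {
        match change {
            Change::Unchanged(new_range) => {
                let n = new_range.end - new_range.start;
                let old_end = old_pos + n;
                // Keep the part of every cached entry that overlaps the
                // unchanged old lines, moved to its position in the new blob.
                for entry in &cache.entries {
                    let start = entry.start.max(old_pos);
                    let end = (entry.start + entry.len).min(old_end);
                    if start < end {
                        updated.push(BlameEntry {
                            start: new_range.start + (start - old_pos),
                            len: end - start,
                            commit_id: entry.commit_id.clone(),
                        });
                    }
                }
                old_pos = old_end;
            }
            Change::AddedOrReplaced(new_range, removed) => {
                // New lines have no cached attribution and must be blamed normally.
                unblamed.push(new_range.clone());
                old_pos += *removed;
            }
            Change::Deleted(_, removed) => old_pos += *removed,
        }
    }
    (updated, unblamed)
}
```

In this sketch only the ranges returned as unblamed would be handed to the standard blame algorithm. The inner loop scans all cached entries for every unchanged range, which keeps the example short; a real implementation would likely walk both lists in a single pass.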

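So far the results show significant speed-ups. The numbers below are for the README file in the linux repository: first a full blame at commit bf4401f3ec700e1a7376a4cbf05ef40c7ffce064, then a blame at commit 8c93d454027ffceea663ce6ea5b87557b8aaeb8a with and without the cache, and finally git blame for comparison.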

Performing blame operations
Elapsed time for blame on bf4401f3ec700e1a7376a4cbf05ef40c7ffce064: 6604ms
Statistics: Statistics { commits_traversed: 18008, trees_decoded: 18030, trees_diffed: 6, blobs_diffed: 5 }

Performing blame with cache
Elapsed time for blame on 8c93d454027ffceea663ce6ea5b87557b8aaeb8a: 4ms
Statistics: Statistics { commits_traversed: 0, trees_decoded: 2, trees_diffed: 0, blobs_diffed: 1 }

Performing blame without cache
Elapsed time for blame on 8c93d454027ffceea663ce6ea5b87557b8aaeb8a: 313ms
Statistics: Statistics { commits_traversed: 20592, trees_decoded: 20620, trees_diffed: 8, blobs_diffed: 7 } 

time git blame README >/dev/null
Blaming lines: 100% (14/14), done.
git blame README > /dev/null  0.26s user 0.21s system 33% cpu 1.382 total

Next Steps

  • Add tests that compare the outcome of a blame with and without cache
  • Add filepath to BlameCacheObject
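
For the first item, one possible shape for such a comparison is sketched below. It assumes (this is an assumption, not something verified against the PR) that a blame started from the cache may split a run of lines from one commit into several adjacent entries, so both results are normalized before comparing. The `Entry` type and the functions `coalesce` and `same_blame` are hypothetical names used only for this sketch.

```rust
/// Hypothetical, simplified blame entry used only for this comparison sketch.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    start: u32,
    len: u32,
    commit_id: String,
}

/// Sort entries by position and merge adjacent entries that point to the
/// same commit, so that differently split results compare as equal.
fn coalesce(mut entries: Vec<Entry>) -> Vec<Entry> {
    entries.sort_by_key(|e| e.start);
    let mut out: Vec<Entry> = Vec::new();
    for e in entries {
        if let Some(prev) = out.last_mut() {
            if prev.commit_id == e.commit_id && prev.start + prev.len == e.start {
                // Directly follows the previous entry and has the same commit: extend it.
                prev.len += e.len;
                continue;
            }
        }
        out.push(e);
    }
    out
}

/// True if both blames attribute every line to the same commit.
fn same_blame(with_cache: Vec<Entry>, without_cache: Vec<Entry>) -> bool {
    coalesce(with_cache) == coalesce(without_cache)
}
```

A test could then run the blame once from scratch and once from the cache at the same commit and assert that `same_blame` returns true for the two results.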

Curious to hear what you think!

Add an algorithm that takes an existing blame and diff changes and computes the new blame and unblamed hunks.
@Byron
Member

Byron commented Feb 22, 2025

Thanks a lot for contributing! It's great to see what can be done with a cache and I'd love to see this go further. What if there was enough tooling around it so that gitui could build such a cache in memory to allow walking through/digging into blames of the same file on the fly? For instance, if one commit changed all lines because it changed tabs to spaces, it should be possible to quickly 'go through' that veil without recomputing everything up to that point.

This PR should of course remain minimal; it would just be interesting to make these changes with user-value in mind.

@cruessler has been working towards a first integration into gitui as well, hence the note above.

Speaking of, all I did was read the PR text and make CI pass, and I wonder if @cruessler would like to take a first closer look?

Thanks everyone 🙏

@cruessler
Contributor

I’ll be happy to have a look! (Just FYI, I’ll be travelling for a week starting next Tuesday, so it might take some time until I finally get to it.)
