add multiprocessing and author canonicalization #104
Open
thehesiod wants to merge 15 commits into casperdcl:main
thehesiod
commented
Nov 22, 2024
if isinstance(args.gitdir, str):
    args.gitdir = [args.gitdir]
# strip `/`, `.git`
gitdirs = [i.rstrip(os.sep) for i in args.gitdir]
thehesiod
commented
Nov 22, 2024
for auth, stats in getattr(old, 'iteritems', old.items)():
    i = auth_stats.setdefault(auth2em[auth],
        {"loc": 0, "files": set(), "commits": 0, "ctimes": []})
    auth_email = list(auth2em[auth])[0]  # TODO: count most used email?
Author
I think they should all be returned
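A minimal sketch of what "returning them all" could look like while still addressing the TODO about the most used email. It assumes a hypothetical list `commit_emails` with one entry per commit; the `auth2em[auth]` value in the snippet above appears to be a plain collection of emails without counts, so this would need per-commit data that the snippet does not show.

```python
from collections import Counter


def emails_by_usage(commit_emails):
    """Return every distinct email, most frequently used first.

    `commit_emails` is a hypothetical input: one email per commit.
    """
    return [email for email, _ in Counter(commit_emails).most_common()]


# All addresses are kept; the first one is the most used.
all_emails = emails_by_usage(["a@example.com", "b@example.com", "b@example.com"])
primary_email = all_emails[0]
```

With this shape, callers that want a single address can take the first element, and nothing is discarded.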
thehesiod
commented
Nov 22, 2024
Comment on lines +258 to +266
# if since:
#     # Strip boundary messages,
#     # preventing user with nearest commit to boundary owning the LOC
#     blame_out = RE_BLAME_BOUNDS.sub('', blame_out)
#
# if until:
#     # Strip boundary messages,
#     # preventing user with nearest commit to boundary owning the LOC
#     blame_out = RE_BLAME_BOUNDS.sub('', blame_out)
casperdcl
reviewed
Nov 29, 2024
Owner
casperdcl
left a comment
What about using threads (e.g. concurrent.futures) instead of processes?
Author
Actually yeah, since the work is already being done by a separate process. I'll work on swapping it over.
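A rough sketch of the thread-based approach suggested above, using `concurrent.futures.ThreadPoolExecutor`. Threads are enough here because each one mostly waits on an external process. The helper names `run_cmd` and `run_all` are made up for illustration, and the commands are stand-ins so the sketch runs without a repository; in this PR's setting each command would be a `git blame` invocation for one file.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_cmd(cmd):
    """Run one external command and return its stdout as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def run_all(cmds, max_workers=4):
    """Run commands concurrently in threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_cmd, cmds))


# Stand-in commands (hypothetical): each just prints a number.
cmds = [[sys.executable, "-c", f"print({i} * 2)"] for i in range(4)]
outputs = run_all(cmds)
```

Because the results come back in the parent process as ordinary Python objects, nothing needs to be pickled between workers, which is one practical difference from a process pool.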
casperdcl
reviewed
May 26, 2025
Comment on lines +58 to +61
--author-mapping-file-path=<path>  Path to file containing dictionary mapping author name
                                   to normalized author name
--author-email-mapping-file-path=<path>  Path to file containing dictionary mapping author
                                   email address to normalized author name
Owner
btw I don't think this is necessary since git's .mailmap is already supported.
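For reference, the lookup that the two proposed options describe could be sketched roughly as follows. This is not the PR's code: the function name `canonical_author`, the in-memory dicts, and the choice to let the email mapping take precedence over the name mapping are all assumptions, and the on-disk format of the mapping files is not shown in this conversation.

```python
def canonical_author(name, email, name_map=None, email_map=None):
    """Return a normalized author name.

    Assumed precedence: email mapping first, then name mapping,
    otherwise the name is returned unchanged.
    """
    if email_map and email in email_map:
        return email_map[email]
    if name_map and name in name_map:
        return name_map[name]
    return name


# Hypothetical mappings for illustration only.
name_map = {"jdoe": "Jane Doe"}
email_map = {"jane@old.example.com": "Jane Doe"}

by_email = canonical_author("J. Doe", "jane@old.example.com", name_map, email_map)
by_name = canonical_author("jdoe", "jdoe@example.com", name_map, email_map)
unmapped = canonical_author("Sam", "sam@example.com", name_map, email_map)
```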
Time to use those idle CPUs :)
Given that the majority of the time is spent in the git-blame executable, the speedup is nearly linear in the number of processes for quite some time. Running on a 16-core machine with a fast SSD yields a pretty tremendous speed increase. I tried to be as frugal as possible with communication between the processes.
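One way to keep communication between processes small, sketched under assumptions: each worker reduces its blame output to a compact per-author count before returning it, so only a small `Counter` crosses the process boundary rather than the full blame text. The function names and the sample text are hypothetical, and this is not necessarily how the PR does it. The parser assumes the `git blame --line-porcelain` format, where each blamed line carries an `author <name>` header.

```python
import subprocess
from collections import Counter
from concurrent.futures import ProcessPoolExecutor


def count_authors(blame_text):
    """Reduce --line-porcelain output to {author: number of lines}."""
    counts = Counter()
    for line in blame_text.splitlines():
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


def blame_file(path):
    """Worker: run git blame on one file and return only the small Counter."""
    out = subprocess.run(
        ["git", "blame", "--line-porcelain", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_authors(out)


def blame_many(paths, workers=4):
    """Fan out over processes and merge the per-file counters."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(blame_file, paths), Counter())


# Hand-written sample in the line-porcelain shape (abbreviated headers),
# so the reduction step can be tried without a repository.
sample = (
    "abc123 1 1 2\nauthor Alice\nauthor-mail <alice@example.com>\n\tline one\n"
    "abc123 2 2\nauthor Alice\nauthor-mail <alice@example.com>\n\tline two\n"
    "def456 3 3 1\nauthor Bob\nauthor-mail <bob@example.com>\n\tline three\n"
)
sample_counts = count_authors(sample)
```

`blame_many` is defined but not called here, since it needs a git working tree; when run as a script on platforms that spawn worker processes, it should be called from under an `if __name__ == "__main__":` guard.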