This repository has been archived by the owner on May 16, 2020. It is now read-only.
From a937b37e766479c8e780b17cce9c4b252fd97e40 Mon Sep 17 00:00:00 2001
From: Jeff King <[email protected]>
Date: Fri, 13 Oct 2017 11:27:45 -0400
Subject: [PATCH] revision: quit pruning diff more quickly when possible

When the revision traversal machinery is given a pathspec,
we must compute the parent-diff for each commit to determine
which ones are TREESAME. We set the QUICK diff flag to avoid
looking at more entries than we need; we really just care
whether there are any changes at all.

But there is one case where we want to know a bit more: if
--remove-empty is set, we care about finding cases where the
change consists only of added entries (in which case we may
prune the parent in try_to_simplify_commit()). To cover that
case, our file_add_remove() callback does not quit the diff
upon seeing an added entry; it keeps looking for other types
of entries.

But this means when --remove-empty is not set (and it is not
by default), we compute more of the diff than is necessary.
You can see this in a pathological case where a commit adds
a very large number of entries, and we limit based on a
broad pathspec. E.g.:

	perl -e '
		chomp(my $blob = `git hash-object -w --stdin </dev/null`);
		for my $a (1..1000) {
			for my $b (1..1000) {
				print "100644 $blob\t$a/$b\n";
			}
		}
	' | git update-index --index-info
	git commit -qm add

	git rev-list HEAD -- .

This case takes about 100ms now, but after this patch only
needs 6ms. That's not a huge improvement, but it's easy to
get and it protects us against even more pathological cases
(e.g., going from 1 million to 10 million files would take
ten times as long with the current code, but not increase at
all after this patch).

This is reported to slightly speed up pathspec limiting in
real-world repositories (like the 100-million-file Windows
repository), but probably won't make a noticeable difference
outside of pathological setups.

This patch actually covers two cases: the one without
--remove-empty, and the one where we see only deletions.
See the in-code comment for details.

Note that we have to add a new member to the diff_options
struct so that our callback can see the value of
revs->remove_empty_trees. This callback parameter could be
passed to the "add_remove" and "change" callbacks, but
there's not much point. They already receive the
diff_options struct, and doing it this way avoids having to
update the function signature of the other callbacks
(arguably the format_callback and output_prefix functions
could benefit from the same simplification).

Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
---
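
Not part of the commit: a minimal standalone sketch of the stop
condition that the in-code comment describes, for readers who want to
poke at it outside of git. Everything prefixed with "sketch_" or
"SKETCH_" is hypothetical and exists only for illustration. The flag
values are assumed to mirror the REV_TREE_* bit flags, and the
has_changes field stands in for the HAS_CHANGES diff flag that lets a
QUICK diff stop early.

```c
#include <stdbool.h>

/* Stand-ins for REV_TREE_SAME/NEW/OLD/DIFFERENT (assumed bit values). */
enum {
	SKETCH_SAME = 0,
	SKETCH_NEW = 1,
	SKETCH_OLD = 2,
	SKETCH_DIFFERENT = 3
};

/* Hypothetical bundle of the state the real callback reaches via pointers. */
struct sketch_state {
	bool remove_empty_trees; /* stands in for revs->remove_empty_trees */
	int tree_difference;     /* accumulated SKETCH_* bits */
	bool has_changes;        /* stands in for the HAS_CHANGES diff flag */
};

/*
 * Same shape as the patched file_add_remove(): record the entry type,
 * then mark the diff as finished unless remove_empty_trees is set and
 * everything seen so far is an addition.
 */
static void sketch_add_remove(struct sketch_state *st, char addremove)
{
	int diff = addremove == '+' ? SKETCH_NEW : SKETCH_OLD;

	st->tree_difference |= diff;
	if (!st->remove_empty_trees || st->tree_difference != SKETCH_NEW)
		st->has_changes = true;
}

/*
 * Feed a string of '+' and '-' entries to the callback until it asks
 * to stop, and return how many entries had to be examined.
 */
static int sketch_entries_examined(bool remove_empty_trees, const char *entries)
{
	struct sketch_state st = { remove_empty_trees, SKETCH_SAME, false };
	int n = 0;

	while (entries[n] && !st.has_changes)
		sketch_add_remove(&st, entries[n++]);
	return n;
}
```

Under these assumptions, a run of additions is cut short after the
first entry when remove_empty_trees is off, and is walked in full only
when it is on; a single deletion stops the walk in either mode.
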
 diff.h     |  1 +
 revision.c | 16 +++++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/diff.h b/diff.h
index e9ccb38c26..fe5c287a70 100644
--- a/diff.h
+++ b/diff.h
@@ -180,6 +180,7 @@ struct diff_options {
 	pathchange_fn_t pathchange;
 	change_fn_t change;
 	add_remove_fn_t add_remove;
+	void *change_fn_data;
 	diff_format_fn_t format_callback;
 	void *format_callback_data;
 	diff_prefix_fn_t output_prefix;
diff --git a/revision.c b/revision.c
index 771d079f6e..7c23ab7afe 100644
--- a/revision.c
+++ b/revision.c
@@ -394,8 +394,16 @@ static struct commit *one_relevant_parent(const struct rev_info *revs,
  * if the whole diff is removal of old data, and otherwise
  * REV_TREE_DIFFERENT (of course if the trees are the same we
  * want REV_TREE_SAME).
- * That means that once we get to REV_TREE_DIFFERENT, we do not
- * have to look any further.
+ *
+ * The only time we care about the distinction is when
+ * remove_empty_trees is in effect, in which case we care only about
+ * whether the whole change is REV_TREE_NEW, or if there's another type
+ * of change. Which means we can stop the diff early in either of these
+ * cases:
+ *
+ *   1. We're not using remove_empty_trees at all.
+ *
+ *   2. We saw anything except REV_TREE_NEW.
  */
 static int tree_difference = REV_TREE_SAME;

@@ -406,9 +414,10 @@ static void file_add_remove(struct diff_options *options,
 		    const char *fullpath, unsigned dirty_submodule)
 {
 	int diff = addremove == '+' ? REV_TREE_NEW : REV_TREE_OLD;
+	struct rev_info *revs = options->change_fn_data;

 	tree_difference |= diff;
-	if (tree_difference == REV_TREE_DIFFERENT)
+	if (!revs->remove_empty_trees || tree_difference != REV_TREE_NEW)
 		DIFF_OPT_SET(options, HAS_CHANGES);
 }

@@ -1346,6 +1355,7 @@ void init_revisions(struct rev_info *revs, const char *prefix)
 	DIFF_OPT_SET(&revs->pruning, QUICK);
 	revs->pruning.add_remove = file_add_remove;
 	revs->pruning.change = file_change;
+	revs->pruning.change_fn_data = revs;
 	revs->sort_order = REV_SORT_IN_GRAPH_ORDER;
 	revs->dense = 1;
 	revs->prefix = prefix;
--
2.15.0