Fix: Add memoization to containsRecursiveSingularField to prevent exponential traversal#2014
Fix: Add memoization to containsRecursiveSingularField to prevent exponential traversal#2014iamyashh wants to merge 1 commit intoapple:mainfrom
Conversation
…onential traversal
thomasvl
left a comment
There was a problem hiding this comment.
Sounds like you all have some really big proto files!
Thanks for the patch. I'm guessing in a previous version you added the Set you mentioned? The patch seems to reused the existing cache.
I'm not sure this completely gets us down to O(N) because one of the limits has always been it only writes into the cache for the recursive subparts while traversing; the top most calls are the only ones that store into the cache for non recursive cases, connected graph that doesn't recuse might still have some potential performance issues.
But it makes sense to land this improvement vs. hold out for a complete revisit to the algorithm.
Running this through CI now.
thomasvl
left a comment
There was a problem hiding this comment.
Right after kicking off the CI, I thought of a change here that isn't intended, and the CI failure shows that this is changing the decision about when to use storage in a way that isn't what we want.
Take this proto file example:
edition = "2023";
package tvl.testing;
message MessageA {
MessageB b = 1;
MessageC c = 2;
}
message MessageB {
MessageA a = 1;
}
message MessageC {
MessageB b = 1;
}All three messages are recursive via different paths. But they the patch on this PR, when traversing A, when it recurses the first field b, it will see the full recursion and store that into the cache and exit (it look at the second field. When B gets generated, it is already in the cache from A, so no work. Now when C generates, as it loops over the fields, it will find B, and not do the recursion to on to A and then on to the second field of A to find that recursion, so C wronging comes out as not recursive.
Problem
When generating Swift code for a
.protofile containing a large number ofmessages with complex cross-references,
protoc-gen-swifthangs indefinitelyat near-100% CPU and never produces output.
The root cause is in
containsRecursiveSingularField— a recursive functionthat checks whether a message type contains a singular field that eventually
references itself. On a graph of N mutually-referencing message types, this
function is called O(N²) or worse times with no caching, causing exponential
traversal of the message dependency graph.
Reproduction: Any
.protofile with 200+ messages that have cross-referencesbetween types will exhibit severe slowdown. Files with 800+ messages with dense
cross-references will hang indefinitely.
Fix
Add an
analysisCache: Set<String>parameter that tracks which message typeshave already been visited during the current traversal, preventing revisiting
the same node multiple times.
This reduces the complexity from exponential to O(N) for the traversal.
Impact