Thanks for the blog post and repository. They were interesting food for thought, and I appreciated the unique contributions (the circular buffer and restricting the `k` range).
I strongly suspect that these optimizations are not very effective simply because they only trigger under 'worst case' conditions. For example, `-(D - 2*max(0, D-M))` only evaluates to something other than `-D` when `D > M`, but the fundamental hypothesis of software like `diff` is that `D` is very small. When `D` is close to `M` (or `N`), the Myers algorithm is quadratic. Software like `git diff` simply avoids ever exploring that much of the edit space and switches to a different heuristic, or splits on a suboptimal but 'good enough' diagonal chain (which they confusingly call a snake). diffutils apparently has a similar trick.
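To make the point about the lower bound concrete, here is a minimal sketch that just evaluates the formula quoted above. The function name `k_lower_bound` and the value `M = 10` are made up for illustration and are not taken from the repository.

```python
def k_lower_bound(d: int, m: int) -> int:
    """Lower end of the k range at edit distance d, using the formula
    -(D - 2*max(0, D-M)) quoted above. The name is hypothetical."""
    return -(d - 2 * max(0, d - m))


M = 10  # hypothetical sequence length, chosen only for illustration

for d in (3, 10, 12):
    # Print d, the restricted bound, and the unrestricted bound -d.
    print(d, k_lower_bound(d, M), -d)
```

For `d = 3` and `d = 10` the restricted bound equals `-d`, so nothing is saved; only at `d = 12` (that is, `D > M`) does it tighten, to `-8` instead of `-12`.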
The modular arithmetic will add a non-trivial overhead and may interfere with prefetching logic, leading to more cache misses. Furthermore, this would only tend to trigger when `D > delta`. Such a condition isn't surprising, but allocating a large array that you don't always access (e.g. when `D` is small) is also very efficient on modern hardware and operating systems due to virtual memory: pages that are never touched are typically never backed by physical memory.
I would hazard a guess that this is why MaxRSS doesn't behave as you expected in your implementation: you end up pulling in more pages due to the wraparound.
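As a rough illustration of the locality concern, the sketch below compares two ways of mapping a diagonal `k` to an array slot. I am assuming the circular buffer indexes with `k % size`; that, along with the names `flat_index`, `circular_index`, and `neighbor_gap` and the sizes used, is an assumption for illustration and not the repository's actual code.

```python
def flat_index(k: int, max_d: int) -> int:
    """Offset indexing into an array of size 2*max_d + 1 (hypothetical helper)."""
    return k + max_d


def circular_index(k: int, size: int) -> int:
    """Assumed circular-buffer indexing; Python's % is non-negative for size > 0."""
    return k % size


def neighbor_gap(k: int, index) -> int:
    """Distance between the slots of diagonals k-1 and k+1, which each
    step of the algorithm reads together."""
    return abs(index(k + 1) - index(k - 1))


# Around k = 0 the offset layout keeps the two neighbours 2 slots apart,
# while the modular layout puts them at opposite ends of the buffer.
print(neighbor_gap(0, lambda k: flat_index(k, 5)))      # 2
print(neighbor_gap(0, lambda k: circular_index(k, 8)))  # 6
```

With a realistically sized buffer, that gap near the wrap point could span different pages, which would be consistent with the MaxRSS guess above, though I have not measured it.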
Thanks again for the article and repo!