Skip to content

brentonashworth/clj-diff-performance

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

clj-diff-performance

Compare the performance of various diff algorithms. The unit tests also compare the results of each algorithm with random input to ensure consistent results are being generated.

Currently, three algorithms are being compared: Miller, Myers and Fraser. Miller and Myers are written in Clojure and Fraser is written in Java. When operating on strings, all algorithms take advantage of the pre-diff optimizations mentioned in Neil Fraser’s Diff Strategies. These optimizations do not help when operating on Clojure sequences. Most of the tests below are performed on strings so that results may be compared with the Fraser algorithm (which will only work on strings). The performance for sequences is generally the same as the Miller algorithm for strings; where it differs, additional charts are shown.

Usage

user=> (use 'clj-diff.performance)
user=> (performance-tests)

Results

For each data point on the charts below, multiple tests are run with the same data and the mean of the fastest 2/3rds are plotted. If you run the tests yourself, you will also see a table which includes the standard deviation. Unless otherwise stated, changes will always replace existing values so that the new sequence is the same size as the old one.

For 100 and 1000 character strings, vary the number of mutations made to the string from 1 change up to about 90% change.

Vary the length of the strings, with 5%, 10% and 50% change.

The Miller algorithm is especially fast when there is a large difference in the lengths of the sequences. This test will delete half of the sequence from the middle and then mutate 10% of the remaining sequence.

In each chart above, sequence performance is the same as with strings. When strings are being diffed, there are several optimizations that may be performed. These optimizations make use of Java’s very fast String functions. The same optimizations do not improve performance when used on sequences, in fact they severely degrade it. Below are two examples where the string optimizations greatly improve performance. For each example one test is run with only strings and another is run comparing the performance of strings and sequences.

Move the first element of the sequence to the end.

Make a small change in the middle of the sequence.

License

Copyright © 2010 Brenton Ashworth

Distributed under the Eclipse Public License, the same as Clojure uses. See the file COPYING.

About

Measuring the performance of various diff algorithms

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published