Improve Compression for Level 6+ #6
Comments
I'll try and take a look this weekend and see if there is anything obvious. |
Thanks mate! Really appreciate it! |
That Squash benchmark is from this commit authored in mid-2015, so it is quite outdated. The first thing I would do when making the comparison is check that the deflate configuration tables being compared are 1-to-1.
Config Tables
ZlibStream
ZlibStream/src/ZlibStream/Deflate.cs Lines 80 to 98 in 196c473
Zlib-ng
https://github.com/zlib-ng/zlib-ng/blob/51566828e52803a1355837660ac2afd585ac05e4/deflate.c#L142-L168
Zlib-ng Results
I've run some results using the commit nmoinvaz/zlib-ng@a6797d8 of zlib-ng I am on. Using
With some of the new strategies zlib-ng trades compression ratio for speed.
Fast-zlib
For levels 8 & 9, I have a PR zlib-ng/zlib-ng#828 that uses the fast-zlib method. This is an optimization that you might want to consider implementing at some point, as it speeds things up a lot. It is tricky to get right, and I had to step through fast-zlib and zlib-ng one symbol at a time in order to get it to produce the same results. But if you take a look at the very bottom of that PR you can see speed/compression size ratio benchmarks for zlib-ng/madler, and that might give you an idea of how zlib-ng has traded speed for compression.
Hash table
ZlibStream uses the same
Cloudflare
Cloudflare zlib does something a bit different with hashing. It sets the
ZlibStream/src/ZlibStream/Deflate.cs Line 66 in 196c473
I would be interested to see if you could dynamically change from
Hopefully that is some helpful info. |
Thanks so much for all this information @nmoinvaz it's all very useful! |
No problem. This resource might also be a good one. It has helped me every so often. |
You may see some performance improvement if you increase the size of the output buffer to 64kb which I have demonstrated here. |
Oooh I'll give that a shot! |
You might want to make it configurable as mentioned in that PR discussion, that way if somebody is using it for web server connections they can set it to use low memory per connection. |
Also, I have had some internal zlib-ng discussions about bypassing the pending buffer altogether in certain instances. But for Edit: |
Yeah I'm trying to expose all the options to fill the gaps in the MS implementation. |
Not sure if this will help, but in my zlib I nuked a ton of unused internal methods in the code; perhaps that might help with performance some (if not, it sure will help with reading the code). I am still waiting for performance and memory-usage reduction PRs for the internal bits of the library, whenever that can get done.
Oh and btw I sorta nuked
Also, if a PR contains usage of System.Memory to use Span and such, I do not mind (be sure to up
(Note: you might want to compare the common history between my repository and this one and carefully apply the changes from mine so that you guys do not lose all your work so far.)
Also, on the 3 ~ 4 bit thing, why not add a new compression level to control that setting? Maybe something like
But here is the issue: it is const, so it cannot be changed, so then the
And in case you forgot, my zlib is still at https://github.com/Elskom/zlib.managed/
At least I tried to improve some things in it as well. |
Also, why is it not possible to have the same compression ratios but also make them fast, as said above? Why, to win on compression speed, must we also lose on output size at the higher levels? Why can't we get the speed and keep the compressed size as is, without losing any considerable size reductions? But that is just me; I have always been for small output, but I have also hated waiting just to get the maximum possible compression. Btw, would someone mind experimenting to see if this would do anything? I thought of it in my head:
new Config(32, 516, 516, 8192, SLOW), // 10
As another, test-only compression level, I would like to know what the results would be, or whether it generates valid compressed data that can be decompressed. I could have miscalculated the last number, however; I am not sure. |
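A rough sketch of why the proposed level 10 entry may not gain much. It assumes the Config arguments follow zlib's order (good_length, max_lazy, nice_length, max_chain, function), which this thread does not confirm for ZlibStream. Under that assumption the second and third values are thresholds on match length, and DEFLATE cannot encode a match longer than 258 bytes (RFC 1951), so values above 258 would be expected to behave like 258 as far as the output goes. The `Config` tuple and `effective` helper below are hypothetical names for illustration, not part of any library.

```python
from typing import NamedTuple

MAX_MATCH = 258  # longest match length DEFLATE can encode (RFC 1951)


class Config(NamedTuple):
    # Field names borrowed from zlib's config table; the mapping to
    # ZlibStream's Config constructor is an assumption.
    good_length: int
    max_lazy: int
    nice_length: int
    max_chain: int


def effective(config: Config) -> Config:
    """Clamp the two match-length thresholds to MAX_MATCH."""
    return config._replace(
        max_lazy=min(config.max_lazy, MAX_MATCH),
        nice_length=min(config.nice_length, MAX_MATCH),
    )


level_9 = Config(32, 258, 258, 4096)   # zlib's level 9 entry
proposed = Config(32, 516, 516, 8192)  # the "level 10" idea from this thread

print(effective(proposed))
```

If the assumption holds, the only field that differs from level 9 after clamping is the chain length, so any gain would have to come from searching longer hash chains, at a cost in speed.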
Welp, after running tests to try to improve the compression levels, and then adding tests, it seems that when I compare the compressed output for a file with its input, the input gets copied directly to the output and is not actually compressed: the zlib header is inserted at the beginning and the zlib trailer at the end, and that is all. Ironically, it seems to be the same for every compression level, and the tests do not fail. As such I might need someone to use the last stable version from NuGet to see whether the compressed output is the same for that file when BestCompression is used. |
Oh and @JimBobSquarePants, I recently found an issue with how compression is done in zlib.net and zlib.managed, and it may apply here too. When I tried to add a unit test in my code, it seems that no matter what compression level I use on the test file, the output after compression is exactly the same data as the input file (with the zlib header and footer wrapped around it). It could be a simple issue on my part, but I tracked it all the way back to zlib.net as well; the native C implementation does not seem to have this issue on my end. |
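One way to tell "stored" output from really compressed output is to look at the first DEFLATE block header, which follows the two-byte zlib header. The sketch below uses Python's standard `zlib` module as a reference, not any of the managed ports discussed here; `describe_zlib_output` and the sample data are made up for illustration. A test along these lines in a port would catch the case where every level silently produces stored blocks.

```python
import zlib


def describe_zlib_output(data: bytes, level: int) -> dict:
    """Compress data at the given level and report basic sanity facts."""
    compressed = zlib.compress(data, level)
    # Bytes 0-1 are the zlib header (RFC 1950). The first DEFLATE block
    # header sits in the low bits of byte 2: bit 0 is BFINAL, bits 1-2
    # are BTYPE (0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman).
    btype = (compressed[2] >> 1) & 0b11
    return {
        "level": level,
        "in_size": len(data),
        "out_size": len(compressed),
        "first_block_type": btype,
        "round_trips": zlib.decompress(compressed) == data,
    }


# Highly repetitive sample, so any working level above 0 should shrink it.
sample = b"the quick brown fox jumps over the lazy dog\n" * 500
stored = describe_zlib_output(sample, 0)
best = describe_zlib_output(sample, 9)
for report in (stored, best):
    print(report)
```

Level 0 is expected to report block type 0 and an output slightly larger than the input; a higher level reporting the same would match the symptom described above.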
@AraHaan Sorry I've not gotten back to you. Been heavily occupied with my other libraries so this is on the backburner. My implementation definitely compresses the output, unfortunately there's something wrong with the inflate stream now. I broke it at some point and have not been able to determine at what time this took place. As such I will have to start the inflate stream optimizations again from scratch. |
I understand. Perhaps the best course of action would be to rebase all of our zlib work on the latest version of the C implementation. Zlib.NET was based on an outdated C implementation, so that might be an option worth looking into. |
A lot of my codebase is based upon zlib-ng. There are significant changes from both yours and zlib.net. It’ll be a simple case of repeating my steps, just with better testing. Time is the biggest factor. |
Yep |
Compression results for the Canterbury Corpus are disappointing. We always appear to be a few percentage points off the other libraries once we hit level 6, except for kennedy.xls, which compresses far better than the alternatives. For lower compression levels we compare very well.
https://github.com/SixLabors/ZlibStream/blob/196c4730ba637a445e840ed7cfe67297e77b47af/benchmarks.md
According to the Squash Benchmark
cp.html at level 6, we only achieve 2.99.
kennedy.xls at level 6, we achieve 5.5.
I had a go at porting the deflate_slow method from there, but that dramatically reduced compression in our sparse benchmarks to levels matching compression level 3. I haven't ported across deflate_medium yet to experiment (deflate_quick is currently broken and disabled via compiler conditionals).
@nmoinvaz I'd love to have your insight as to the cause of the difference if you have time.
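For anyone reproducing figures like the ones above, here is a minimal sketch of a per-level ratio table. It assumes "ratio" means uncompressed size divided by compressed size, and it uses Python's standard `zlib` module as a stand-in reference implementation; `ratios_by_level` and the synthetic sample are illustrative only. Real comparisons would read the Canterbury Corpus files instead.

```python
import zlib


def ratios_by_level(data: bytes, levels=range(1, 10)) -> dict:
    """Return {level: uncompressed_size / compressed_size}."""
    results = {}
    for level in levels:
        compressed = zlib.compress(data, level)
        # Guard against a broken encoder: the output must round-trip.
        if zlib.decompress(compressed) != data:
            raise ValueError(f"round trip failed at level {level}")
        results[level] = len(data) / len(compressed)
    return results


# Synthetic, repetitive input standing in for a corpus file.
sample = b"".join(b"row %d, value %d\n" % (i, i * i % 97) for i in range(5000))
table = ratios_by_level(sample)
for level, ratio in table.items():
    print(f"level {level}: {ratio:.2f}")
```

Running the same function over the same file with each library keeps the comparison like-for-like, which matters when the level-to-configuration mapping differs between implementations.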