-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate whether we can avoid a 1k lookup table in init_tree without reducing performance #153
Comments
If anyone is interested in working on this, |
Yeah it might be worth looking into. Already tinkering a bit with some other tweaks to the huffman tree stuff in inflate. Should note it's now a 2K table instead of 4k as I found that using 512-size one and bit reverse on larger numbers seemed to not hurt performance. Also I haven't pushed it yet but also found that it could be avoided entirely on aarch64 and loongarch (and technically armv7 and newer 32-bit arm but didn't find an easy way to differentiate between different arm feature levels at compile time with just the standard lib) with no issue as those architectures have a bit reverse instruction and at least at least when testing on my aarch64 raspberry pi 3B it didn't seem to be any slower to use that. |
Still have to look into the libdeflate/fdeflate approach (maybe it's faster than the current one as well) though with 8e331bb it's now just 1k instead of 2k in size as it (and some variables) was using a larger data type than needed (this also seems to have made things a very tiny bit faster). Also disabled the lookup table for builds not using alloc as I'm assuming those setups probably need the extra 1k memory saving more than the tiny speed saving (and there may not be a speed saving anyhow if memory/cache is slow). |
As noted in this PR, this lookup table used to increase performance is very large - maybe there is a better way of doing this?
#152
The text was updated successfully, but these errors were encountered: