-
Notifications
You must be signed in to change notification settings - Fork 54
Closed
Description
This is a very hot function responsible for 30% of the time spent in inflate on an exr crate benchmark:
zune-image/zune-inflate/src/utils.rs
Lines 93 to 102 in 5097f4c
| for i in 0..length | |
| { | |
| // Safety: We assert the worst possible length is in place | |
| // before the loop for both src and dest. | |
| // | |
| // Reason: This is perf sensitive, we can't afford such slowdowns | |
| // and it activates optimizations i.e see | |
| let byte = dest[offset + i]; | |
| dest[dest_offset + i] = byte; | |
| } |
There are at least two ways to speed it up:
- https://crates.io/crates/rle-decode-fast implements an algorithm copying data in batches exponentially increasing in size rather than byte-by-byte. This should be faster than the current approach, although may need some tweaking (e.g. capping the exponent at some size - needs benchmarks). Disclaimer - I took part in developing that, I may be biased.
- Nothing is done to eliminate bounds checks. I don't think using an iterator will work here, but there's ample room for optimizer hints. I've successfully applied those in the
inflatecrate for this very algorithm.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels