Skip to content

Inflate: copy_rep_matches can be faster #41

@Shnatsel

Description

@Shnatsel

This is a very hot function responsible for 30% of the time spent in inflate on an exr crate benchmark:

for i in 0..length
{
// Safety: We assert the worst possible length is in place
// before the loop for both src and dest.
//
// Reason: This is perf sensitive, we can't afford such slowdowns
// and it activates optimizations i.e see
let byte = dest[offset + i];
dest[dest_offset + i] = byte;
}

There are at least two ways to speed it up:

  1. https://crates.io/crates/rle-decode-fast implements an algorithm copying data in batches exponentially increasing in size rather than byte-by-byte. This should be faster than the current approach, although may need some tweaking (e.g. capping the exponent at some size - needs benchmarks). Disclaimer - I took part in developing that, I may be biased.
  2. Nothing is done to eliminate bounds checks. I don't think using an iterator will work here, but there's ample room for optimizer hints. I've successfully applied those in the inflate crate for this very algorithm.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions