Skip to content

Would different placement of the fast-path for literals work better? #5

Open
@sh1boot

Description

@sh1boot

I've been working on understanding the fast-path literal optimisation which recently trickled through to the cloudflare fork:

if (likely(here.op == 0)) {
*out++ = (unsigned char)(here.val);
TABLE_LOAD(lcode, hold & lmask);
old = hold;
hold >>= here.bits;
bits -= here32;
if (likely(here.op == 0)) {
*out++ = (unsigned char)(here.val);
TABLE_LOAD(lcode, hold & lmask);
old = hold;
hold >>= here.bits;
bits -= here32;
}

I'm curious if you tried putting that logic at the end of the handler for the first literal in the main loop (with necessary adjustments):

if (likely(op == 0)) { /* literal */
Tracevv((stderr, here.val >= 0x20 && here.val < 0x7f ?
"inflate: literal '%c'\n" :
"inflate: literal 0x%02x\n", here.val));
*out++ = (unsigned char)(here.val);
}

Because it should mean you can eliminate these refills, right?

if (unlikely((bits & 63) < 15 + 13)) {
REFILL();
}

if (unlikely((bits & 63) < 10)) {
REFILL();
}

That is, at the start of the loop you have at least 56 bits available. If you bite off a couple of fast-path literals before getting into the conventional length/distance decode case then you might be down to something like 34 bits. That means you can no longer guarantee 15 + 5 + 15 + 13 bits avialible, which makes extra top-ups necessary.

I might be wrong about that because there are gotos in there to confuse me. But supposing I didn't mess up...

I figure if you try to scoop up extra literals after the first one, then that's up to 15 bits you've spent (because you might have used a trampoline table entry path for the first literal), leaving 41 bits guaranteed so you can keep sipping up literals at least 41/state->lenbits times. I think that's actually three more literals, isn't it?

I think this path is a branch prediction stressor, anyway, so it's a lot of cross-platform research which I'm not able to do to back up my "helpful suggestion".

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions