Daniel-Aaron-Bloom commented Apr 29, 2025

Per this Zulip discussion and rust-lang/rust#140341, and in contrast to the discussion on #107, read_volatile (and every equivalent function in the standard library) should provide strictly fewer optimization barriers than black_box.
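For comparison, here is a minimal sketch of both kinds of barrier as they might appear in a constant-time crate. The names volatile_barrier and hint_barrier are hypothetical, not subtle's actual API. The volatile version only approximates the pattern being replaced: an #[inline(never)] function that returns a volatile read of its argument.

```rust
use core::hint;
use core::ptr;

/// Hypothetical barrier built on a volatile read. `#[inline(never)]` keeps
/// the call opaque so the optimizer cannot see through it at the call site.
#[inline(never)]
fn volatile_barrier(input: u8) -> u8 {
    // SAFETY: `&input` is a valid, aligned pointer to an initialized u8.
    unsafe { ptr::read_volatile(&input) }
}

/// Hypothetical barrier built on `core::hint::black_box`, which asks the
/// compiler to treat the value as if it could be observed or changed.
#[inline(always)]
fn hint_barrier(input: u8) -> u8 {
    hint::black_box(input)
}

fn main() {
    for b in [0u8, 1u8] {
        // Both are identity functions at runtime; they differ only in what
        // the optimizer is allowed to assume about the returned value.
        assert_eq!(volatile_barrier(b), b);
        assert_eq!(hint_barrier(b), b);
    }
    println!("both barriers return their input unchanged");
}
```

Neither function changes the value it is given; the argument in this PR is about which one constrains the optimizer more.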

To test this, I've added a script that checks that the assembly generated for a simple case is approximately the expected length across the most common architectures.
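The script itself is not reproduced here. As a hypothetical illustration of the kind of check it describes, the sketch below counts the instruction lines in an assembly listing and tests whether the count falls within a tolerance of an expected value. The helpers count_instructions and roughly_expected, along with the expected count and tolerance, are all assumptions. A real script would also have to generate a listing per target, for example with rustc's --emit asm.

```rust
/// Hypothetical helper: counts instruction lines in a textual assembly
/// listing, skipping blank lines, labels (ending in ':'), directives
/// (starting with '.'), and comment lines (starting with '#' or ';').
fn count_instructions(listing: &str) -> usize {
    listing
        .lines()
        .map(str::trim)
        .filter(|line| {
            !line.is_empty()
                && !line.ends_with(':')
                && !line.starts_with('.')
                && !line.starts_with('#')
                && !line.starts_with(';')
        })
        .count()
}

/// Hypothetical helper: true if `actual` is within `tolerance`
/// instructions of `expected`.
fn roughly_expected(actual: usize, expected: usize, tolerance: usize) -> bool {
    actual.abs_diff(expected) <= tolerance
}

fn main() {
    // x86-64 listing taken from the discussion (hint::black_box variant).
    let listing = "
hint_bb_select_if_eq:
        cmp esi, edx
        sete byte ptr [rsp - 1]
        lea rax, [rsp - 1]
        movzx eax, byte ptr [rsp - 1]
        neg eax
        xor esi, edi
        and eax, esi
        xor eax, edi
        ret
";
    let n = count_instructions(listing);
    // Assumed expectation: about 9 instructions, allowing +/- 3.
    assert!(roughly_expected(n, 9, 3), "unexpected length: {n}");
    println!("{n} instructions");
}
```

A tolerance rather than an exact count keeps the check from breaking on minor codegen differences between compiler versions or architectures.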

Also, read_volatile is never going to be const, and it would be nice to constify this library (hopefully my next PR, if this one is accepted).

elichai commented Aug 24, 2025

Semi-related: #135 (also touches the cfg).

cmlsharp commented Nov 18, 2025

Const-ness and better "guarantees" (heuristic guarantees, anyway) are the main reasons to prefer core::hint::black_box, but it's worth mentioning that it also brings a minor performance benefit (though I doubt this is an actual bottleneck for anyone).

I've been writing my own version of subtle on the side, and in my benchmarks, switching from the read_volatile-based black box to the hint::black_box one gives about a 1.3-1.5x speedup, simply because the read_volatile version requires calling an #[inline(never)] function, which results in a bunch of unnecessary stack manipulation. That said, it takes ~50k comparisons before this adds up to milliseconds on my laptop.
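To make the comparison concrete, here is a hedged sketch of the kind of function behind such listings: a branch-free conditional select, built once on each barrier, plus a naive timing loop. The names volatile_barrier, select_if_eq_volatile, select_if_eq_hint, and bench are hypothetical, not taken from subtle or from the commenter's code. The 50,000 iterations simply mirror the ~50k figure above; no particular timings are claimed.

```rust
use std::hint::black_box;
use std::ptr;
use std::time::Instant;

/// Hypothetical volatile-read barrier; `#[inline(never)]` forces a real call.
#[inline(never)]
fn volatile_barrier(input: u8) -> u8 {
    // SAFETY: `&input` is a valid, aligned pointer to an initialized u8.
    unsafe { ptr::read_volatile(&input) }
}

/// Branch-free select: returns `a` if `x == y`, otherwise `b`.
fn select_if_eq_volatile(a: u32, b: u32, x: u32, y: u32) -> u32 {
    let choice = volatile_barrier((x == y) as u8) as u32;
    let mask = choice.wrapping_neg(); // 1 -> 0xFFFF_FFFF, 0 -> 0
    b ^ (mask & (a ^ b))
}

/// Same select, but with the barrier from `std::hint::black_box`.
fn select_if_eq_hint(a: u32, b: u32, x: u32, y: u32) -> u32 {
    let choice = black_box((x == y) as u8) as u32;
    let mask = choice.wrapping_neg();
    b ^ (mask & (a ^ b))
}

/// Naive wall-clock timing; not a statistically rigorous benchmark.
/// Generic over `F` so each select can be inlined into the loop.
fn bench<F: Fn(u32, u32, u32, u32) -> u32>(name: &str, f: F, iters: u32) -> u32 {
    let start = Instant::now();
    let mut acc = 0u32;
    for i in 0..iters {
        // x == y only on even iterations.
        acc = acc.wrapping_add(f(i, !i, i & 1, 0));
    }
    println!("{name}: {:?} for {iters} selects", start.elapsed());
    black_box(acc)
}

fn main() {
    let iters = 50_000;
    let v = bench("read_volatile", select_if_eq_volatile, iters);
    let h = bench("hint::black_box", select_if_eq_hint, iters);
    // Both variants must compute the same result.
    assert_eq!(v, h);
}
```

Build with --release for meaningful numbers. For real measurements, a dedicated benchmarking harness is a better choice than a single Instant-based loop.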

The difference in ASM is something like the following:

```asm
read_volatile_select_if_eq:
        push rbp
        push rbx
        push rax
        mov ebx, esi
        mov ebp, edi
        xor edi, edi
        cmp edx, esi
        sete dil
        call subtle::black_box
        movzx eax, al
        neg eax
        xor ebp, ebx
        and ebp, eax
        xor ebp, ebx
        mov eax, ebp
        add rsp, 8
        pop rbx
        pop rbp
        ret

subtle::black_box:
        mov byte ptr [rsp - 1], dil
        movzx eax, byte ptr [rsp - 1]
        ret
```

vs

```asm
hint_bb_select_if_eq:
        cmp esi, edx
        sete byte ptr [rsp - 1]
        lea rax, [rsp - 1]
        movzx eax, byte ptr [rsp - 1]
        neg eax
        xor esi, edi
        and eax, esi
        xor eax, edi
        ret
```

Edit: of course, the core::hint::black_box version still has some redundant loads/stores, but I think that's just the price you pay for the optimization-barrier approach.
