The rdrand implementation contains three calls to `rdrand()`:
1. One in the main loop, for full words of output.
2. One after the main loop, for the potential partial word of output.
3. One inside the self-test loop.
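The Rust sketch below makes the three call sites concrete. It shows roughly the pre-change structure as the compiler sees it after inlining; it is not the crate's actual code. `step` is a hypothetical stand-in for one `rdrand` attempt (`Some(word)` when the carry flag would be set, `None` otherwise), so the sketch runs on any target. `RETRY_LIMIT = 10` is read off the ten attempts in the unrolled listing. The self-test's eight iterations (chosen so that 8 × 10 × 2 = 160 instructions) and its pass criterion are assumptions.

```rust
// Assumed retry count: the unrolled listing shows ten rdrand/jb pairs.
const RETRY_LIMIT: usize = 10;
const WORD_SIZE: usize = std::mem::size_of::<u64>();

/// Pre-change shape: the whole retry loop lives in `rdrand()`, which is
/// inlined (and then unrolled) at every call site.
/// `step` is a hypothetical stand-in for a single `rdrand` instruction.
#[inline(always)]
fn rdrand<F: FnMut() -> Option<u64>>(step: &mut F) -> Option<u64> {
    for _ in 0..RETRY_LIMIT {
        if let Some(w) = step() {
            return Some(w);
        }
    }
    None
}

fn fill<F: FnMut() -> Option<u64>>(dest: &mut [u8], step: &mut F) -> Option<()> {
    let mut chunks = dest.chunks_exact_mut(WORD_SIZE);
    // Call site 1: the main loop, one full word per iteration.
    for chunk in &mut chunks {
        chunk.copy_from_slice(&rdrand(step)?.to_ne_bytes());
    }
    // Call site 2: the potential partial word after the main loop.
    let tail = chunks.into_remainder();
    if !tail.is_empty() {
        let bytes = rdrand(step)?.to_ne_bytes();
        tail.copy_from_slice(&bytes[..tail.len()]);
    }
    Some(())
}

/// Call site 3: a self-test loop. The iteration count and the
/// "output must change at least once" criterion are hypothetical.
fn self_test<F: FnMut() -> Option<u64>>(step: &mut F) -> bool {
    let mut first: Option<u64> = None;
    for _ in 0..8 {
        let w = match rdrand(step) {
            Some(w) => w,
            None => return false,
        };
        match first {
            None => first = Some(w),
            Some(f) if f != w => return true,
            Some(_) => {}
        }
    }
    false
}
```

Because `rdrand()` carries its full retry loop, each of these three call sites expands into the long unrolled sequences described next.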
In the first case, the retry loop inside `rdrand()` is inlined and fully unrolled into ten attempts:
```
loop:
...
rdrand <register>
jb loop
rdrand <register>
jb loop
rdrand <register>
jb loop
rdrand <register>
jb loop
rdrand <register>
jb loop
rdrand <register>
jb loop
rdrand <register>
jb loop
rdrand <register>
jb loop
rdrand <register>
jb loop
rdrand <register>
jb loop
```
The second case is similar, except it isn't a loop.
In the third case, the self-test loop, the same unrolling happens, and then
the self-test loop itself is also unrolled, so the result is a straight-line
sequence of 160 instructions.
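This change moves the retry loop out of `rdrand()` and into a separate
`retry()` function, so only the first attempt is inlined at each call site.
With this change, the generated code for the main loop looks like this: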
```
loop:
...
rdrand <register>
jb loop
call retry
test rax, rax
jne loop
jmp fail
```
The generated code for the tail now looks like this:
```
rdrand rdx
jae call_retry
...
```
This is much better because we're no longer jumping over uselessly
unrolled retry attempts.
The loop in `retry()` still gets unrolled, but the compiler puts it in the
cold function section, away from the hot code.
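The split can be sketched in Rust as a fast path that makes one attempt and an out-of-line slow path for the rest. This is a minimal sketch under stated assumptions, not the actual implementation: `step` is a hypothetical stand-in for one `rdrand` attempt so the code runs on any target, `RETRY_LIMIT = 10` matches the ten attempts in the original unrolled listing, and having `retry()` make the remaining nine attempts is an assumption. `#[cold]` and `#[inline(never)]` are standard Rust attributes that ask the compiler to treat calls to a function as unlikely and to keep it out of line.

```rust
// Assumed total number of attempts, matching the original unrolled listing.
const RETRY_LIMIT: usize = 10;

/// Fast path: exactly one attempt, inlined at every call site.
/// `step` is a hypothetical stand-in for a single `rdrand` instruction.
#[inline(always)]
fn rdrand<F: FnMut() -> Option<u64>>(step: &mut F) -> Option<u64> {
    match step() {
        Some(w) => Some(w),
        // Rare failure: fall back to the out-of-line retry loop.
        None => retry(step),
    }
}

/// Slow path: the remaining attempts. Marked cold and never inlined, so
/// its (possibly unrolled) loop stays out of the hot code.
#[cold]
#[inline(never)]
fn retry<F: FnMut() -> Option<u64>>(step: &mut F) -> Option<u64> {
    // The fast path already used one attempt.
    for _ in 1..RETRY_LIMIT {
        if let Some(w) = step() {
            return Some(w);
        }
    }
    None
}
```

Each call site now carries a single attempt plus a call, which matches the shape of the new loop and tail listings; the retry logic is emitted once instead of once per call site.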
Since rdrand will essentially never fail, the `jb <success>` after each
call is going to be predicted as succeeding, so the number of instructions
executed doesn't change. However, the generated code is much smaller, so
instruction cache pressure should be reduced.