rdrand: Avoid inlining unrolled retry loops.

briansmith · briansmith · commit 857e87325f2b · 2024-05-31T13:31:10.000-07:00
The rdrand implementation contains three calls to rdrand():

1. One in the main loop, for full words of output.
2. One after the main loop, for the potential partial word of output.
3. One inside the self-test loop.

In the first case, the loop is unrolled into:

```
loop:
   ....

   rdrand <register>
   jb loop
   rdrand <register>
   jb loop
   rdrand <register>
   jb loop
   rdrand <register>
   jb loop
   rdrand <register>
   jb loop
   rdrand <register>
   jb loop
   rdrand <register>
   jb loop
   rdrand <register>
   jb loop
   rdrand <register>
   jb loop
   rdrand <register>
   jb loop
```

The second case is similar, except it isn't a loop.

In the third case, the self-test loop, the same unrolling happens, but then
the self-test loop is also unrolled, so the result is a sequence of 160
instructions.

With this change, the generated code for the loop looks like this:

```
loop:
        ...

        rdrand <register>
        jb loop
        call retry
        test rax, rax
        jne loop
        jmp fail
```

The generated code for the tail now looks like this:

```
        rdrand rdx
        jae call_retry
        ...

```

This is much better because the hot path no longer jumps over the
uselessly unrolled retry loops.

The loop in `retry()` still gets unrolled, but because `retry()` is
marked `#[cold]`, the compiler will put it in the cold function section.

Since rdrand will basically never fail, the `jb <success>` in each
call is going to be predicted as succeeding, so the number of
instructions executed doesn't change. But instruction cache pressure
should be reduced.
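
For illustration, the hot/cold split described above can be sketched
without the rdrand intrinsic. This is a minimal sketch, not the code in
this commit: `generate`, the `step` callback, and the `u64` word type
are hypothetical stand-ins for `rdrand()`, `rdrand_step`, and `Word`.
Only the shape (one inline attempt, then a `#[cold]` retry function
that makes the remaining attempts) mirrors the change.

```rust
// Total number of attempts, including the first one made on the hot path.
const RETRY_LIMIT: usize = 10;

// Cold path: only reached when the first attempt failed. Marking it
// `#[cold]` hints to the compiler that calls to it are unlikely.
#[cold]
fn retry(step: &mut dyn FnMut() -> Option<u64>) -> Option<u64> {
    // Start at 1 because the caller already tried once.
    for _ in 1..RETRY_LIMIT {
        if let Some(val) = step() {
            return Some(val);
        }
    }
    None
}

// Hot path: a single attempt, falling back to `retry` on failure.
// `step` is a hypothetical stand-in for the hardware instruction; it
// returns `Some(value)` on success and `None` on a transient failure.
fn generate(step: &mut dyn FnMut() -> Option<u64>) -> Option<u64> {
    match step() {
        Some(val) => Some(val),
        None => retry(step),
    }
}
```

A caller that always succeeds invokes `step` exactly once; a caller
that always fails invokes it `RETRY_LIMIT` times in total before
`generate` returns `None`.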
diff --git a/src/rdrand.rs b/src/rdrand.rs
@@ -14,20 +14,30 @@ cfg_if! {
     }
 }
 
-// Recommendation from "Intel® Digital Random Number Generator (DRNG) Software
-// Implementation Guide" - Section 5.2.1 and "Intel® 64 and IA-32 Architectures
-// Software Developer’s Manual" - Volume 1 - Section 7.3.17.1.
-const RETRY_LIMIT: usize = 10;
-
 #[target_feature(enable = "rdrand")]
 unsafe fn rdrand() -> Option<Word> {
-    for _ in 0..RETRY_LIMIT {
-        let mut val = 0;
-        if rdrand_step(&mut val) == 1 {
-            return Some(val);
+    #[cold]
+    unsafe fn retry() -> Option<Word> {
+        // Recommendation from "Intel® Digital Random Number Generator (DRNG) Software
+        // Implementation Guide" - Section 5.2.1 and "Intel® 64 and IA-32 Architectures
+        // Software Developer’s Manual" - Volume 1 - Section 7.3.17.1.
+
+        // Start at 1 because the caller already tried once.
+        for _ in 1..10 {
+            let mut val = 0;
+            if rdrand_step(&mut val) == 1 {
+                return Some(val);
+            }
         }
+        None
+    }
+
+    let mut val = 0;
+    if rdrand_step(&mut val) == 1 {
+        Some(val)
+    } else {
+        retry()
     }
-    None
 }
 
 // "rdrand" target feature requires "+rdrand" flag, see https://github.com/rust-lang/rust/issues/49653.