-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle CPU_RNG failures in OCaml: #255
base: main
Are you sure you want to change the base?
Conversation
- Previously, a return value of "0" was treated specially, but rdrand/rdseed may return 0! - If there's a failure in rdrand/rdseed during cpu_rng_bootstrap, raise an exception - If there's a failure during the periodic reseeding (timer initiated feeding of all pools), ignore it (for now!?)
Perhaps And to reduce the likelihood of failure see #253, the retry count for RDSEED can be increased, and if RDSEED fails then we could fall back to calling RDRAND a number of times if we follow #252 (rdrand has some built in fairness between cores, whereas rdseed just outright fails often).
I don't think we should do that, at least not for RDSEED failures, which is documented to fail, and fails within a couple of seconds very easily. A failure here doesn't mean the CPU is broken, it just means the CPU is busy... |
2f1743d
to
21945e6
Compare
Thanks for your insights. So, for Now, for periodic feeding of the rng, we use the My questions are now:
|
Startup of a mirage unikernel perhaps, but not startup of the host. There could be any number of other VMs or processes running on the host CPU already, and I think depleting entropy with RDSEED depletes it globally for the whole CPU, not just your core, but I'll have to do some tests to confirm that (i.e. if I busy loop calling rdseed or rdrand on 31 cores, will my 32th core get a chance of a successful rdseed ever?)
I think that is already taken care by Fortuna. If you give Fortuna all the rdrand values you have it'll figure out what to do with them (IIRC Fortuna doesn't require you to estimate how much entropy you've got in each sample).
I'd say probably, but I'll need to reread the manual and do some experiments (also I'll need to see whether the AMD SDM has any documentation on this that has other suggestions). When I looked at this yesterday 100 wasn't enough for RDSEED. Reading between the lines it seems like it'd be possible for some cores to DoS the ability to run RDSEED on other cores ~forever, but I'll need to confirm whether that is true, and whether in that case falling back to RDRAND would help or not. But regardless of what the manual says we can't trust hardware (and future generations of hardware may behave differently or have different bugs anyway), so we must have some way to handle failure. I'd say that a failure during reseeding is not critical: if it was safe enough for the RNG to work and generate numbers before, then why wouldn't it be safe to continue as is. And if the answer is that it isn't safe to continue without reseeding then the RNG wasn't seeded well enough in the first place and shouldn't have given out any random numbers until properly seeded. I'd suggest to use an Atomic.incr to count the number of failures, and someone interested could monitor that counter, set up some alerting from the application, or crash the application if that is what they wish, but the library should leave that choice up to the application. |
Thanks again, edwin. With 276ffe2 the rdrand and rdseed calls and failures are exposed now. Remaining: rdrand when rdseed is unavailable during bootstrap; also resort to rdrand if rdseed is not returning good values? And how many RETRIES for rdseed. |
Thank you very much for your time, comments and testing! |
if rdseed fails, take 512 times rdrand. if rdrand fails 512 times (* 10 from the RETRIES in C), fail
0ded58a
to
a55afa4
Compare
With a55afa4 if only rdrand is available or rdseed fails during bootstrap, we call it 512 times. An error is logged on failure. The code, as before (in bootstrap) - resorts to using whirlwind. |
I adjusted the retries -- rdseed now 100, rdrand still at 10. From my side, this is ready for review & merge. It may be sensible to cut a release thereafter (with #254 merged as well). Any opinion @edwintorok? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems ok, the errors on the CI seems unrelated.
//cc @edwintorok addresses #251 -- though I'm not yet happy with the
cpu_rng
failure case //feed_pools
error case. I'm uncertain what to do: