context: introduce new global context API with rerandomization #806

apoelstra · 2025-06-21T00:44:25Z

As discussed in #388 and its parent issues, when std is enabled we have a fairly straightforward way to enable global contexts. We use thread-local variables and on every access we rerandomize them. When the rand crate is also available the situation is even better, because we don't need to think too hard about where to get entropy from.
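To make the std approach concrete, here is a minimal sketch of a thread-local context that is rerandomized after each access. It is not the code in this PR: `ToyContext`, `with_toy_context` and the XOR "rerandomization" are hypothetical stand-ins, and the real context wraps the libsecp256k1 context object.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for a secp256k1 context; it holds only blinding state.
struct ToyContext {
    blinding: [u8; 32],
    rerandomizations: u32,
}

impl ToyContext {
    fn rerandomize(&mut self, seed: &[u8; 32]) {
        // XOR is a placeholder for the real rerandomization, used here only
        // so that the effect of a seed is observable.
        for (b, s) in self.blinding.iter_mut().zip(seed) {
            *b ^= *s;
        }
        self.rerandomizations += 1;
    }
}

thread_local! {
    // One context per thread, so no cross-thread synchronization is needed.
    static CONTEXT: RefCell<ToyContext> = RefCell::new(ToyContext {
        blinding: [0; 32],
        rerandomizations: 0,
    });
}

/// Runs `f` on this thread's context, then rerandomizes it if a seed is given.
fn with_toy_context<T>(f: impl FnOnce(&ToyContext) -> T, seed: Option<&[u8; 32]>) -> T {
    CONTEXT.with(|cell| {
        // The shared borrow ends with this statement.
        let ret = f(&cell.borrow());
        // On reentrancy (`f` itself calling this function) the mutable borrow
        // fails and rerandomization is skipped rather than panicking.
        if let (Some(seed), Ok(mut ctx)) = (seed, cell.try_borrow_mut()) {
            ctx.rerandomize(seed);
        }
        ret
    })
}
```

Because each thread owns its own context, the only contention left in this sketch is reentrancy, which it handles by skipping the rerandomization.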

In the nostd case things are harder. We have no thread locals and basically no synchronization primitives except atomics, which can be used to implement spinlocks but nothing else. Kix has argued strongly against spinlocks, but in the following several messages we came to a solution in which we do a "soft spinlock": after a couple of iterations we just give up and don't rerandomize.

Kix suggested adding some logging and debugging facilities, which I did not include in my solution here. We can add those in a followup.

Kix also suggested setting the maximum spin count to 0, on the theory that in most cases there will never be any contention except in cases of reentrancy, and in that case spinning is pointless. I think it should be higher than zero to help in situations where there really are multiple threads. I set it to 128, which shouldn't be a noticeable (or even measurable) burden even in the case where the spinning is pointless.
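A minimal sketch of what such a "soft spinlock" could look like, using only `core` atomics. The names `SoftSpinLock`, `Guard` and `MAX_SPINS` are illustrative assumptions, not the identifiers used in this PR; only the bound of 128 comes from the description above.

```rust
use core::cell::UnsafeCell;
use core::ops::{Deref, DerefMut};
use core::sync::atomic::{AtomicBool, Ordering};

/// Spin bound taken from the PR description; everything else is illustrative.
const MAX_SPINS: usize = 128;

pub struct SoftSpinLock<T> {
    locked: AtomicBool,
    data: UnsafeCell<T>,
}

// SAFETY: `data` is only reachable through a `Guard`, and at most one guard
// exists at a time because it is created only after `locked` flips to true.
unsafe impl<T: Send> Sync for SoftSpinLock<T> {}

pub struct Guard<'a, T> {
    lock: &'a SoftSpinLock<T>,
}

impl<T> SoftSpinLock<T> {
    pub const fn new(data: T) -> Self {
        Self { locked: AtomicBool::new(false), data: UnsafeCell::new(data) }
    }

    /// Tries to take the lock, giving up after `MAX_SPINS` attempts.
    pub fn try_lock(&self) -> Option<Guard<'_, T>> {
        for _ in 0..MAX_SPINS {
            if self
                .locked
                .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
            {
                return Some(Guard { lock: self });
            }
            core::hint::spin_loop();
        }
        // Contended or reentrant: the caller falls back to not using the lock.
        None
    }
}

impl<T> Deref for Guard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: the guard exists only while the lock is held.
        unsafe { &*self.lock.data.get() }
    }
}

impl<T> DerefMut for Guard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        // SAFETY: as above, and `&mut self` makes this the only live reference.
        unsafe { &mut *self.lock.data.get() }
    }
}

impl<T> Drop for Guard<'_, T> {
    fn drop(&mut self) {
        // Release the lock so later `try_lock` calls can succeed.
        self.lock.locked.store(false, Ordering::Release);
    }
}
```

In the reentrant case the second `try_lock` burns all 128 iterations and then returns `None`, which is the "pointless but cheap" spinning discussed above.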

This mostly resolves #388. To completely resolve that issue, we need to:

1. Update the API to use this logic everywhere; on validation functions we don't need to rerandomize and on signing/keygen functions we should rerandomize using our secret key material.
2. Remove the existing "no context" API, along with the global-context and global-context-less-secure features.

Once we've done that, we will be much better-equipped to address #346. To do that, we should attempt to scrape together some entropy even on nostd without the rand crate. I believe we can do this by reading the system time and CPU jitter. We don't need to do a very good job for this to work; even a bit or two of entropy on each signature will BTFO an attacker attempting to learn timing information from multiple signatures.

apoelstra · 2025-06-21T00:45:40Z

cc @Kixunil @TheBlueMatt @dpc @JeremyRubin (if you still care about this) @tcharding

This PR is a bit nasty but I scoped it to "just the hard parts" and the rest of it should be cathartic API changes that Tobin and I should be able to power through on our own.

JeremyRubin · 2025-06-21T15:45:08Z

I don't particularly care anymore, but from memory:

part of what makes this confusing is in WASM contexts we explicitly want "perfect" determinism, so we don't want to give any external observables. probably the only way to do this is to hijack getrandom() and give an entropy transcript through it, and be sure we're not running multi-threaded anything, or we want to e.g. call to a host API to perform signature operations.

what we largely want to avoid is that an initialization somewhere from some far-flung library code, or by enabling a feature, means that all of the sudden we do a getrandom call for a context initialization when we're just doing verification work.

cheers,

jeremy

apoelstra · 2025-06-21T16:44:41Z

Force-pushed to update tons of unit tests, and to update the recovery API. The essential code is unchanged.

@JeremyRubin this new code only ever calls getrandom if the user enables the rand feature. Maybe we want to treat "compiling for wasm" the same as "rand not enabled". But I think we can deal with that in a followup PR.

apoelstra

On dbb164f successfully ran local tests

tcharding

Mad, I enjoyed reviewing that. Feels good to have to think hard for a change. Only problem I found was a few commas.

tcharding · 2025-06-25T00:19:01Z

src/context.rs

+    /// Borrows the global context and do some operation on it.
+    ///
+    /// If provided, after the operation is complete, [`rerandomize_global_context`]
+    /// is called on the context. If you have some random data available,


Suggested change

-    /// is called on the context. If you have some random data available,
+    /// is called on the context. If you have some random data available.

In 9330ccd and again in the previous commit for the std version.

Heh, this sentence is just totally broken. It should say "If some random data is provided, then after the operation is complete, [rerandomize_global_context] is called."

lol, hopefully one of the other ones only needed the full stop.

tcharding

ACK dbb164f

apoelstra

On dbb164f successfully ran local tests

tcharding · 2025-06-25T23:37:06Z

src/context.rs

+    /// If `randomize_seed` is provided, it is used to call [`rerandomize_global_context`]
+    /// the context after the operation is complete. If it is not provided, randomization
+    /// is skipped.


lolz, looks like you've done a Tobin here and put something different in the editor to what was in your brain.

Suggested change

-    /// If `randomize_seed` is provided, it is used to call [`rerandomize_global_context`]
-    /// the context after the operation is complete. If it is not provided, randomization
-    /// is skipped.
+    /// If `randomize_seed` is provided, it is used to call [`rerandomize_global_context`]
+    /// to rerandomize the context after the operation is complete. If it is not provided,
+    /// randomization is skipped.

tcharding

ACK 8870a13

apoelstra

On fc83079 successfully ran local tests

apoelstra

On 8870a13 successfully ran local tests

Kixunil · 2025-06-26T13:32:10Z

Before I review this, just a note regarding entropy: I've recently learned that std is not actually needed. The getrandom crate can get entropy even on bare-metal if you provide it with a handler. For JS, it specifically has the js feature.

So what we really want is something like:

#[cfg(all(feature = "getrandom", not(feature = "std")))]
let rng = rand::rngs::OsRng;
#[cfg(feature = "std")]
let rng = rand::thread_rng();

I'm well aware that this might be slower but as I understand it this only needs to be done once, when initializing the library.

apoelstra · 2025-06-26T14:18:57Z

@Kixunil ok, awesome! I would like to improve the entropy generation in a followup PR, and still review this one as-is so we can get the synchronization logic and the API nailed down.

Great point that if we have slow sources of entropy, it's OK to do them once to get 32 bytes or so and then we can stretch it forever.

Kixunil · 2025-06-27T11:12:18Z

src/context.rs

+    /// you should provide it, even if you can't provide 32 random bytes.
+    pub fn with_global_context<T, Ctx: Context, F: FnOnce(&Secp256k1<Ctx>) -> T>(
+        f: F,
+        rerandomize_seed: Option<&[u8; 32]>,


If only a few bits are needed maybe it should be a slice, so people don't need to zero-init 31 bytes?

We can, but we need to provide 32 bytes at some point because that's what the upstream call takes.

We can petition upstream to take a pointer/length pair, since they take our data and throw it into a hash function anyway. (Though they in turn use it as a HMAC seed which is always a fixed size, which seems like overkill but ok.)

Ah, true, I think let's just keep it as is. Funny, this is the first time I see a potential for legitimate use of freeze (IMO passing stuff to readers isn't).

Kixunil · 2025-06-27T11:25:13Z

src/context.rs

+        let ctx = match SECP256K1.try_lock() {
+            None => unsafe {
+                // If we can't get the lock, just do everything on the stack.
+                ffi::secp256k1_context_preallocated_create(buf, AllPreallocated::FLAGS)


Ah, interesting, we can't use our high-level API because the preallocated types are different. But I think it could be sound to treat owned type as preallocated? Basically the same as you passing &*boxed to a function expecting &T if you have Box<T>.

We can't use the high-level API because it's going away once we use the global context everywhere. So I just inlined it into this one place where it'd be used.

LOL, true, that was a huge brain fart. :D Though having an internal thin safe wrapper could be helpful at some places.

I wound up doing this in my refactor.

Kixunil · 2025-06-27T11:29:11Z

src/context.rs

+        //        copy of the context object. This may "undo" previous rerandomization.
+        //        In theory if an attacker is able to reliably and repeatedly trigger
+        //        this situation, they will have defeated the rerandomization. Since
+        //        this is a defense-in-depth measure, we will accept this.


Shouldn't we just remember that we have a fresh copy and rerandomization is pointless?

In the current code, even if we have a fresh copy, we might replace the global one with the fresh copy in the end, so we should rerandomize it.

Maybe we shouldn't do this? I could go either way.

I mean, remember if we failed to obtain lock and don't rewrite the global context if we did.

Ok, switched to this logic (and updated the comment accordingly).
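The logic agreed on in this thread can be sketched roughly as follows. This is an illustration only: `std::sync::Mutex::try_lock` stands in for the PR's spinlock, and `ToyContext`, `GLOBAL` and `with_context` are hypothetical names.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the context; `blinding` models its random state.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ToyContext {
    blinding: u64,
}

impl ToyContext {
    const FRESH: ToyContext = ToyContext { blinding: 0 };

    fn rerandomize(&mut self, seed: u64) {
        // Placeholder mixing step, not the real rerandomization.
        self.blinding ^= seed;
    }
}

// Assumption: a std Mutex stands in for the no-std spinlock from the PR.
static GLOBAL: Mutex<ToyContext> = Mutex::new(ToyContext::FRESH);

fn with_context<T>(f: impl FnOnce(&ToyContext) -> T, seed: Option<u64>) -> T {
    // Copy the global context onto the stack and release the lock right away.
    // If the lock is unavailable, use a fresh context and remember that.
    let (mut local, from_global) = match GLOBAL.try_lock() {
        Ok(guard) => (*guard, true),
        Err(_) => (ToyContext::FRESH, false),
    };

    let ret = f(&local);

    // Only a copy that came from the global context is rerandomized and
    // written back, so a fresh copy can never undo earlier rerandomization.
    if let (true, Some(seed)) = (from_global, seed) {
        local.rerandomize(seed);
        if let Ok(mut guard) = GLOBAL.try_lock() {
            *guard = local;
        }
        // If re-locking fails, the rerandomized copy is simply dropped.
    }
    ret
}
```

The `from_global` flag is the "remember if we failed to obtain the lock" part: a caller that fell back to a fresh context skips both the rerandomization and the write-back.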

Kixunil · 2025-06-27T11:33:27Z

src/key.rs

@@ -20,7 +20,8 @@ use crate::ThirtyTwoByteHash;
 #[cfg(feature = "global-context")]


Did you want to remove this?

Kixunil

This looks really promising. Regarding the new synchronization primitive, maybe we can write it as generic in a separate rs file (use type to import it) and then we should be able to run MIRI on it in a separate test (referencing the module by path) using an integer as the data.

Will make it easier to introduce submodules.

apoelstra · 2025-06-27T19:03:14Z

@Kixunil I hit "resolve" on the bulk of your comments (everything except those that have open questions that aren't related to this code). I refactored everything and basically rewrote the entire PR. The logic is the same but it will probably require a total re-review.

apoelstra · 2025-06-27T19:04:21Z

Oh, and I added miri tests on the spinlock you can run with cargo +nightly miri test spinlock.

I guess I should add those to CI. I'd like to do that in a followup or separate PR because I need to go through the whole library's unit tests and explicitly whitelist/blacklist the ones that work with Miri.

This introduces the new global context API when std is enabled, using thread locals to allow rerandomizing the context after sensitive operations. As you can see, even the simple case involves some unsafe code and is a bit tricky to implement.

Introduces a spinlocking mutex that only offers access to its internals via a "try_unlock" method which spins a small finite number of times before unlocking. We use a spinlock because, in the minimal dependency set we support, there are no synchronization primitives except atomics, so that's the only form of mutex we can create.

However, there are a number of problems with spinlocks -- see this article (from Kix in rust-bitcoin#346) for some of them: https://matklad.github.io/2020/01/02/spinlocks-considered-harmful.html

To avoid these problems, we give up after a few spins. The way we will use this in the context object is:

1. When initializing the global context, if we can't get the lock, we just initialize a new stack-local context and use that. (A parallel thread must be initializing the context, which is wasteful but harmless.)
2. Once we unlock the context, we copy it onto the stack and re-lock it in order to minimize the time holding the lock. (The exception is during initialization where we hold the lock for the whole initialization, in the hopes that other threads will block on us instead of doing their own initialization.) If we rerandomize, we do this on the stack-local copy and then only re-lock to copy it back.
3. If we fail to get the lock to copy the rerandomized context back, we just don't copy it. The result is that we wasted some time rerandomizing without any benefit, which is not the end of the world.

The spinlock was implemented with help from ChatGPT o3 and the unit tests with help from Claude 4 (though in both cases I did significant refactoring and review by hand).

apoelstra

On d5a61b8 successfully ran local tests

See the previous commit description for a high-level overview of the spinlocking logic used in this commit. Next steps are:

1. Update the API to use this logic everywhere; on validation functions we don't need to rerandomize and on signing/keygen functions we should rerandomize using our secret key material.
2. Remove the existing "no context" API, along with the global-context and global-context-less-secure features.
3. Improve our entropy story on nostd by scraping system time or CPU jitter or something and hashing that into our rerandomization. We don't need to do a great job here -- if we can get even a bit or two per signature, that will completely BTFO a timing attacker.

…d FromStr Since we have a no-feature-gate global context now, we can remove the feature gates from these things. No API change (other than an expansion of the API for users without features enabled).

Something like half the tests in this crate are gated on "rand", most of which are for dumb reasons (we are generating random keys from the thread rng). By adding a non-feature=rand "random key generator" we can enable these tests even without the rand feature.

We typically also have a gate on "std", which is needed to get the thread rng, but in some cases this is the *only* reason to have a std gate. So by eliminating the rand requirement we can make tests work in nostd. We do this by implementing a parallel LCG which is obviously not cryptographic but is fine for testing.

In the LLM-generated tests in musig2.rs we have some rand feature gates for literally no reason at all :/. My bad.

In addition to dramatically increasing nostd test coverage, the new "generate random keys" function also gives us an opportunity to use the new global context API including rerandomization.
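For illustration, a test-only LCG that produces 32-byte key candidates without std or rand could look like the sketch below. The `TestLcg` type and its methods are hypothetical, the constants are Knuth's MMIX LCG parameters rather than anything taken from this PR, and this is a single plain LCG that makes no attempt to reproduce the "parallel" construction mentioned above.

```rust
/// Knuth's MMIX linear congruential generator constants (an assumption for
/// this sketch; the crate's actual test generator may use different ones).
const MULTIPLIER: u64 = 6364136223846793005;
const INCREMENT: u64 = 1442695040888963407;

/// Deterministic, non-cryptographic generator intended for tests only.
struct TestLcg {
    state: u64,
}

impl TestLcg {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advances the state: state = state * a + c (mod 2^64).
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_mul(MULTIPLIER).wrapping_add(INCREMENT);
        self.state
    }

    /// Fills 32 bytes from four consecutive outputs. A real key generator
    /// would also retry until the bytes form a valid secret key.
    fn next_32_bytes(&mut self) -> [u8; 32] {
        let mut out = [0u8; 32];
        for chunk in out.chunks_exact_mut(8) {
            chunk.copy_from_slice(&self.next_u64().to_be_bytes());
        }
        out
    }
}
```

Since the generator is fully deterministic, two tests seeded with the same value see the same keys, which is convenient for reproducing failures but is also why it must never be used outside tests.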

This updates a couple functions, and their associated unit tests (which no longer need any std/alloc/global-context feature gates). This runs clean in valgrind, providing some evidence that my new code is sound.

This API is basically unused except for some niche or legacy applications, so I feel comfortable breaking it pretty dramatically. Move all the Secp256k1 functions onto RecoverableSignature and use self/Self as appropriate. Leave the stupid ecdsa_recoverable names even though they are even more redundant, because this module is basically in maintenance mode. We only do these changes since we'll be forced to once we drop the Secp256k1 object.

Kixunil · 2025-06-27T20:49:58Z

OK, I'll try to review it tomorrow. (I'm not sure if you get notifications that I replied to some of your hidden comments; I didn't but found them by chance.)

apoelstra · 2025-06-27T20:51:51Z

Thanks! I did not. Checked them now.

In future I won't try hiding conversations like this, even when I do a total PR rewrite.

apoelstra

On 45c264e successfully ran local tests

apoelstra force-pushed the 2025-06_context branch 2 times, most recently from 879205e to 0026677 Compare June 21, 2025 14:25

apoelstra force-pushed the 2025-06_context branch from 0026677 to 7e26f33 Compare June 21, 2025 16:39

apoelstra force-pushed the 2025-06_context branch 3 times, most recently from a66b75a to dbb164f Compare June 21, 2025 22:19

apoelstra commented Jun 22, 2025

This was referenced Jun 22, 2025

update deprecation versions and bump minor version #809

Merged

musig nonce gen function should take a &[u8] not a Message #810

Closed

Context tracking meta-issue (2025) #813

Open

tcharding reviewed Jun 25, 2025

tcharding previously approved these changes Jun 25, 2025

tcharding mentioned this pull request Jun 25, 2025

Implement new global context #605

Closed

apoelstra dismissed tcharding’s stale review via d2cbb91 June 25, 2025 03:05

apoelstra force-pushed the 2025-06_context branch from dbb164f to d2cbb91 Compare June 25, 2025 03:05

apoelstra commented Jun 25, 2025

apoelstra force-pushed the 2025-06_context branch from d2cbb91 to fc83079 Compare June 25, 2025 15:02

This was referenced Jun 25, 2025

Entropy handling #814

Open

Followups to #716 (add musig2 API) #794

Merged

tcharding reviewed Jun 25, 2025

apoelstra force-pushed the 2025-06_context branch from fc83079 to 8870a13 Compare June 26, 2025 00:58

tcharding previously approved these changes Jun 26, 2025

apoelstra commented Jun 26, 2025

apoelstra dismissed tcharding’s stale review via d5a61b8 June 26, 2025 14:47

Kixunil reviewed Jun 27, 2025

context: rename src/context.rs to src/context/mod.rs

6f24308

Will make it easier to introduce submodules.

apoelstra dismissed tcharding’s stale review via 672de3a June 27, 2025 18:53

apoelstra force-pushed the 2025-06_context branch 4 times, most recently from 63cb7d6 to 59bd1b8 Compare June 27, 2025 19:01

apoelstra force-pushed the 2025-06_context branch from 59bd1b8 to 4ce914b Compare June 27, 2025 19:05

apoelstra added 2 commits June 27, 2025 19:09

apoelstra force-pushed the 2025-06_context branch from 4ce914b to cdbcf6c Compare June 27, 2025 19:09

apoelstra commented Jun 27, 2025

apoelstra added 5 commits June 27, 2025 20:49

key: remove std/alloc/global-context gates from serde::deserialize an…

19946c6

key: update a couple arbitrary API functions to no longer take a context

c5adba0

apoelstra force-pushed the 2025-06_context branch from cdbcf6c to 45c264e Compare June 27, 2025 20:49

apoelstra commented Jun 27, 2025
