[Feat] Ensure sequential processing of potential Ledger writes #2975
Conversation
```diff
  // Run each test and compare it against its corresponding expectation.
- tests.par_iter().for_each(|test| {
+ tests.iter().for_each(|test| {
```
note: this doesn't seem to slow down the related CI job at all
```rust
) -> thread::JoinHandle<()> {
    // Spawn a dedicated thread.
    let vm = self.clone();
    thread::spawn(move || {
```
Do you know if this code is run in WASM? I am unsure if you can spawn threads there.
I know of the `wasm_thread` crate, though I haven't used it yet; an alternative would be to use a blocking task, but then the plumbing would need to be moved to snarkOS. Good point, I'll think about it.
Thanks for thinking of us, Kai! We don't use the `VM` object in wasm :). Most wasm usage is strictly for execution or for using individual pieces of tooling in snarkVM, so I wouldn't worry about the wasm target at all if you're dealing with changes that ONLY affect the `VM` object.
Indeed, this is only related to the VM methods.
```rust
/// A safeguard used to ensure that the given operation is processed in the thread
/// enforcing sequential processing of operations.
pub fn ensure_sequential_processing(&self) {
    assert_eq!(thread::current().id(), self.sequential_ops_thread.lock().as_ref().unwrap().thread().id());
}
```
My understanding was that one of the goals of this PR is to avoid assertions like this.
Is there no straightforward way to ensure certain functions are only invoked from within the worker thread by leveraging the type system?
Indeed, such safeguards are suboptimal, even if more foolproof than what we had before. As I noted in the PR description, I see no simple way of introducing type safety here right now; it would most likely be a heavy lift.
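One possible direction, as a rough sketch of the reviewer's idea rather than code from this PR (`SequentialToken`, `Operation`, `worker_loop`, and `handle` are all hypothetical names): a zero-sized token that can only be minted inside the worker thread and is `!Send`, so any sequential-only method can require it at compile time.

```rust
use std::marker::PhantomData;
use std::sync::mpsc::Receiver;

/// Hypothetical operations handled by the worker thread.
enum Operation {
    AddNextBlock,
    AtomicSpeculate,
}

/// A zero-sized proof that the holder is running on the dedicated worker
/// thread. The `PhantomData<*const ()>` makes the type `!Send` and `!Sync`,
/// so the token cannot escape the thread it was created on.
pub struct SequentialToken(PhantomData<*const ()>);

impl SequentialToken {
    /// Kept private to the worker module, so only the worker loop can mint it.
    fn new() -> Self {
        SequentialToken(PhantomData)
    }
}

/// The worker loop mints the token once and passes it to every handler,
/// turning the runtime thread-id assertion into a compile-time requirement.
fn worker_loop(rx: Receiver<Operation>) {
    let token = SequentialToken::new();
    for op in rx {
        handle(op, &token);
    }
}

/// A sequential-only method now demands the token instead of asserting.
fn handle(_op: Operation, _token: &SequentialToken) {
    // ... perform the write batch ...
}
```

This wouldn't prevent someone from spawning a second worker loop, but it would rule out calling a sequential-only method from an arbitrary thread.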
```rust
let _ = tx.send(request);

// Wait for the result of the queued operation.
let Ok(response) = response_rx.blocking_recv() else {
```
This panics when called from an async context. We need to document that somewhere.
Yes, this is expected; in production we always run these operations within blocking tasks. I'm not sure where to document it, though, other than perhaps the general storage documentation. That being said, I added a note at the callsite for now.
I think we would need to document the operations that use it. For example, I had `check_next_block` panic. That call should have been in a blocking task anyway, but it's still a fairly easy mistake to make.
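For illustration, a minimal sketch of the safe calling pattern on the snarkOS side, assuming a tokio runtime; `Vm`, `Block`, and `check_next_block` here are placeholder stand-ins, not the real snarkVM API:

```rust
use anyhow::Result;

// Placeholder stand-ins for the real snarkVM types (assumptions, not the
// actual API): a cloneable `Vm` handle and a `Block` to validate.
#[derive(Clone)]
struct Vm;
struct Block;

impl Vm {
    // Internally sends the request to the worker thread and calls
    // `blocking_recv` on the response channel, so it must never be
    // invoked directly from an async executor thread.
    fn check_next_block(&self, _block: &Block) -> Result<()> {
        Ok(())
    }
}

// The async caller wraps the blocking call in `spawn_blocking`, which runs
// the closure on a dedicated blocking thread where `blocking_recv` is safe.
async fn verify_block(vm: Vm, block: Block) -> Result<()> {
    tokio::task::spawn_blocking(move || vm.check_next_block(&block)).await?
}
```

Anything that internally does a `blocking_recv` has to be funneled through `spawn_blocking` (or an equivalent dedicated thread), since tokio panics on blocking receives from within an async runtime.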
This is a proposal tackling #2954. It introduces a background thread responsible for sequentially processing the storage-related operations that cannot happen concurrently. This lets us remove some of the related locks, opens the door to more performance optimizations around block syncing, and in general allows us to reason about the storage with more confidence.

During prototyping, the following open questions were resolved:

- I found only one more operation not listed in the issue, and it is only triggered when a dev-mode node creates a genesis block. That being said, it follows one of the previously detected call paths, so it required no special treatment.
- In the end I decided to isolate 2 `SequentialOperation`s: `AddNextBlock` and `AtomicSpeculate`. There is also `atomic_finalize`, which may not be triggered concurrently, but it is only called as part of `AddNextBlock`, so I decided to only introduce a new safeguard (`ensure_sequential_processing`) in it.
- It's a new `enum` (plus some helper objects), and in the end it doesn't seem too bad size-wise; as for the channel, it actually carries a shallow clone of the `VM` instead, as that's the actual entry point into the storage.
- I decided to spawn the dedicated thread when creating the `VM`; the thread is owned by the `VM`, together with the sender to the channel used for transferring the operations. A sketch of this setup follows below.

Future/follow-up considerations:

- … `VM`, and it could be the one to perform the `ensure_sequential_processing` check, probably in `start_atomic`; it would future-proof this setup, as we would immediately know if any new write batch was introduced but not processed sequentially.

Filing this as a draft, as I haven't run all the tests locally yet, and it would require a stress-test run on the snarkOS side.
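To make the moving parts concrete, here is a rough, self-contained sketch of the design described above; the enum payloads, the error type, and the method names (`Vm::new`, `add_next_block`) are simplified placeholders rather than the PR's actual code.

```rust
use std::sync::mpsc;
use std::thread;

use tokio::sync::oneshot;

struct Block;
struct Transaction;

// Placeholder mirror of the PR's `SequentialOperation` enum: each request
// carries a oneshot sender over which the worker reports its result.
enum SequentialOperation {
    AddNextBlock { block: Block, response: oneshot::Sender<Result<(), String>> },
    AtomicSpeculate { txs: Vec<Transaction>, response: oneshot::Sender<Result<(), String>> },
}

#[derive(Clone)]
struct Vm {
    // The VM owns the sender; cloning the VM shallow-clones the sender,
    // so every handle feeds the same single worker thread.
    ops_tx: mpsc::Sender<SequentialOperation>,
}

impl Vm {
    fn new() -> Self {
        let (ops_tx, ops_rx) = mpsc::channel();
        // The dedicated thread is spawned when the VM is created and
        // processes queued operations one at a time, in arrival order.
        thread::spawn(move || {
            for op in ops_rx {
                match op {
                    SequentialOperation::AddNextBlock { response, .. } => {
                        // ... perform the (formerly lock-guarded) write batch ...
                        let _ = response.send(Ok(()));
                    }
                    SequentialOperation::AtomicSpeculate { response, .. } => {
                        // ... run the speculation sequentially ...
                        let _ = response.send(Ok(()));
                    }
                }
            }
        });
        Self { ops_tx }
    }

    // Queue the operation and block until the worker reports the result;
    // must be called from a blocking context (see the discussion above).
    fn add_next_block(&self, block: Block) -> Result<(), String> {
        let (response, response_rx) = oneshot::channel();
        let _ = self.ops_tx.send(SequentialOperation::AddNextBlock { block, response });
        response_rx.blocking_recv().map_err(|_| "the worker thread is gone".to_string())?
    }
}
```

Usage would then look like `vm.add_next_block(block)` from inside a blocking task, as in the `spawn_blocking` example earlier in the thread.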