feat: Create initial mempool interface #32

AndrewWestberg · 2024-12-05T15:50:47Z

This incomplete implementation of the mempool interface is opened to provide feedback and criticism for further refinement.

abailly

I would like the code to better expose how State will play with a Mempool's txs.

amaru/src/mempool/mod.rs

abailly · 2024-12-06T07:42:37Z

amaru/src/mempool/mod.rs

+
+    /// The ephemeral state, which is applied on top of the ledger_state using transactions
+    /// in the mempool.
+    ephemeral: ledger::state::VolatileDB,


I am not sure why we represent the ephemeral state as a queue: transactions can be removed from the mempool arbitrarily, so being able to sequentially "reset" the set to an earlier point does not seem useful. Or perhaps I misunderstand how the VolatileDB is supposed to work?

The ephemeral state is where we apply the mempool txns on top of the existing ledger state. This is the same object that is used internally in the ledger state to keep part of it volatile and subject to rollbacks. I feel the same structure could be used in a mempool. So a LedgerState has both stable and volatile transactions, while a MempoolState has stable, volatile, and ephemeral transactions.

So, if a rollback occurs on-chain in the volatile section, it should trigger a rebuild of the ephemeral transactions.

It could also be that a transaction is removed from the mempool by a user, or something is re-ordered. This should also trigger a rebuild of the ephemeral transactions.

When we go to make a block, the ephemeral state gives us a nice ordered list of transactions that we can take a slice of and submit as a block.
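A minimal sketch of the stable/volatile/ephemeral layering described above, assuming a toy UTxO keyed by hypothetical u32 output ids with u64 values; `Layer` and `lookup` are illustrative names, not part of this PR. A lookup walks from the topmost (ephemeral) layer down to the stable state, so a spend in any higher layer hides the output.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical set of changes stacked on top of the stable UTxO
/// (used here for both the volatile and the ephemeral layer).
#[derive(Default)]
pub struct Layer {
    /// Outputs created in this layer: output id -> value.
    pub produced: HashMap<u32, u64>,
    /// Outputs spent in this layer.
    pub consumed: HashSet<u32>,
}

/// Resolves an output by checking layers from the topmost (last in `layers`)
/// down to the stable UTxO. A spend in a higher layer hides the output.
pub fn lookup(stable: &HashMap<u32, u64>, layers: &[&Layer], id: u32) -> Option<u64> {
    for layer in layers.iter().rev() {
        if layer.consumed.contains(&id) {
            return None;
        }
        if let Some(value) = layer.produced.get(&id) {
            return Some(*value);
        }
    }
    stable.get(&id).copied()
}
```

Under this model, rebuilding after a rollback or a removal means discarding the ephemeral `Layer` and re-applying the remaining mempool transactions on top of the (possibly new) volatile layer.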

My understanding of the purpose of this structure, from my discussions with @KtorZ, is that it's sequential and meant for fast and easy handling of rollbacks: should you have a rollback to some previous transaction, you retrieve the cached state from the queue and then reapply from that point on. This is useful in the context of the ledger/consensus because there might be a large number of blocks and transactions involved.

My understanding is that the behaviour of the mempool is different: given a list of transactions in the mempool, say [tx_1, tx_2, ..., tx_10], it's perfectly possible that you would need to remove, say, tx_5 and tx_8, which would mean rolling back to tx_4 and then reapplying the remaining txs. Given the expected size of the mempool, and the fact that you don't need to redo phase-2 validations but only phase 1, which is very fast, I would argue that this VolatileDB is a premature optimisation.
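To make the trade-off concrete, here is a minimal sketch of a VolatileDB-like queue of cached states, assuming each entry pairs a transaction id with the state obtained after applying it; `Checkpoints`, `push`, `tip` and `rollback_to` are hypothetical names, not the actual VolatileDB API. Removing tx_5 and tx_8 means rolling back to the state cached after tx_4 and re-applying tx_6, tx_7, tx_9 and tx_10, so the cache only saves re-applying tx_1 to tx_4.

```rust
use std::collections::VecDeque;

/// Hypothetical VolatileDB-like queue: each entry is (tx id, state after applying that tx).
pub struct Checkpoints<S> {
    base: S,
    entries: VecDeque<(u32, S)>,
}

impl<S> Checkpoints<S> {
    pub fn new(base: S) -> Self {
        Self { base, entries: VecDeque::new() }
    }

    /// Records the state reached after applying `tx_id`.
    pub fn push(&mut self, tx_id: u32, state_after: S) {
        self.entries.push_back((tx_id, state_after));
    }

    /// The most recent state, or the base state if nothing was applied.
    pub fn tip(&self) -> &S {
        self.entries.back().map(|(_, s)| s).unwrap_or(&self.base)
    }

    /// Drops every entry after `tx_id` and returns their ids, in order, so the
    /// caller can re-apply the ones it wants to keep. Returns `None` if `tx_id`
    /// is not in the queue (rolling back to the base state is omitted here).
    pub fn rollback_to(&mut self, tx_id: u32) -> Option<Vec<u32>> {
        let pos = self.entries.iter().position(|(id, _)| *id == tx_id)?;
        let dropped: Vec<u32> = self.entries.drain(pos + 1..).map(|(id, _)| id).collect();
        Some(dropped)
    }
}
```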

Indeed, I am not sure the volatile state is best suited for storing the transient mempool, mostly because it's a VecDeque: it supports fast insertion and removal at both ends, but removing an arbitrary element from the middle takes time linear in the length of the queue.

For a mempool, we might need something with fast access to arbitrary transactions and possibly with extra metadata such as their size and execution units.
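A minimal sketch of such a structure, assuming transactions are identified by string ids and carry hypothetical size and ex-units metadata; `IndexedPool` and `TxMeta` are illustrative names. Arrival order lives in a BTreeMap keyed by a sequence number, and a HashMap maps each id to its sequence number, so removing an arbitrary transaction costs O(log n) rather than shifting elements of a VecDeque.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical per-transaction metadata useful when forging.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TxMeta {
    pub size: u32,
    pub ex_units: u64,
}

pub struct IndexedPool {
    next_seq: u64,
    /// Arrival order: sequence number -> (tx id, metadata).
    by_seq: BTreeMap<u64, (String, TxMeta)>,
    /// Random access: tx id -> sequence number.
    seq_of: HashMap<String, u64>,
}

impl IndexedPool {
    pub fn new() -> Self {
        Self { next_seq: 0, by_seq: BTreeMap::new(), seq_of: HashMap::new() }
    }

    /// Appends a transaction; returns false if the id is already present.
    pub fn insert(&mut self, id: &str, meta: TxMeta) -> bool {
        if self.seq_of.contains_key(id) {
            return false;
        }
        self.by_seq.insert(self.next_seq, (id.to_string(), meta));
        self.seq_of.insert(id.to_string(), self.next_seq);
        self.next_seq += 1;
        true
    }

    pub fn get(&self, id: &str) -> Option<TxMeta> {
        self.seq_of.get(id).map(|seq| self.by_seq[seq].1)
    }

    /// Removes an arbitrary transaction, preserving the order of the others.
    pub fn remove(&mut self, id: &str) -> Option<TxMeta> {
        let seq = self.seq_of.remove(id)?;
        self.by_seq.remove(&seq).map(|(_, meta)| meta)
    }

    pub fn ids_in_order(&self) -> Vec<&str> {
        self.by_seq.values().map(|(id, _)| id.as_str()).collect()
    }
}
```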

To be frank, I find the weighted finger tree from the Haskell implementation not bad, with good amortised performance. Another option could be a weighted Patricia trie, which should work quite well with transaction ids as keys.
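The property a weighted tree buys here is answering "how many transactions, taken in mempool order, fit in this many bytes?" without scanning the whole pool. As a simpler stand-in for a finger tree or Patricia trie (a different data structure, swapped in purely for illustration), here is a Fenwick (binary indexed) tree over transaction sizes; `SizeIndex` and its methods are hypothetical. Unlike a finger tree it has fixed slots, so a removed transaction is modelled by setting its size to zero.

```rust
/// Fenwick tree over per-slot transaction sizes (slots are 0-based).
pub struct SizeIndex {
    sizes: Vec<u64>,
    tree: Vec<u64>, // 1-based; tree[0] is unused
}

impl SizeIndex {
    pub fn new(n: usize) -> Self {
        Self { sizes: vec![0; n], tree: vec![0; n + 1] }
    }

    /// Sets the size stored in slot `i` (use 0 for a removed transaction). O(log n).
    pub fn set(&mut self, i: usize, size: u64) {
        let old = std::mem::replace(&mut self.sizes[i], size);
        let mut k = i + 1;
        while k < self.tree.len() {
            // Every node visited includes `old`, so this never underflows.
            self.tree[k] = self.tree[k] - old + size;
            k += k & k.wrapping_neg();
        }
    }

    /// Total size of slots 0..count. O(log n).
    pub fn prefix_sum(&self, count: usize) -> u64 {
        let (mut k, mut sum) = (count, 0);
        while k > 0 {
            sum += self.tree[k];
            k -= k & k.wrapping_neg();
        }
        sum
    }

    /// Largest `count` with prefix_sum(count) <= budget, via binary lifting. O(log n).
    pub fn fitting_prefix(&self, budget: u64) -> usize {
        let n = self.tree.len() - 1;
        let (mut pos, mut remaining) = (0, budget);
        let mut step = if n == 0 { 0 } else { 1usize << (usize::BITS - 1 - n.leading_zeros()) };
        while step > 0 {
            let next = pos + step;
            if next <= n && self.tree[next] <= remaining {
                pos = next;
                remaining -= self.tree[next];
            }
            step >>= 1;
        }
        pos
    }
}
```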

Given that we don't really know yet, that this is a cache and an optimisation, and that it might depend heavily on performance measurements, I would leave it out for the moment and go for the simplest solution:

Keep two states: an "anchor" and a "tip".

Every time txs are added, update the "tip".

Every time txs are removed, revalidate all remaining txs against the "anchor" and define a new "tip".

Every time the underlying chain changes, change the "anchor", revalidate the txs, and define a new "tip".
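A minimal sketch of this anchor/tip approach, assuming a toy ledger state (a set of unspent output ids) and a phase-1-like check that every input is still unspent; `State`, `Tx`, `apply` and `AnchoredMempool` are hypothetical names, not the PR's types. Full revalidation from the anchor also evicts any transaction that depended on a removed one.

```rust
use std::collections::BTreeSet;

/// Hypothetical toy ledger state: the set of unspent output ids.
pub type State = BTreeSet<u32>;

/// Hypothetical toy transaction: spends `inputs`, creates `outputs`.
#[derive(Debug)]
pub struct Tx {
    pub id: u32,
    pub inputs: Vec<u32>,
    pub outputs: Vec<u32>,
}

/// Phase-1-like validation and application: fails if any input is not unspent.
pub fn apply(state: &State, tx: &Tx) -> Option<State> {
    let mut next = state.clone();
    for input in &tx.inputs {
        if !next.remove(input) {
            return None;
        }
    }
    next.extend(tx.outputs.iter().copied());
    Some(next)
}

pub struct AnchoredMempool {
    anchor: State, // ledger state at the current chain tip
    tip: State,    // anchor with every mempool tx applied in order
    txs: Vec<Tx>,
}

impl AnchoredMempool {
    pub fn new(anchor: State) -> Self {
        Self { tip: anchor.clone(), anchor, txs: Vec::new() }
    }

    /// Adding only needs a check against the current tip.
    pub fn add(&mut self, tx: Tx) -> bool {
        match apply(&self.tip, &tx) {
            Some(next) => {
                self.tip = next;
                self.txs.push(tx);
                true
            }
            None => false,
        }
    }

    /// Removing revalidates everything left, starting from the anchor.
    pub fn remove(&mut self, id: u32) {
        self.txs.retain(|tx| tx.id != id);
        self.revalidate();
    }

    /// A chain change moves the anchor, then revalidates.
    pub fn set_anchor(&mut self, anchor: State) {
        self.anchor = anchor;
        self.revalidate();
    }

    fn revalidate(&mut self) {
        let mut tip = self.anchor.clone();
        for tx in std::mem::take(&mut self.txs) {
            if let Some(next) = apply(&tip, &tx) {
                tip = next;
                self.txs.push(tx);
            }
        }
        self.tip = tip;
    }

    pub fn tx_ids(&self) -> Vec<u32> {
        self.txs.iter().map(|tx| tx.id).collect()
    }

    pub fn tip(&self) -> &State {
        &self.tip
    }
}
```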

abailly · 2024-12-06T07:45:15Z

amaru/src/mempool/mod.rs

+{
+    /// The queue of transactions. The ordering of transactions are defined by the Ord trait of the
+    /// type T and left to the implementer.
+    transactions: Arc<RwLock<BTreeSet<T>>>,


It's a bit odd that the comment says this is a "queue of transactions" whereas the type is a "set", but I understand the intention of using a BTreeSet here. Perhaps a type alias would help?
Also, the Arc/RwLock decorator should probably be documented somewhere, perhaps by saying the data structure is thread-safe?

Arc/RwLock does not need to be documented, as it is one of the most common patterns for thread safety in async Rust code. People typically use either Arc/Mutex or Arc/RwLock. The former gives access to exactly one reader or one writer at a time; the latter allows either a single writer or multiple concurrent readers. Since we will be maintaining several mempools at once, giving each of them simultaneous read access to the transactions makes the most sense.
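A small illustration of the Arc/RwLock property described above, using u32 as a stand-in transaction type; `read_concurrently` is a hypothetical helper. Each reader thread holds its read guard while waiting at a Barrier, so the call can only return because all readers hold the lock at the same time; with Arc<Mutex<_>> the same code would deadlock for two or more readers.

```rust
use std::collections::BTreeSet;
use std::sync::{Arc, Barrier, RwLock};
use std::thread;

/// Spawns `readers` threads that all hold a read guard simultaneously
/// (they meet at a barrier while holding it) and returns what each one saw.
pub fn read_concurrently(txs: &Arc<RwLock<BTreeSet<u32>>>, readers: usize) -> Vec<usize> {
    let barrier = Arc::new(Barrier::new(readers));
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let txs = Arc::clone(txs);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                let guard = txs.read().unwrap();
                barrier.wait(); // every reader holds its guard here at the same time
                guard.len()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```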

I'll create a type alias for the BTreeSet as that will add clarity to the code.

I wasn't clear in my comment (😂): I didn't mean documenting the implementation, but the behaviour it provides, e.g. that the transactions field is thread-safe.
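One way to document the behaviour rather than the implementation, sketched under assumptions: `TxQueue` is a hypothetical alias name, and the `Mempool` below is trimmed to a couple of methods. The doc comments state what callers can rely on (ordering, deduplication, concurrent reads, exclusive writes) instead of naming Arc/RwLock.

```rust
use std::collections::BTreeSet;
use std::sync::{Arc, RwLock};

/// Pending transactions, kept in the order given by `T`'s `Ord` implementation.
/// (Hypothetical alias; despite the "queue" name it is a set, so equal
/// transactions are stored once.)
pub type TxQueue<T> = BTreeSet<T>;

pub struct Mempool<T: Ord> {
    /// Pending transactions.
    ///
    /// Thread-safety: a `Mempool<T>` is `Send + Sync` whenever `T: Send + Sync`,
    /// so it can be shared between threads (e.g. behind an `Arc`). Any number of
    /// readers may inspect the transactions at the same time; an insertion or
    /// removal gets exclusive access and waits until current readers are done.
    transactions: Arc<RwLock<TxQueue<T>>>,
}

impl<T: Ord> Mempool<T> {
    pub fn new() -> Self {
        Self { transactions: Arc::new(RwLock::new(BTreeSet::new())) }
    }

    /// Returns false if an equal transaction was already present.
    pub fn insert(&self, tx: T) -> bool {
        self.transactions.write().unwrap().insert(tx)
    }

    pub fn len(&self) -> usize {
        self.transactions.read().unwrap().len()
    }
}

/// Compile-time check that a type is `Send + Sync`.
pub fn assert_send_sync<X: Send + Sync>() {}
```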

abailly · 2024-12-06T08:02:22Z

amaru/src/mempool/mod.rs

+    }
+}
+
+impl<T: MempoolTx<M>, M> Mempool<T, M> {


I think what would help me (and others) better review this design is to provide some kind of model embodying the expected properties and behaviour of the mempool as tests. This might strengthen the case for the data structures used or, on the contrary, suggest we use something else, and it would expose more clearly the interaction between the mempool and other parts of the system.
For example, how does removing a transaction play with the ledger state? When a tx is added or removed, what happens to the State? In general, what invariant(s) do we want the mempool to guarantee between the State and the queue of transactions it holds?
This does not require real transactions and does not need to be super involved, but I think it would help to expose potential shortcomings of the design, or validate it.
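As a starting point for such a model, here is one candidate invariant written as a checkable predicate, assuming a toy ledger (a set of unspent output ids) and hypothetical `State`, `Tx`, `apply` and `invariant_holds` names. Tests would then assert it after every add, remove, and chain change.

```rust
use std::collections::BTreeSet;

/// Hypothetical toy ledger state: the set of unspent output ids.
pub type State = BTreeSet<u32>;

/// Hypothetical toy transaction: spends `inputs`, creates `outputs`.
#[derive(Clone, Debug)]
pub struct Tx {
    pub inputs: Vec<u32>,
    pub outputs: Vec<u32>,
}

/// Phase-1-like validation and application: fails if any input is not unspent.
pub fn apply(state: &State, tx: &Tx) -> Option<State> {
    let mut next = state.clone();
    for input in &tx.inputs {
        if !next.remove(input) {
            return None;
        }
    }
    next.extend(tx.outputs.iter().copied());
    Some(next)
}

/// Candidate invariant between the mempool's base state ("anchor"), its ordered
/// transactions, and the state it exposes ("tip"): every transaction applies
/// cleanly on top of its predecessors, and the result is exactly `tip`.
pub fn invariant_holds(anchor: &State, txs: &[Tx], tip: &State) -> bool {
    let mut state = anchor.clone();
    for tx in txs {
        match apply(&state, tx) {
            Some(next) => state = next,
            None => return false,
        }
    }
    &state == tip
}
```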

I can move forward with it and start implementing, but what I wanted to avoid by making this a Draft PR was doing a ton of work and then people criticizing the design late in the game. I wanted to get early feedback on it to make sure it's on the right track.

I agree that it would make more sense if there were an example implementation where you could see what happens on block rollbacks for example. This will be the next phase for this PR, but I wanted to make sure I'm at least somewhat in the ballpark before moving forward.

"doing a ton of work and then people criticizing the design late in the game"

I can relate. I would not go for a full-blown implementation, but having a small model to play with and understand the dynamics of the proposed design would help, i.e. a couple of tests showing how the interfaces are used, how the state evolves, and what invariants we need to maintain.

abailly · 2024-12-06T14:18:24Z

amaru/src/mempool/mod.rs

+    ephemeral: ledger::state::VolatileDB,
+}
+
+/// A transaction in the mempool with optional _M_ metadata type useful for ordering transactions.


KtorZ · 2024-12-06T15:49:42Z

amaru/src/mempool/mod.rs

+    pub fn insert(&self, tx: T) {
+        self.transactions.write().unwrap().insert(tx);
+    }
+
+    pub fn len(&self) -> usize {
+        self.transactions.read().unwrap().len()
+    }
+
+    pub fn is_empty(&self) -> bool {
+        self.transactions.read().unwrap().is_empty()
+    }
+
+    pub fn iter(&self) -> RwLockReadGuard<'_, BTreeSet<T>> {
+        self.transactions.read().unwrap()
+    }
+
+    pub fn iter_mut(&self) -> RwLockWriteGuard<'_, BTreeSet<T>> {
+        self.transactions.write().unwrap()
+    }
+
+    pub fn clear(&self) {
+        self.transactions.write().unwrap().clear();
+    }
+
+    pub fn insert_all<Iter: IntoIterator<Item = T>>(&self, txs: Iter) {
+        self.transactions.write().unwrap().extend(txs);
+    }
+
+    pub fn remove(&self, tx: &T) {
+        self.transactions.write().unwrap().remove(tx);
+    }
+
+    pub fn contains(&self, tx: &T) -> bool {
+        self.transactions.read().unwrap().contains(tx)
+    }


I think this is a lovely interface for a mempool implementor. Yet, for the block-forging interface, we might need something less granular and more tailored to forging-specific operations; otherwise we might not be able to provide a fast implementation for those operations.

For example, we will likely need the ability to get a batch of transactions that satisfies specific criteria w.r.t. block size and execution units. I don't think it should be the caller's responsibility to decide how that batch of transactions is made.
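A minimal sketch of such a forging-oriented operation, assuming each mempool entry carries hypothetical size and ex-units metadata and that the pool is already in a valid order; `Entry`, `Budget` and `select_for_block` are illustrative names. The selection stops at the first transaction that does not fit rather than skipping it, because a later transaction may spend outputs of the skipped one.

```rust
/// Hypothetical mempool entry with the metadata block forging needs.
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub id: u32,
    pub size: u64,
    pub ex_units: u64,
}

/// Hypothetical per-block limits.
pub struct Budget {
    pub max_size: u64,
    pub max_ex_units: u64,
}

/// Returns the longest prefix of `entries` (in mempool order) within both limits.
pub fn select_for_block<'a>(entries: &'a [Entry], budget: &Budget) -> &'a [Entry] {
    let (mut size, mut ex_units) = (0u64, 0u64);
    for (i, entry) in entries.iter().enumerate() {
        size = size.saturating_add(entry.size);
        ex_units = ex_units.saturating_add(entry.ex_units);
        if size > budget.max_size || ex_units > budget.max_ex_units {
            return &entries[..i];
        }
    }
    entries
}
```

Keeping this behind the mempool interface leaves room for a faster implementation later (for instance, one backed by a weighted tree) without changing callers.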

Thoughts?

Actually, you also need an interface to notify the pool that its base state has changed.
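A minimal sketch of such a notification interface, assuming hypothetical `ChainEvent` and `ChainObserver` types and a toy pool that only tracks each transaction's invalid-hereafter slot (a transaction stays valid while the current slot is below it). On roll-forward it drops transactions included in the new block and those that have expired; a real pool would also revalidate the remaining transactions against the new ledger state.

```rust
use std::collections::BTreeMap;

/// Hypothetical chain events the mempool must be told about.
pub enum ChainEvent {
    /// A block at `slot` was adopted; `included` are the tx ids it contains.
    RollForward { slot: u64, included: Vec<u32> },
    /// The chain was rolled back to `slot`.
    RollBackward { slot: u64 },
}

/// Hypothetical "base state changed" notification interface.
pub trait ChainObserver {
    fn on_chain_event(&mut self, event: &ChainEvent);
}

/// Toy pool: tx id -> invalid-hereafter slot.
pub struct TtlMempool {
    pub txs: BTreeMap<u32, u64>,
    pub slot: u64,
}

impl ChainObserver for TtlMempool {
    fn on_chain_event(&mut self, event: &ChainEvent) {
        match event {
            ChainEvent::RollForward { slot, included } => {
                self.slot = *slot;
                for id in included {
                    self.txs.remove(id);
                }
            }
            // This toy drops nothing on rollback; a real pool would revalidate
            // and might re-admit transactions from orphaned blocks.
            ChainEvent::RollBackward { slot } => self.slot = *slot,
        }
        let now = self.slot;
        self.txs.retain(|_, invalid_hereafter| now < *invalid_hereafter);
    }
}
```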

A diagram would be useful

feat: Create initial mempool interface

4a57f81

AndrewWestberg requested review from abailly, scarmuega and KtorZ December 5, 2024 15:50

abailly reviewed Dec 6, 2024


AndrewWestberg added 2 commits December 6, 2024 14:00

fixup! feat: Create initial mempool interface

5e1ced4

fixup! fixup! feat: Create initial mempool interface

eb3ef78

abailly reviewed Dec 6, 2024


KtorZ reviewed Dec 6, 2024
