feat: Create initial mempool interface #32
Conversation
I would like the code to better expose how `State` will play with a `Mempool`'s txs.
    /// The ephemeral state, which is applied on top of the ledger_state using transactions
    /// in the mempool.
    ephemeral: ledger::state::VolatileDB,
I am not sure why we represent the ephemeral state as a queue: transactions can be removed arbitrarily from the mempool, and being able to "reset" the set at an earlier point sequentially does not seem useful. Or perhaps I misunderstand how the `VolatileDB` is supposed to work?
The ephemeral state is where we apply the mempool txns on top of the existing ledger state. This is the same object that is used internally in the ledger state to keep part of it volatile and subject to rollbacks. I feel like the same thing could be utilized in a mempool. So, a LedgerState has both stable and volatile transactions, while a MempoolState has stable, volatile, and ephemeral transactions.
So, let's say a rollback occurs on-chain in the volatile section: that should trigger a rebuilding of the ephemeral transactions.
It could also be that a transaction is removed from the mempool by a user, or something is re-ordered. This should also trigger a rebuild of the ephemeral transactions.
When we go to make a block, we have a nice ordered list of transactions in the ephemeral state we can take a bite of and submit as a block.
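Roughly, the layering I have in mind looks something like this. Purely an illustrative sketch: every name below (StableDB, MempoolState, rebuild_ephemeral, ...) is made up for the example and not taken from the actual code in this PR.

```rust
use std::collections::BTreeSet;

#[derive(Default)]
struct StableDB; // part of the chain beyond the rollback horizon
#[derive(Default)]
struct VolatileDB {
    applied: Vec<String>, // recent state, still subject to rollback
}

impl VolatileDB {
    fn apply(&mut self, tx: String) {
        // real code would run (phase-1) validation before applying
        self.applied.push(tx);
    }
}

struct LedgerState {
    stable: StableDB,
    volatile: VolatileDB,
}

struct MempoolState {
    ledger: LedgerState,   // stable + volatile, shared with the ledger
    txs: BTreeSet<String>, // pending transactions, kept in order
    ephemeral: VolatileDB, // result of applying `txs` on top of `ledger`
}

impl MempoolState {
    /// Recompute the ephemeral layer from scratch. Triggered by a rollback
    /// in the volatile section, or by a tx being removed or re-ordered.
    fn rebuild_ephemeral(&mut self) {
        self.ephemeral = VolatileDB::default();
        for tx in &self.txs {
            self.ephemeral.apply(tx.clone());
        }
    }
}
```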
My understanding of the purpose of this structure, from my discussions with @KtorZ, is that it's sequential and meant for fast and easy handling of rollbacks: should you have a rollback to some previous transaction, you retrieve the cached state from the queue and then reapply from that point on. This is useful in the context of the ledger/consensus because there might be a large number of blocks and transactions involved.
My understanding is that the behaviour of the mempool is different: it's perfectly possible that, given a list of transactions in the mempool, say `[tx1, tx2 ... tx_10]`, you would need to remove say `tx_5` and `tx_8`, which would mean that you would roll back to `tx_4` and then need to reapply the remaining txs. Given the expected size of the mempool, and the fact that you don't need to redo phase-2 validations but only phase 1, which is very fast, I would argue that this `VolatileState` is a premature optimisation.
Indeed; I am not sure the volatile state is best suited for storing the transient mempool, mostly because it's a VecDeque, which has fast access to both ends of the vector but not to arbitrary elements.
For a mempool, we might need something with fast access to arbitrary transactions, and possibly with extra metadata such as their size and execution units.
To be frank, I find the weighted RoseTree from the Haskell implementation not bad, with good amortised performance. Another option could be a weighted Patricia trie, which should work quite well with transaction ids as keys.
Given that we don't really know yet, that this is a cache and an optimisation, and that the choice might depend heavily on performance measurements, I would leave that out for the moment and go for the simplest solution (roughly sketched below):
- Keep 2 states, an "anchor" and a "tip"
- Every time txs are added, update the "tip"
- Every time txs are removed, revalidate all txs and define a new "tip"
- Every time the underlying chain changes, change the "anchor", revalidate txs, and define a new "tip"
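A minimal, purely illustrative sketch of that anchor/tip approach; the `State` type and its `apply` below are stand-ins for real (phase-1) validation, not the actual ledger types.

```rust
#[derive(Clone, Default)]
struct State {
    applied: Vec<String>,
}

impl State {
    fn apply(&mut self, tx: &str) -> Result<(), String> {
        // real code would validate `tx` against the current state here
        self.applied.push(tx.to_string());
        Ok(())
    }
}

#[derive(Default)]
struct SimpleMempool {
    anchor: State,    // the ledger state the mempool builds on
    tip: State,       // anchor + all pending txs applied
    txs: Vec<String>, // pending transactions, in application order
}

impl SimpleMempool {
    /// Adding a tx only needs to extend the tip.
    fn add(&mut self, tx: String) -> Result<(), String> {
        self.tip.apply(&tx)?;
        self.txs.push(tx);
        Ok(())
    }

    /// Removing a tx revalidates everything from the anchor.
    fn remove(&mut self, tx: &str) {
        self.txs.retain(|t| t != tx);
        self.rebuild_tip();
    }

    /// The underlying chain changed: swap the anchor and revalidate.
    fn on_chain_changed(&mut self, new_anchor: State) {
        self.anchor = new_anchor;
        self.rebuild_tip();
    }

    fn rebuild_tip(&mut self) {
        let mut tip = self.anchor.clone();
        // drop any tx that no longer validates against the new base
        self.txs.retain(|tx| tip.apply(tx).is_ok());
        self.tip = tip;
    }
}
```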
amaru/src/mempool/mod.rs
{
    /// The queue of transactions. The ordering of transactions is defined by the Ord trait of the
    /// type T and left to the implementer.
    transactions: Arc<RwLock<BTreeSet<T>>>,
It's a bit odd that the comment says this is a "queue of transactions" whereas the type is a "set", but I understand the intention of using a `BTreeSet` here. Perhaps a type alias would help?
Also, the `Arc/RwLock` decorator should probably be documented somewhere, perhaps by saying the data structure is thread-safe?
`Arc/RwLock` does not need to be documented, as this is one of the most common patterns seen in async Rust code for thread safety. People typically use either `Arc/Mutex` or `Arc/RwLock`. The former allows exactly one reader or one writer to access the data at a time; the latter allows a single writer but multiple concurrent readers. Since we will have different mempools we are maintaining at once, each of them having simultaneous read access to the transactions makes the most sense.
I'll create a type alias for the BTreeSet, as that will add clarity to the code.
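Something along these lines, perhaps (the alias name here is just a suggestion, not final):

```rust
use std::collections::BTreeSet;
use std::sync::{Arc, RwLock};

/// Thread-safe, ordered set of pending transactions: `Arc` lets several
/// tasks share one pool, while `RwLock` allows many concurrent readers but
/// only a single writer at a time.
type TransactionSet<T> = Arc<RwLock<BTreeSet<T>>>;
```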
I wasn't clear in my comment (😂): I meant not to document the implementation, but the behaviour the implementation provides, e.g. that `transactions` is thread-safe.
}
}

impl<T: MempoolTx<M>, M> Mempool<T, M> {
I think what would help me (and others) better review this design is to provide some kind of model embodying the expected properties and behaviour of the mempool as tests. This might strengthen the case for the data structures used, or on the contrary suggest we might use something else, and it would expose more clearly the interaction between the mempool and other parts of the system.
For example, how does removing a transaction play with the ledger state? When a tx is added/removed, what happens to the `State`? In general, what are the invariant(s) we want the mempool to guarantee between the State and the queue of transactions it holds?
This does not require real transactions and does not need to be super involved, but I think it would help to expose potential shortcomings of the design, or validate it.
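To give a flavour of what I mean, here is a toy example of such a "model" test. It does not use the real types at all: the state is just the ordered list of applied tx names, and the helper `state_after` is made up for the illustration.

```rust
#[cfg(test)]
mod mempool_model {
    /// Toy model: the derived state is the base txs followed by the
    /// pending mempool txs, applied in order.
    fn state_after(base: &[&str], mempool: &[&str]) -> Vec<String> {
        base.iter().chain(mempool.iter()).map(|s| s.to_string()).collect()
    }

    #[test]
    fn removing_a_tx_yields_the_state_without_it() {
        let base = ["blk_tx"];
        let mut mempool = vec!["tx1", "tx2", "tx3"];

        // Invariant: after removing a tx, the derived state is exactly the
        // base state with the remaining txs applied, in order.
        mempool.retain(|t| *t != "tx2");
        assert_eq!(
            state_after(&base, &mempool),
            vec!["blk_tx".to_string(), "tx1".to_string(), "tx3".to_string()]
        );
    }
}
```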
I can move forward with it and start implementing, but what I wanted to avoid by making this a Draft PR was doing a ton of work and then having people criticize the design late in the game. I wanted to get early feedback on it to make sure it's on the right track.
I agree that it would make more sense if there were an example implementation where you could see what happens on block rollbacks, for example. This will be the next phase for this PR, but I wanted to make sure I'm at least somewhat in the ballpark before moving forward.
> doing a ton of work and then people criticizing the design late in the game

I can relate. I would not go for a full-blown implementation, but having a small model to play with and understanding the dynamics of the proposed design would help, i.e. a couple of tests showing how the interfaces are used, how the state evolves, what invariants we need to maintain...
    ephemeral: ledger::state::VolatileDB,
}

/// A transaction in the mempool with optional _M_ metadata type useful for ordering transactions.
❤️
    pub fn insert(&self, tx: T) {
        self.transactions.write().unwrap().insert(tx);
    }

    pub fn len(&self) -> usize {
        self.transactions.read().unwrap().len()
    }

    pub fn is_empty(&self) -> bool {
        self.transactions.read().unwrap().is_empty()
    }

    pub fn iter(&self) -> RwLockReadGuard<'_, BTreeSet<T>> {
        self.transactions.read().unwrap()
    }

    pub fn iter_mut(&self) -> RwLockWriteGuard<'_, BTreeSet<T>> {
        self.transactions.write().unwrap()
    }

    pub fn clear(&self) {
        self.transactions.write().unwrap().clear();
    }

    pub fn insert_all<Iter: IntoIterator<Item = T>>(&self, txs: Iter) {
        self.transactions.write().unwrap().extend(txs);
    }

    pub fn remove(&self, tx: &T) {
        self.transactions.write().unwrap().remove(tx);
    }

    pub fn contains(&self, tx: &T) -> bool {
        self.transactions.read().unwrap().contains(tx)
    }
I think this is a lovely interface for a mempool implementor. Yet, for the block-forging interface, we might need something less granular and more tailored to forging-specific operations; otherwise we might not be able to provide a fast implementation for those specific operations.
For example, we will likely need the ability to get a batch of transactions that satisfies specific criteria w.r.t. block size and execution units. I don't think it should be the responsibility of the caller to decide how that batch of transactions is made.
Thoughts?
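For instance, something along these lines; everything here (the trait, the accessors, the budget fields) is hypothetical and just meant to show the shape of a forging-oriented selection:

```rust
pub trait ForgeableTx {
    fn size(&self) -> usize;   // serialized size in bytes
    fn ex_units(&self) -> u64; // total execution units
}

pub struct BlockBudget {
    pub max_body_size: usize,
    pub max_ex_units: u64,
}

/// Greedily pick transactions (already in priority order) that fit within
/// the block budget, so the caller never has to build the batch itself.
pub fn take_for_block<T: ForgeableTx + Clone>(ordered: &[T], budget: &BlockBudget) -> Vec<T> {
    let mut size_left = budget.max_body_size;
    let mut units_left = budget.max_ex_units;
    let mut selected = Vec::new();
    for tx in ordered {
        if tx.size() <= size_left && tx.ex_units() <= units_left {
            size_left -= tx.size();
            units_left -= tx.ex_units();
            selected.push(tx.clone());
        }
    }
    selected
}
```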
Actually, you also need an interface to notify the pool its base state has changed.
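One hypothetical way to express that requirement; all names below are made up for illustration, not a proposal for the actual API:

```rust
pub trait BaseStateListener {
    type Base;

    /// Called when the ledger state the mempool builds on changes (a new
    /// block was adopted, or a rollback happened). Implementations would
    /// revalidate pending transactions against `new_base` and drop the
    /// ones that no longer apply.
    fn base_state_changed(&mut self, new_base: &Self::Base);
}
```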
A diagram would be useful
This incomplete implementation of the mempool interface is opened to provide feedback and criticism for further refinement.