Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Create initial mempool interface #32

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Conversation

AndrewWestberg
Copy link
Contributor

This incomplete implementation of the mempool interface is opened to provide feedback and criticism for further refinement.

Copy link
Contributor

@abailly abailly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like the code to better expose how State will play with a Mempool's txs.

amaru/src/mempool/mod.rs Show resolved Hide resolved

/// The ephemeral state, which is applied on top of the ledger_state using transactions
/// in the mempool.
ephemeral: ledger::state::VolatileDB,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why we represent the ephemeral state as a queue: transactions can be removed arbitrarily from the mempool and being able to "reset" the set at an earlier point sequentially does not seem useful. Or perhaps I misunderstand how the VolatileDB is supposed to work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ephemeral state is where we apply the mempool txns on top of the existing ledger state. This is the same object that is used internally in the ledger state to keep part of it volatile and subject to rollbacks. I feel like the same thing could be utilized in a mempool. So, a LedgerState has both stable and volatile transactions. A MempoolState has a stable, volatile, and ephemeral transactions.

So, let's say a rollback occurs on-chain in the volatile section, that should trigger a rebuilding of the ephemeral transactions.

It could also be that a transaction is removed from the mempool by a user, or something is re-ordered. This should also trigger a rebuild of the ephemeral transactions.

When we go to make a block, we have a nice ordered list of transactions in the ephemeral state we can take a bite of and submit as a block.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of the purpose of this structure from my discussions with @KtorZ is that it's sequential and meant for fast and easy handling of rollbacks: Should you have a rollback to some previous transaction, you retrieve the cached state from the queue then reapply from that point on. This is useful in the context of the ledger/consensus because there might be a large number of blocks and transactions involved.

My understanding is that the behaviour of the mempool is different: It's perfectly possible that given a list of transactions say. [tx1, tx2 ... tx_10] in the mempool, you would need to remove say tx_5 and tx_8, which would mean that you would rollback to tx_4 and then need to reapply the remaining txs. Given the expected size of the mempool, and the fact you don't need to redo phase 2 validations but only phase 1 which is very fast, I would argue that this VolatileState is a premature optimisation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed; I am not sure the volatile state is best suited for storing the transient mempool. Mostly also because it's a VecDeque, which has fast access to both ends of the vector but not much to arbitrary elements.

For a mempool, we might need something with fast access to arbitrary transactions and possibly with extra metadata such as their size and execution units.

To be frank, I find the weighted RoseTree from the Haskell implementation not bad; with good amortised performances. Another option could be a weighted Patricia trie; which should work quite well with transaction ids as keys.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we don't really know, this is a cache and optimisation, and this might be heavily dependent on performance measures, I would leave that out for the moment and go for the simplest solution:

  • Keep 2 states an "anchor" and a "tip"
  • Every time txs are added, update the "tip"
  • Every time txs are removed, revalidate all txs and define new "tip"
  • Every time underlying chain changes, change "anchor", revalidate txs and define new "tip"

{
/// The queue of transactions. The ordering of transactions are defined by the Ord trait of the
/// type T and left to the implementer.
transactions: Arc<RwLock<BTreeSet<T>>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit odd the comment says this is a "queue of transactions" whereas the type is a "set", but I understand the intention of using a BTreeSet here. Perhaps a type alias would help?
Also, the Arc/RwLock decorator should probably be documented somewhere, perhaps by saying the data structure is threadsafe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arc/RwLock does not need to be documented as this is one of the most common patterns seen in async rust code for thread safety. People typically use either Arc/Mutex or Arc/RwLock. The former causes exactly one reader or one writer to access the data at a time. The later allows for a single writer, but multiple concurrent readers. Since we will have different mempools we are maintaining at once, each of them having simultaneous read access to the transactions makes the most sense.

I'll create a type alias for the BTreeSet as that will add clarity to the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't clear in my comment (😂 ): I meant not to document the implementation, but the behaviour the implementation provides, eg. that transactions is thread-safe.

}
}

impl<T: MempoolTx<M>, M> Mempool<T, M> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what would help me (and others) better review this design is to provide some kind of model embodying the expected properties and behaviour of the mempool as tests. This might strengthen the case for the data structures used or on the contrary suggests we might use something else, and would expose more clearly the interaction between mempool and other parts of the system.
For example, how does removing a transaction plays with the ledger state? When a tx is added/removed, what happens to the State? In general, what are the invariant(s) we want the mempool to guarantee b/w State and the queue of transactions it holds?
This does not require real transactions and does not need to be super involved, but I think it would help to expose potential shortcomings of the design, or validate it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move forward with it and start implementing, but what I wanted to avoid by making this a Draft PR was doing a ton of work and then people criticizing the design late in the game. I wanted to get early feedback on it to make sure it's on the right track.

I agree that it would make more sense if there were an example implementation where you could see what happens on block rollbacks for example. This will be the next phase for this PR, but I wanted to make sure I'm at least somewhat in the ballpark before moving forward.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doing a ton of work and then people criticizing the design late in the game

I can relate. I would not go for a full blown implementation, but having a small model to play with and understanding the dynamics of the proposed design would help, ie. a couple of tests showing how the interfaces are used, how the state evolves, what invariants we need to maintain...

ephemeral: ledger::state::VolatileDB,
}

/// A transaction in the mempool with optional _M_ metadata type useful for ordering transactions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Comment on lines +72 to +106
pub fn insert(&self, tx: T) {
self.transactions.write().unwrap().insert(tx);
}

pub fn len(&self) -> usize {
self.transactions.read().unwrap().len()
}

pub fn is_empty(&self) -> bool {
self.transactions.read().unwrap().is_empty()
}

pub fn iter(&self) -> RwLockReadGuard<'_, BTreeSet<T>> {
self.transactions.read().unwrap()
}

pub fn iter_mut(&self) -> RwLockWriteGuard<'_, BTreeSet<T>> {
self.transactions.write().unwrap()
}

pub fn clear(&self) {
self.transactions.write().unwrap().clear();
}

pub fn insert_all<Iter: IntoIterator<Item = T>>(&self, txs: Iter) {
self.transactions.write().unwrap().extend(txs);
}

pub fn remove(&self, tx: &T) {
self.transactions.write().unwrap().remove(tx);
}

pub fn contains(&self, tx: &T) -> bool {
self.transactions.read().unwrap().contains(tx)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a lovely interface for a mempool implementor. Yet, for the block-forging interface, we might need something less granular and more tailored to forging-specific operation; otherwise we might not be able to provide a fast implementation for those specific operations.

For example, we will likely need to have the ability to get a bunch of transactions that satisfy specific criteria w.r.t block size and execution units. I don't think it should be the responsibility of the caller to decide on how that batch of transactions is made.

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, you also need an interface to notify the pool its base state has changed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A diagram would be useful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants