
smirk - merkle tree crate #173


Draft · wants to merge 15 commits into main

Conversation

cameron1024 (Contributor)

This PR adds the smirk crate ("stable merk")

  • MerkleTree type for the tree itself
  • Storage trait and RocksdbStorage struct
  • Hashable trait (and maybe a derive macro in the future) for "things which can be RPO hashed" - if there's an existing abstraction with better ecosystem support, that would be ideal, but a quick google didn't reveal anything obvious
  • smirk! macro for easily creating trees (e.g. smirk!{ 1 => "hello", 2 => "world" })
  • some iterators
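For a rough picture of how these pieces fit together, here's a minimal sketch; `MerkleTree::new`, `insert`, and `root_hash` are assumed names, not necessarily the crate's actual API:

use smirk::{smirk, MerkleTree};

fn example() {
    // Literal construction via the macro (example from the list above):
    let a = smirk! { 1 => "hello", 2 => "world" };

    // Assumed equivalent incremental construction:
    let mut b = MerkleTree::new();
    b.insert(1, "hello");
    b.insert(2, "world");

    // Two trees with the same entries should agree on the root hash
    // (`root_hash` is an assumed accessor).
    assert_eq!(a.root_hash(), b.root_hash());
}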

This is a WIP, but some "known issues" that I don't have good answers to are:

  • should we even try to have a generic storage backend at all? For it to be useful, we'd really need the ability to have transactions that span both the indexer, and smirk itself, which would probably require exposing some rocksdb types in the API, or we'd have to come up with our own abstraction. The latter probably isn't worth it until we start building serious support for a second backend presumably? Happy to be wrong on this though
  • similarly, do we want to be generic over the hash function? RPO seems to be pretty baked-into miden, so I assumed we wouldn't really want to use anything else 🤷
  • if we're reusing rocksdb instances from the indexer, would we need namespacing? Googling "rocksdb namespace" didn't show anything obvious - is it as simple as prefixing every key with b"smirk" and calling it a day? (see the sketch after this list)
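A minimal sketch of the key-prefixing idea, assuming we'd wrap a shared rocksdb::DB handle (the helper names here are hypothetical):

use rocksdb::DB;

const PREFIX: &[u8] = b"smirk/";

// Hypothetical helper: namespace smirk's keys inside a shared DB by
// prefixing, so smirk keys can't collide with anyone else's.
fn namespaced(key: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(PREFIX.len() + key.len());
    out.extend_from_slice(PREFIX);
    out.extend_from_slice(key);
    out
}

fn put(db: &DB, key: &[u8], value: &[u8]) -> Result<(), rocksdb::Error> {
    db.put(namespaced(key), value)
}

That said, RocksDB's column families are the more idiomatic answer to namespacing (the rust crate exposes them via `cf_handle` and the `*_cf` methods), and they avoid paying for the prefix bytes on every key.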

Known issues that I do have good answers for (listed here so they don't block review 😅):

  • the verify function is broken; the fix is a little tricky
  • the rocksdb instance won't close when dropped, which causes tests to hang indefinitely
  • a few other bits and pieces

calummoore (Contributor)

should we even try to have a generic storage backend at all? For it to be useful, we'd really need the ability to have transactions that span both the indexer, and smirk itself, which would probably require exposing some rocksdb types in the API, or we'd have to come up with our own abstraction. The latter probably isn't worth it until we start building serious support for a second backend presumably? Happy to be wrong on this though

Agreed; I don't think we need to support a generic backend, since we have no active plans to expand beyond rocksdb for now.

similarly, do we want to be generic over the hash function? RPO seems to be pretty baked-into miden, so I assumed we wouldn't really want to use anything else

That's fine, as we only need RPO for now. We can always add another later; there are other priorities 👍

if we're reusing rocksdb instances from the indexer, would we need namespacing? Googling "rocksdb namespace" didn't show anything obvious - is it as simple as prefixing every key with b"smirk" and calling it a day?

Rollup and indexer will now be separate (they may not even be on the same node), so there is no need to consider the indexer at all. We might want to store some data related to the current committed proposal, but to be honest, given that we can now very quickly identify which proposal is committed (from the root hash), we can probably just store that separately.

the rocksdb instance won't close when dropped, which causes tests to hang indefinitely

@mateuszmlc did you see this when implementing the indexer?

mateuszmlc (Collaborator)

did you see this when implementing the indexer?

No, I haven't experienced this.

calummoore (Contributor) left a comment

Would be good to do some basic perf benchmarks on this to see what level of throughput we can expect from the current implementation, which would guide us on whether we need to make some perf optimisations now or can wait until later.
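For instance, a minimal criterion sketch of the kind of benchmark meant here (insertion throughput; `MerkleTree::new`/`insert` are assumptions about the crate's API):

use criterion::{criterion_group, criterion_main, Criterion};
use smirk::MerkleTree;

// How fast can we build a tree of 10k entries from scratch?
fn bench_insert(c: &mut Criterion) {
    c.bench_function("insert 10k", |b| {
        b.iter(|| {
            let mut tree = MerkleTree::new();
            for i in 0..10_000u64 {
                tree.insert(i, i.to_string());
            }
            tree
        })
    });
}

criterion_group!(benches, bench_insert);
criterion_main!(benches);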

    K: Ord + 'static,
    V: Hashable + 'static,
{
    fn store_tree(&self, tree: &MerkleTree<K, V>) -> Result<(), Error>
Contributor

Do we only have the option to store the entire tree every time we want to save it? That seems like quite a performance hit. I was expecting that we'd be able to write just the changes to the tree (I think that's how merk does it).

cameron1024 (Author)

I'm going to add an incremental API that allows you to specify updates and then apply them to the in-memory tree or the storage.
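Something in this direction, perhaps; all names below are illustrative only, not the actual API:

use smirk::{Batch, MerkleTree, Storage};

// Hypothetical incremental API: collect changes into a batch, then
// apply it to the in-memory tree and persist only the touched nodes,
// instead of rewriting the whole tree on every save.
fn update(tree: &mut MerkleTree<u64, String>, storage: &impl Storage) -> Result<(), smirk::Error> {
    let mut batch = Batch::new();
    batch.insert(1, "hello".to_string());
    batch.remove(&2);

    tree.apply(&batch);      // in-memory update
    storage.apply(&batch)?;  // write only the changed nodes
    Ok(())
}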

let hash = node.value().hash();
let bytes = codec::encode(&(node.key(), node.value())).map_err(err)?;

tx.put(&hash.to_bytes(), &bytes).map_err(err)?;
Contributor

If we're using the hash as a key, I guess we're not storing the records in the same order as the merkle tree? It might be useful to store using the key, as that means record access will likely be closer together (i.e. when fetching stuff for updates), and we can easily split a rollup in half (when sharding)

cameron1024 (Author)

Yep, I misread the merk docs - updating to be more consistent (and indexed by key, so retrieving subtrees is fast)
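i.e., roughly this change relative to the snippet above (encoding the key via `codec` is an assumption):

// Index nodes by (encoded) key instead of by hash, so records that are
// adjacent in the tree are adjacent on disk and subtree range scans
// stay cheap.
let key_bytes = codec::encode(node.key()).map_err(err)?;
let bytes = codec::encode(&(node.key(), node.value())).map_err(err)?;
tx.put(&key_bytes, &bytes).map_err(err)?;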

Contributor

Would be good to add some tests to ensure the tree remains balanced.
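For example, something along these lines; `depth` is an assumed accessor, and the exact bound depends on which balance invariant the tree maintains:

#[test]
fn stays_balanced_under_sequential_inserts() {
    // Sequential keys are the worst case for an unbalanced BST:
    // without rebalancing, depth would grow linearly with n.
    let mut tree = MerkleTree::new();
    for i in 0..1024u64 {
        tree.insert(i, i.to_string());
    }
    // A balanced tree of 1024 entries should be ~10 levels deep;
    // 20 is a loose bound assuming AVL-style balancing
    // (height <= ~1.44 * log2(n)).
    assert!(tree.depth() <= 20);
}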
