Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev: handle correctly embedded nodes #1090

Closed
Eikix opened this issue Mar 24, 2025 · 1 comment
Closed

dev: handle correctly embedded nodes #1090

Eikix opened this issue Mar 24, 2025 · 1 comment
Assignees

Comments

@Eikix
Copy link
Member

Eikix commented Mar 24, 2025

Why

Embedded nodes are MPT nodes that are inlined inside another MPT node if their RLP encoded payload's len < 32
We want to properly handle them in our trie diff library.

This wrong behaviour has not yet caused an issue since embedded nodes seem to be rare.
See: https://github.com/kkrt-labs/keth/pull/1079/files#diff-4050cfe1abd84327a9e0b978797778a485c186572bacba9f45dea39fe3e5961aR11

What

In all the places in the codebase we have control flow that depends of the existence of embedded nodes, we wrongfully assume the embedded nodes are RLP encoded. We need to replace the rlp.decode call on the embedded nodes' sequence of bytes to a deserialization call.

How

In mpt>utils, refactor decode_node to externalise the logic to serialize RLP bytes into a InternalNode class.

def deserialize_to_internal_node(node: List[Bytes]) -> InternalNode:
    if not isinstance(node, list):
        raise ValueError(f"Unknown node structure: {type(node)}")
    if len(node) not in (2, 17):
        raise ValueError(f"Unknown node structure: {len(node)}")

    if len(node) == 17:
        return BranchNode(subnodes=tuple(node[0:16]), value=node[16])

    if len(node) == 2:
        prefix = node[0]
        value = node[1]

        nibbles = bytes_to_nibble_list(prefix)
        first_nibble = nibbles[0]
        ODD_LENGTH_PREFIX = (1, 3)

        if first_nibble in ODD_LENGTH_PREFIX:
            nibbles = nibbles[1:]
        else:
            nibbles = nibbles[2:]

        is_leaf = first_nibble in (2, 3)
        if is_leaf:
            return LeafNode(rest_of_key=nibbles, value=value)
        else:
            return ExtensionNode(key_segment=nibbles, subnode=value)


def decode_node(node: Bytes) -> InternalNode:
    decoded = rlp.decode(node)
    return deserialize_to_internal_node(decoded)

In resolve function in trie_diff replace the call to rlp.decode by a call to deserialize:

def resolve(
    node: Hash32 | Bytes | None | InternalNode, nodes: Dict[Hash32, InternalNode]
) -> InternalNode | None:
    if isinstance(node, InternalNode):
        return node
    if node is None or node == b"":
        return None
    if len(node) == 32:
        return nodes.get(node)
    return deserialize_to_internal_node(node)

In _resolve in Cairo, if any remove the RLP decode logic for embedded nodes.

@github-project-automation github-project-automation bot moved this to Backlog in Keth Mar 24, 2025
@Eikix Eikix self-assigned this Mar 24, 2025
@Eikix Eikix moved this from Backlog to In progress in Keth Mar 24, 2025
@Eikix Eikix moved this from In progress to Todo in Keth Mar 24, 2025
@Eikix
Copy link
Member Author

Eikix commented Mar 26, 2025

Closes along with #1092

@Eikix Eikix closed this as completed Mar 26, 2025
@github-project-automation github-project-automation bot moved this from Todo to Done in Keth Mar 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

1 participant