Skip to content

Conversation

@marioevz
Copy link
Member

@marioevz marioevz commented Nov 21, 2025

🗒️ Description

This PR optimizes the fill process to remove virtually all python bottlenecks and leave the t8n execution as the only time consumed.

It does so via several optimizations, but first the profiling results so we can breakdown the differences.

Profiling Results

This is the profiling results for the transition tool call during the filling of a benchmark engine-x blockchain test with the following properties:

  • Pre-allocation groupings generated for all benchmark tests in a single group.
  • All accounts from all tests are contained in a ~250MB allocation passed to the t8n in each block.

Profiling before this PR:
Screenshot 2025-11-25 at 18 01 59
Profiling file (use SnakeViz to visualize): profile.before.out.tar.gz

Main time consumers:

  1. The model_validate pydantic method, 1st orange box, mainly used to load the result from the t8n tool, which includes the full modified pre-alloc (30.4s, 58.8% of the total).
  • A lot of the time spent in this box is spent in our to_bytes conversion method.
  1. The write_json_files method, 2nd orange box, used to dump the pre-allocation, environment and transactions for the t8n to read them during the state transition execution, which included dumping json objects (already dumped by pydantic) into files (8.27s, 15.81% of the total).
  2. Evmone-t8n process run, light blue box, is the actual call to spawn the t8n process (7.73 seconds, 14.77% of the total).
  3. The model_dump method, which is used to dump memory objects into json objects to eventually pass them to t8n tool (4.07s, 7.77% of the total).

Profiling after this PR:
Screenshot 2025-11-25 at 18 02 10
Profiling file: profile.after.out.tar.gz

Main time consumers:

  1. Evmone-t8n process run, light blue box (7.61 seconds, 95.13% of the total).
  2. The new to_files method(0.389s, 4.86% of the total).

Optimizations

Pydantic bytes optimization

The original method we used to validate hexadecimal bytes in pydantic was too flexible and even removed white spaces in order to properly parse the string (using regex even), and this method was called on every instantiation of a Bytes type which was extremely slow.

We removed this flexibility and introduced a CoerceBytes method which can be used only when the user wants to define a hexadecimal string that is more readable.

Some other minor optimizations also were included in this, for details check commit 509d17c541.

Remove Unnecessary State Root Calculations

During the filling of a pre-allocation grouping test, we used the alloc.state_root() method to pass this into the header of the genesis of the filled test.

This method calculates the state root using python and is extremely expensive, and since the pre-allocation group is very big since it contains all accounts of all tests, this was very slow.

This PR creates GroupPreAlloc subclass of Alloc which overrides the state_root parameter to look for the pre-calculated value since the state root hash never changes.

A significant refactor was necessary for this, and for more details check f59397e81d.

LazyAlloc class to defer model validation

This PR introduces a new class called LazyAlloc, which allows to defer validation of the pre-alloc until it's needed in decoded format.

Most of the times, and specially for tests with many blocks, this allocation is not necessary in-between calls and the only real allocation that we use for post-comparison is the one returned by the last block (with a few exceptions).

With this in mind, and since this information is validated->dumped always in the same format, when we receive the allocation in string from the t8n, LazyAlloc keeps it as-is, and then when we pass the pre-alloc of the next block to the next t8n instance, we simply pass along the string as we got it from the t8n, skipping validation and dumping entirely.

If any test requires the allocation in validated format, LazyAlloc.get_alloc() is called which validates, caches, and returns the Alloc object.

See 164de2dee8 for details.

Cache Pydantic Model Dump of the Pre-Alloc Groups

Last optimization is to keep a cache of the result of model_dump or model_dump_json for the genesis pre-allocation groups, since it's always the same, and it's always required in the same format, only the first test in the pytest worker has to do this and it's not needed for the rest of the worker's runtime.

See fa3be8e061 for details.

The profiler

Introduces a profiler which dumps profiling information when --evm-dump-dir is passed.

This feature was used to debug and fix all the issues in this PR.

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx tox -e static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Cute Animal Picture

Put a link to a cute animal picture inside the parenthesis-->

Copy link
Contributor

@fselmo fselmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, just left a nit 👍🏼

@fselmo fselmo self-requested a review November 21, 2025 17:01
@marioevz
Copy link
Member Author

@fselmo I've pushed some other fixes to this branch but I think there's still some optimizations we can do, however those other ones do not seem so easy to pin down as the ones in this PR, so I think we can try to merge this one and then leave the rest for a follow up PR.

@marioevz
Copy link
Member Author

@fselmo I think I had a breakthrough, I'm implementing now and I'll try to push tomorrow so please hold-off on reviewing 🙏

@marioevz marioevz force-pushed the optimize-fill branch 2 times, most recently from 976c290 to fa3be8e Compare November 25, 2025 21:01
@danceratopz danceratopz self-requested a review November 25, 2025 21:15
@danceratopz
Copy link
Member

If you can wait until tomorrow, I'd love to review. I just ran into a state_root recalculation issue while working on enginex and want to check this addresses it. Brain's a bit frazzled to check it now.

Copy link
Contributor

@fselmo fselmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing, really nice finds @marioevz 🔥. I left a comment on a bug in a comparison check.

Something to think about... should the LazyAlloc cache its own state root so we don't decouple the two? I don't think there's a ton of risk separating them but if we use an underscored _state_root and define something like get_state_root that pulls from the cached _state_root and if it's not there we compute it? Something like this instead of tracking them separately?

Should we also add some basic unit tests or is it enough that these are always used to check if anything breaks or doesn't?

Looks great though, excited for this to get in 🚀

if (
modified_tool_output.alloc.root.keys()
!= modified_tool_output.alloc.root.keys()
modified_tool_output.alloc.get_alloc().root.keys()
Copy link
Contributor

@fselmo fselmo Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the same. I think we need a base_tool_alloc.root.keys() here? It looks like they were the same before too... maybe this was already broken before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it seems like it was already broken, I'll fix it in this PR, thanks for noticing this!

Copy link
Member

@danceratopz danceratopz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference in these profiling charts is incredible! I'm taking a deeper look now.

Did you generate the pre-alloc group for all tests (fill stage 1) and then profile filling single test (fill stage 2)?

Here's my process - did you do follow a similar process?

  1. Generate the full pre-alloc group for all tests:
    uv run pytest -c packages/testing/src/execution_testing/cli/pytest_commands/pytest_ini_files/pytest-fill.ini \ 
        --rootdir . \
        --generate-pre-alloc-groups \
        --evm-bin=evmone-t8n \
        -m "benchmark and blockchain_test_engine_x" 
        --output=/tmp/fixtures \
        --evm-dump-dir=/tmp/evm-debug |
        tests/benchmark/ \
        --clean
        ```
  2. Profile via a single test:
    rm -rf /evm/evm-debug/;
    uv run pytest -c packages/testing/src/execution_testing/cli/pytest_commands/pytest_ini_files/pytest-fill.ini \
        --rootdir . \
        --generate-all-formats \
        --use-pre-alloc-groups \
        --evm-bin=evmone-t8n \
        -m "benchmark and blockchain_test_engine_x" \
        --output=/tmp/fixtures \
        --evm-dump-dir=/tmp/evm-debug \
        tests/benchmark/compute/instruction/test_arithmetic.py::test_arithmetic[fork_Prague-blockchain_test_engine_x-opcode_ADD-]
  3. Visualize:
    uvx snakeviz "$(find /tmp/evm-debug -name 'profile.out' | head -1)"
    

It seems a bit over the top to enable the profiler for anyone requesting traces from fill via --evm-debug-dir. How about a --profile flag for this? This is more about noise in the output directory than performance; the profile adds <10% perf overhead.

@danceratopz
Copy link
Member

The optimized CI fail should be fixed by #1813

Copy link
Member

@danceratopz danceratopz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The speed-up is really amazing.

This PR does resolve the performance issues I saw in the enginex PR regarding pre-alloc access and state root calculation 🚀

Small suggestion in marioevz#2

Copy link
Contributor

@spencer-tb spencer-tb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing optimization. Lets get this merged!
@danceratopz's changes look good to me as well.

One small comment from my side. We shoud consider adding some basic tests to our custom classes in the future (LazyAlloc/CoerceBytes/GroupPreAlloc).

@spencer-tb spencer-tb added C-enhance Category: an improvement or new feature P-high A-test-cli-fill Area: Tests Fill CLI—runs tests and generates fixtures (eg. `p/t/s/e/cli/pytest_commands/fill.py`) A-test-client-clis Area: execution spec tests client clis labels Dec 1, 2025
@spencer-tb
Copy link
Contributor

We should add this to the changelog as well. I think a rebase is needed for CI to pass.

@marioevz
Copy link
Member Author

marioevz commented Dec 2, 2025

Added:

  • Unit tests
  • State Root
  • Nicer implementation using generics (I think this is the key to eventually implement an EELS-type-based LazyAlloc)

Edit: Changelog too

@codecov
Copy link

codecov bot commented Dec 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.31%. Comparing base (5919c5b) to head (4f0dba8).
⚠️ Report is 1 commits behind head on forks/osaka.

Additional details and impacted files
@@             Coverage Diff              @@
##           forks/osaka    #1804   +/-   ##
============================================
  Coverage        87.31%   87.31%           
============================================
  Files              541      541           
  Lines            32832    32832           
  Branches          3015     3015           
============================================
  Hits             28668    28668           
  Misses            3557     3557           
  Partials           607      607           
Flag Coverage Δ
unittests 87.31% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@marioevz marioevz merged commit 85c6fff into ethereum:forks/osaka Dec 2, 2025
14 checks passed
@marioevz marioevz deleted the optimize-fill branch December 2, 2025 23:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-test-cli-fill Area: Tests Fill CLI—runs tests and generates fixtures (eg. `p/t/s/e/cli/pytest_commands/fill.py`) A-test-client-clis Area: execution spec tests client clis C-enhance Category: an improvement or new feature P-high

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants