fix(testing): Optimize fill #1804

marioevz · 2025-11-21T03:31:51Z

🗒️ Description

This PR optimizes the fill process to remove virtually all python bottlenecks and leave the t8n execution as the only time consumed.

It does so via several optimizations, but first the profiling results so we can breakdown the differences.

Profiling Results

This is the profiling results for the transition tool call during the filling of a benchmark engine-x blockchain test with the following properties:

Pre-allocation groupings generated for all benchmark tests in a single group.
All accounts from all tests are contained in a ~250MB allocation passed to the t8n in each block.

Profiling before this PR:

Profiling file (use SnakeViz to visualize): profile.before.out.tar.gz

Main time consumers:

The model_validate pydantic method, 1st orange box, mainly used to load the result from the t8n tool, which includes the full modified pre-alloc (30.4s, 58.8% of the total).

A lot of the time spent in this box is spent in our to_bytes conversion method.

The write_json_files method, 2nd orange box, used to dump the pre-allocation, environment and transactions for the t8n to read them during the state transition execution, which included dumping json objects (already dumped by pydantic) into files (8.27s, 15.81% of the total).
Evmone-t8n process run, light blue box, is the actual call to spawn the t8n process (7.73 seconds, 14.77% of the total).
The model_dump method, which is used to dump memory objects into json objects to eventually pass them to t8n tool (4.07s, 7.77% of the total).

Profiling after this PR:

Profiling file: profile.after.out.tar.gz

Main time consumers:

Evmone-t8n process run, light blue box (7.61 seconds, 95.13% of the total).
The new to_files method(0.389s, 4.86% of the total).

Optimizations

Pydantic bytes optimization

The original method we used to validate hexadecimal bytes in pydantic was too flexible and even removed white spaces in order to properly parse the string (using regex even), and this method was called on every instantiation of a Bytes type which was extremely slow.

We removed this flexibility and introduced a CoerceBytes method which can be used only when the user wants to define a hexadecimal string that is more readable.

Some other minor optimizations also were included in this, for details check commit 509d17c541.

Remove Unnecessary State Root Calculations

During the filling of a pre-allocation grouping test, we used the alloc.state_root() method to pass this into the header of the genesis of the filled test.

This method calculates the state root using python and is extremely expensive, and since the pre-allocation group is very big since it contains all accounts of all tests, this was very slow.

This PR creates GroupPreAlloc subclass of Alloc which overrides the state_root parameter to look for the pre-calculated value since the state root hash never changes.

A significant refactor was necessary for this, and for more details check f59397e81d.

`LazyAlloc` class to defer model validation

This PR introduces a new class called LazyAlloc, which allows to defer validation of the pre-alloc until it's needed in decoded format.

Most of the times, and specially for tests with many blocks, this allocation is not necessary in-between calls and the only real allocation that we use for post-comparison is the one returned by the last block (with a few exceptions).

With this in mind, and since this information is validated->dumped always in the same format, when we receive the allocation in string from the t8n, LazyAlloc keeps it as-is, and then when we pass the pre-alloc of the next block to the next t8n instance, we simply pass along the string as we got it from the t8n, skipping validation and dumping entirely.

If any test requires the allocation in validated format, LazyAlloc.get_alloc() is called which validates, caches, and returns the Alloc object.

See 164de2dee8 for details.

Cache Pydantic Model Dump of the Pre-Alloc Groups

Last optimization is to keep a cache of the result of model_dump or model_dump_json for the genesis pre-allocation groups, since it's always the same, and it's always required in the same format, only the first test in the pytest worker has to do this and it's not needed for the rest of the worker's runtime.

See fa3be8e061 for details.

The profiler

Introduces a profiler which dumps profiling information when --evm-dump-dir is passed.

This feature was used to debug and fix all the issues in this PR.

🔗 Related Issues or PRs

N/A.

✅ Checklist

All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
uvx tox -e static
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
All: Considered adding an entry to CHANGELOG.md.
All: Considered updating the online docs in the ./docs/ directory.
All: Set appropriate labels for the changes (only maintainers can apply labels).
Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.
Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Cute Animal Picture

packages/testing/src/execution_testing/base_types/base_types.py

fselmo

Lgtm, just left a nit 👍🏼

marioevz · 2025-11-24T17:24:06Z

@fselmo I've pushed some other fixes to this branch but I think there's still some optimizations we can do, however those other ones do not seem so easy to pin down as the ones in this PR, so I think we can try to merge this one and then leave the rest for a follow up PR.

marioevz · 2025-11-24T21:59:39Z

@fselmo I think I had a breakthrough, I'm implementing now and I'll try to push tomorrow so please hold-off on reviewing 🙏

danceratopz · 2025-11-25T21:34:03Z

If you can wait until tomorrow, I'd love to review. I just ran into a state_root recalculation issue while working on enginex and want to check this addresses it. Brain's a bit frazzled to check it now.

fselmo

This is amazing, really nice finds @marioevz 🔥. I left a comment on a bug in a comparison check.

Something to think about... should the LazyAlloc cache its own state root so we don't decouple the two? I don't think there's a ton of risk separating them but if we use an underscored _state_root and define something like get_state_root that pulls from the cached _state_root and if it's not there we compute it? Something like this instead of tracking them separately?

Should we also add some basic unit tests or is it enough that these are always used to check if anything breaks or doesn't?

Looks great though, excited for this to get in 🚀

fselmo · 2025-11-26T22:13:41Z

packages/testing/src/execution_testing/specs/state.py

        if (
-            modified_tool_output.alloc.root.keys()
-            != modified_tool_output.alloc.root.keys()
+            modified_tool_output.alloc.get_alloc().root.keys()


These are the same. I think we need a base_tool_alloc.root.keys() here? It looks like they were the same before too... maybe this was already broken before?

Yep, it seems like it was already broken, I'll fix it in this PR, thanks for noticing this!

danceratopz

The difference in these profiling charts is incredible! I'm taking a deeper look now.

Did you generate the pre-alloc group for all tests (fill stage 1) and then profile filling single test (fill stage 2)?

Here's my process - did you do follow a similar process?

Generate the full pre-alloc group for all tests:

uv run pytest -c packages/testing/src/execution_testing/cli/pytest_commands/pytest_ini_files/pytest-fill.ini \ 
    --rootdir . \
    --generate-pre-alloc-groups \
    --evm-bin=evmone-t8n \
    -m "benchmark and blockchain_test_engine_x" 
    --output=/tmp/fixtures \
    --evm-dump-dir=/tmp/evm-debug |
    tests/benchmark/ \
    --clean
    ```

Profile via a single test:

rm -rf /evm/evm-debug/;
uv run pytest -c packages/testing/src/execution_testing/cli/pytest_commands/pytest_ini_files/pytest-fill.ini \
    --rootdir . \
    --generate-all-formats \
    --use-pre-alloc-groups \
    --evm-bin=evmone-t8n \
    -m "benchmark and blockchain_test_engine_x" \
    --output=/tmp/fixtures \
    --evm-dump-dir=/tmp/evm-debug \
    tests/benchmark/compute/instruction/test_arithmetic.py::test_arithmetic[fork_Prague-blockchain_test_engine_x-opcode_ADD-]

Visualize:

uvx snakeviz "$(find /tmp/evm-debug -name 'profile.out' | head -1)"

It seems a bit over the top to enable the profiler for anyone requesting traces from fill via --evm-debug-dir. How about a --profile flag for this? This is more about noise in the output directory than performance; the profile adds <10% perf overhead.

danceratopz · 2025-11-27T13:22:43Z

The optimized CI fail should be fixed by #1813

danceratopz

The speed-up is really amazing.

This PR does resolve the performance issues I saw in the enginex PR regarding pre-alloc access and state root calculation 🚀

Small suggestion in marioevz#2

packages/testing/src/execution_testing/base_types/base_types.py

packages/testing/src/execution_testing/fixtures/pre_alloc_groups.py

spencer-tb

Amazing optimization. Lets get this merged!
@danceratopz's changes look good to me as well.

One small comment from my side. We shoud consider adding some basic tests to our custom classes in the future (LazyAlloc/CoerceBytes/GroupPreAlloc).

packages/testing/src/execution_testing/client_clis/cli_types.py

spencer-tb · 2025-12-01T15:59:05Z

We should add this to the changelog as well. I think a rebase is needed for CI to pass.

fix(base_types): unnecessary cast

Co-authored-by: spencer <[email protected]>

marioevz · 2025-12-02T22:14:42Z

Added:

Unit tests
State Root
Nicer implementation using generics (I think this is the key to eventually implement an EELS-type-based LazyAlloc)

Edit: Changelog too

codecov · 2025-12-02T22:37:09Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.31%. Comparing base (5919c5b) to head (4f0dba8).
⚠️ Report is 1 commits behind head on forks/osaka.

Additional details and impacted files

@@             Coverage Diff              @@
##           forks/osaka    #1804   +/-   ##
============================================
  Coverage        87.31%   87.31%           
============================================
  Files              541      541           
  Lines            32832    32832           
  Branches          3015     3015           
============================================
  Hits             28668    28668           
  Misses            3557     3557           
  Partials           607      607

Flag	Coverage Δ
unittests	`87.31% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

fselmo reviewed Nov 21, 2025

View reviewed changes

packages/testing/src/execution_testing/base_types/base_types.py Outdated Show resolved Hide resolved

fselmo approved these changes Nov 21, 2025

View reviewed changes

fselmo self-requested a review November 21, 2025 17:01

marioevz force-pushed the optimize-fill branch from 864b444 to 2091245 Compare November 24, 2025 17:23

marioevz force-pushed the optimize-fill branch 2 times, most recently from 976c290 to fa3be8e Compare November 25, 2025 21:01

danceratopz self-requested a review November 25, 2025 21:15

danceratopz mentioned this pull request Nov 26, 2025

feat(consume): initial implementation of enginex danceratopz/execution-specs#1

Open

4 tasks

fselmo reviewed Nov 26, 2025

View reviewed changes

danceratopz reviewed Nov 27, 2025

View reviewed changes

This was referenced Nov 28, 2025

fix(fill): don't split slow tests into separate pre-alloc groups #1819

Draft

enhance(ci): add fast benchmark artifact workflow #1827

Merged

spencer-tb approved these changes Dec 1, 2025

View reviewed changes

packages/testing/src/execution_testing/client_clis/cli_types.py Outdated Show resolved Hide resolved

spencer-tb added C-enhance Category: an improvement or new feature P-high A-test-cli-fill Area: Tests Fill CLI—runs tests and generates fixtures (eg. `p/t/s/e/cli/pytest_commands/fill.py`) A-test-client-clis Area: execution spec tests client clis labels Dec 1, 2025

marioevz added 7 commits December 2, 2025 20:06

fix(t8n): introduce profiler

c8a059c

profiler pauses

2aee2e8

fix(testing/base_types): Speed up to_bytes

5b3dae3

fix(base_types): unnecessary cast

fix(testing/base_types): CoerceBytes

09f720f

fix(testing): Optimize pre-alloc grouping state hash

41579ed

feat(t8n): introduce LazyAlloc

f871e2e

feat(fixtures): Cached genesis model dump

8243960

danceratopz and others added 5 commits December 2, 2025 20:06

refactor: prefer explicit cache over circular reference (#2)

2ca28ca

Update packages/testing/src/execution_testing/client_clis/cli_types.py

4f2b956

Co-authored-by: spencer <[email protected]>

fix: verify modified gas limit

a8e643c

refactor: Use generics

15f8bb0

refactor: Add state root cache

f5b7167

marioevz force-pushed the optimize-fill branch from 9929c2f to f5b7167 Compare December 2, 2025 21:54

marioevz added 2 commits December 2, 2025 21:58

docs: changelog

c971a60

feat(client_clis): Add unit tests

4f0dba8

marioevz merged commit 85c6fff into ethereum:forks/osaka Dec 2, 2025
14 checks passed

marioevz deleted the optimize-fill branch December 2, 2025 23:47

fix(testing): Optimize fill #1804

fix(testing): Optimize fill #1804

Uh oh!

Conversation

marioevz commented Nov 21, 2025 • edited by spencer-tb Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🗒️ Description

Profiling Results

Optimizations

Pydantic bytes optimization

Remove Unnecessary State Root Calculations

LazyAlloc class to defer model validation

Cache Pydantic Model Dump of the Pre-Alloc Groups

The profiler

🔗 Related Issues or PRs

✅ Checklist

Cute Animal Picture

Uh oh!

Uh oh!

fselmo left a comment

Choose a reason for hiding this comment

Uh oh!

marioevz commented Nov 24, 2025

Uh oh!

marioevz commented Nov 24, 2025

Uh oh!

danceratopz commented Nov 25, 2025

Uh oh!

fselmo left a comment

Choose a reason for hiding this comment

Uh oh!

fselmo Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

marioevz Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

danceratopz left a comment

Choose a reason for hiding this comment

Uh oh!

danceratopz commented Nov 27, 2025

Uh oh!

danceratopz left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

spencer-tb left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

spencer-tb commented Dec 1, 2025

Uh oh!

marioevz commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

marioevz commented Nov 21, 2025 •

edited by spencer-tb

Loading

`LazyAlloc` class to defer model validation

fselmo Nov 26, 2025 •

edited

Loading

marioevz commented Dec 2, 2025 •

edited

Loading

codecov bot commented Dec 2, 2025 •

edited

Loading