
perf(DON'T MERGE): [WIP] execution and tracegen rewrite #1567


Draft: wants to merge 41 commits into base `main`


jonathanpwang
Contributor

To be rebase-merged.

Before merging to main, make sure all `TODO` and `TEMP` markers have been removed and addressed.

jonathanpwang and others added 12 commits May 2, 2025 10:58
Note: this PR is not targeting `main`.
I've used `TODO` and `TEMP` to mark places in code that will need to be
cleaned up before merging to `main`.

Beginning the refactor of online memory to allow different host types in
different address spaces.
Going to touch a lot of APIs.
Focusing on stabilizing APIs - currently this PR will not improve
performance.

Tests will not all pass because I have intentionally disabled some
logging required for trace generation.
Only execution tests will pass (or run the execute benchmark).

In future PR(s):

- [ ] make `Memory` trait for execution read/write API
- [ ] better handling of type conversions for memory image
- [ ] replace the underlying memory implementation with other
implementations like mmap

Towards INT-3743

Even with wasteful conversions, execution is faster:
Before: https://github.com/openvm-org/openvm/actions/runs/14318675080
After: https://github.com/openvm-org/openvm/actions/runs/14371335248?pr=1559
Not merging to main

Add `GuestMemory` trait and implement for `AddressMap`. We are moving
more towards a trait based style to re-use code when different types of
memory might be swapped out.
- make `VmSegmentExecutor` generic on `<Mem, Ctx, Ctrl>` where:
  - `Mem`: struct that implements `GuestMemory`
  - `Ctx`: struct that stores host context during execution
  - `Ctrl`: struct that implements pre/post segment execution hooks,
    termination condition and instruction execution logic
- add `TracegenVmSegmentExecutor` that implements the current execution
  flow
- move segmentation strategies to new file
- delete `Vm{Adapter,Core}Chip` traits
- no more records, directly use trace buffer
- jal_lui chip is a demonstration of the new changes with working unit
  tests
- changed unit tester
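A minimal sketch of the `<Mem, Ctx, Ctrl>` split described above, under simplified assumptions: the `GuestMemory` trait here only has hypothetical word-level `read_u32`/`write_u32` methods, and `ExecutionControl`, `MapMemory`, `InstretCtx` and `CountingCtrl` are illustrative names, not openvm's actual API. It shows how the executor loop stays fixed while memory, host context, and the hooks/termination/instruction logic are swapped via generics.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `GuestMemory`: word reads/writes keyed by
/// (address space, pointer).
trait GuestMemory {
    fn read_u32(&self, addr_space: u32, ptr: u32) -> u32;
    fn write_u32(&mut self, addr_space: u32, ptr: u32, value: u32);
}

/// Hypothetical `Ctrl` trait: segment hooks, termination condition, and
/// instruction execution logic.
trait ExecutionControl<Mem: GuestMemory, Ctx> {
    fn on_segment_start(&mut self, _ctx: &mut Ctx) {}
    fn on_segment_end(&mut self, _ctx: &mut Ctx) {}
    fn should_terminate(&self, ctx: &Ctx, pc: u32) -> bool;
    /// Executes the instruction at `pc` and returns the next pc.
    fn execute_instruction(&mut self, mem: &mut Mem, ctx: &mut Ctx, pc: u32) -> u32;
}

struct VmSegmentExecutor<Mem, Ctx, Ctrl> {
    mem: Mem,
    ctx: Ctx,
    ctrl: Ctrl,
}

impl<Mem: GuestMemory, Ctx, Ctrl: ExecutionControl<Mem, Ctx>> VmSegmentExecutor<Mem, Ctx, Ctrl> {
    /// Runs one segment starting at `pc` and returns the pc where it stopped.
    fn execute_from_pc(&mut self, mut pc: u32) -> u32 {
        self.ctrl.on_segment_start(&mut self.ctx);
        while !self.ctrl.should_terminate(&self.ctx, pc) {
            pc = self.ctrl.execute_instruction(&mut self.mem, &mut self.ctx, pc);
        }
        self.ctrl.on_segment_end(&mut self.ctx);
        pc
    }
}

/// Toy memory backed by a hash map; unwritten words read as 0.
#[derive(Default)]
struct MapMemory(HashMap<(u32, u32), u32>);

impl GuestMemory for MapMemory {
    fn read_u32(&self, addr_space: u32, ptr: u32) -> u32 {
        self.0.get(&(addr_space, ptr)).copied().unwrap_or(0)
    }
    fn write_u32(&mut self, addr_space: u32, ptr: u32, value: u32) {
        self.0.insert((addr_space, ptr), value);
    }
}

/// Toy host context: counts executed instructions.
#[derive(Default)]
struct InstretCtx {
    instret: u64,
}

/// Toy control: each "instruction" increments word 0 of address space 1 and
/// advances pc by 4; the segment ends after `max_instret` instructions.
struct CountingCtrl {
    max_instret: u64,
}

impl ExecutionControl<MapMemory, InstretCtx> for CountingCtrl {
    fn should_terminate(&self, ctx: &InstretCtx, _pc: u32) -> bool {
        ctx.instret >= self.max_instret
    }
    fn execute_instruction(&mut self, mem: &mut MapMemory, ctx: &mut InstretCtx, pc: u32) -> u32 {
        let counter = mem.read_u32(1, 0);
        mem.write_u32(1, 0, counter + 1);
        ctx.instret += 1;
        pc + 4
    }
}
```

Swapping `CountingCtrl` for a tracing-aware control type would be the analogue of `TracegenVmSegmentExecutor` in this sketch; the loop itself would not change.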

- [x] need to add some dummy volatile memory to the tester to balance
based on touched addresses
…1590)

- introduce a new generic `InsExecutorE1` trait 
- add `InsExecutor::execute_e1` for rv32im instructions
- fix some loadstore tests
- remove records
- wrap unsafe memory read/writes into safe wrappers
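To illustrate what an e1 ("execute only") path for an rv32im instruction might look like, here is a hedged sketch. `InsExecutorE1` mirrors the trait name above, but its signature, `Registers`, `AluOpcode`, `Instruction` and `BaseAluExecutor` are hypothetical. The register accessors play the role of the safe wrappers: they cannot index out of bounds and they drop writes to `x0`.

```rust
/// Toy register file standing in for the register address space.
struct Registers {
    x: [u32; 32],
}

impl Registers {
    /// Safe read wrapper: masking to 5 bits keeps the index in bounds.
    fn read(&self, reg: u8) -> u32 {
        self.x[(reg & 31) as usize]
    }
    /// Safe write wrapper: writes to x0 are dropped, as RISC-V requires.
    fn write(&mut self, reg: u8, value: u32) {
        let idx = (reg & 31) as usize;
        if idx != 0 {
            self.x[idx] = value;
        }
    }
}

#[derive(Clone, Copy)]
enum AluOpcode {
    Add,
    Sub,
    Xor,
}

struct Instruction {
    opcode: AluOpcode,
    rd: u8,
    rs1: u8,
    rs2: u8,
}

/// Hypothetical e1 trait: no records and no trace, only state updates.
trait InsExecutorE1 {
    fn execute_e1(&mut self, regs: &mut Registers, pc: &mut u32, instruction: &Instruction);
}

struct BaseAluExecutor;

impl InsExecutorE1 for BaseAluExecutor {
    fn execute_e1(&mut self, regs: &mut Registers, pc: &mut u32, instruction: &Instruction) {
        let a = regs.read(instruction.rs1);
        let b = regs.read(instruction.rs2);
        let result = match instruction.opcode {
            AluOpcode::Add => a.wrapping_add(b),
            AluOpcode::Sub => a.wrapping_sub(b),
            AluOpcode::Xor => a ^ b,
        };
        regs.write(instruction.rd, result);
        *pc += 4;
    }
}
```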

---------

Co-authored-by: Jonathan Wang <[email protected]>
closes INT-3839

---------

Co-authored-by: Ayush Shukla <[email protected]>
- make `Rv32HintStoreChip` use the `NewVmChipWrapper`
- rename `SingleTraceStep` to `TraceStep` and update it to work for
chips whose execution creates multiple trace rows
- comment out criterion execute benchmarks for now
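One way to picture a `TraceStep` that works for chips emitting several rows per instruction is the sketch below. It assumes a row-major `u32` buffer instead of field elements, and the trait shape and `PrefixSumStep` chip are hypothetical. The key idea is that the step receives a mutable row cursor and advances it by however many rows it writes.

```rust
/// Hypothetical `TraceStep`-style trait: a step writes however many rows it
/// needs straight into a preallocated, row-major trace buffer.
trait TraceStep {
    const WIDTH: usize;
    /// Executes one instruction, filling rows from `*row_idx` and advancing it.
    fn execute(&mut self, input: &[u32], trace: &mut [u32], row_idx: &mut usize);
}

/// Toy multi-row chip: one row per input word, laid out as
/// `[word, running_sum, is_last]`.
struct PrefixSumStep;

impl TraceStep for PrefixSumStep {
    const WIDTH: usize = 3;

    fn execute(&mut self, input: &[u32], trace: &mut [u32], row_idx: &mut usize) {
        let mut sum = 0u32;
        for (i, &word) in input.iter().enumerate() {
            sum = sum.wrapping_add(word);
            let start = *row_idx * Self::WIDTH;
            let row = &mut trace[start..start + Self::WIDTH];
            row[0] = word;
            row[1] = sum;
            row[2] = (i + 1 == input.len()) as u32;
            *row_idx += 1;
        }
    }
}
```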
One-line fix. Now that we're only initializing `TracingMemory` with
`new`, we should remove this line from `with_image`.
jonathanpwang and others added 9 commits May 2, 2025 22:52
remove `memory/offline.rs` as we aren't using it anymore.

Delete `VmAdapterChip` trait and `VmChipWrapper` since we also aren't
using them anymore.
Made the rv32im tests pass and gave all the test files the
same testing interface.
Deleted the `test_adapter`. Kept all the test cases unchanged. The only
test case still commented out is the `store` test to address space
4, which fails because memory accesses with block size 4 are currently
not supported in address space 4.

All the test files have 3 types of tests: Positive, Negative, and Sanity
tests.
All the test files have 2 helper functions: `create_test_chip`,
`set_and_execute`.

An important thing to notice about negative tests that expect an
interaction failure (aka a `ChallengePhase` error) is that there might be an
imbalance created for the wrong reasons. For example, there might be an
imbalance on the range checker bus created by the interactions:
[send 1] (sent from the chip_air)
[receive 2] (the execution did `add_count(2)` at some point)
This is not a "valid" failure since 1 is still in the range of the range
checker. Because of this, a manual check is needed for all the negative
tests. To see all the imbalances that occurred during a test, remove the
`disable_debug_builder();` line from the `run_negative_test` function
and run the test. I am 95% sure that I went through all the negative
tests and checked that the imbalances that occurred are correct.

The `test_adapter` tried to address this issue by getting rid of
interaction imbalances on the memory bus, but even with the `test_adapter`
a manual check was necessary.
To solve this, I suggest that we record all the interactions that
occur during the test and automatically check that an invalid
interaction actually happened on a specified bus.
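The suggestion above could look roughly like this hypothetical test helper: record every interaction, keep the net multiplicity (sends minus receives) per message, and assert that an imbalance exists on a given bus with fields that are actually invalid. `InteractionLedger`, `Message`, and the bus ID used in the usage below are assumptions for illustration, not existing openvm or stark-backend types.

```rust
use std::collections::HashMap;

/// A message on a given bus; `fields` are the interaction's payload.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Message {
    bus: u16,
    fields: Vec<u32>,
}

/// Hypothetical test helper that records every interaction and keeps the net
/// multiplicity (sends minus receives) of each message.
#[derive(Default)]
struct InteractionLedger {
    net: HashMap<Message, i64>,
}

impl InteractionLedger {
    fn record(&mut self, bus: u16, fields: Vec<u32>, multiplicity: i64) {
        *self.net.entry(Message { bus, fields }).or_insert(0) += multiplicity;
    }
    fn send(&mut self, bus: u16, fields: Vec<u32>, count: i64) {
        self.record(bus, fields, count);
    }
    fn receive(&mut self, bus: u16, fields: Vec<u32>, count: i64) {
        self.record(bus, fields, -count);
    }
    /// Number of messages whose sends and receives do not cancel out.
    fn num_imbalances(&self) -> usize {
        self.net.values().filter(|&&m| m != 0).count()
    }
    /// True if some imbalanced message lives on `bus` and `is_invalid` accepts
    /// its fields, i.e. the negative test failed for the intended reason.
    fn has_invalid_imbalance(&self, bus: u16, is_invalid: impl Fn(&[u32]) -> bool) -> bool {
        self.net
            .iter()
            .any(|(msg, &m)| m != 0 && msg.bus == bus && is_invalid(msg.fields.as_slice()))
    }
}
```

With the range checker example from the description, `send 1` / `receive 2` produces two imbalances, but `has_invalid_imbalance(bus, |f| f[0] >= 256)` stays false for an 8-bit range, so the test would correctly refuse to count it as a valid failure.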

Resolves INT-3975

---------

Co-authored-by: Ayush Shukla <[email protected]>
Fixed an error in the divrem negative tests: the trace pranking was done
incorrectly. Two instructions were being executed each time (so the trace
had height 2), but only one of the rows was being modified. Changed it so
that only one instruction is executed each time.
Also made `setup_tracing` the default.
Implemented e1 and e3 for the HeapBranch, Heap, and VecHeap adapters.
Updated the Bigint circuit correspondingly. Had to make some changes in
the interfaces of the rv32im Steps. In particular:

- Changed the Reads type from `([u8; N], [u8; N])` to `Into<[[u8;N];2]>` and
the Writes type from `[u8; N]` to `From<[[u8;N];1]>`. This mirrors what
we did with the previous integration API, so that the interfaces
match.

- Got rid of `TraceAdapterContext` in a lot of places. This is because the
same Step can use different AdapterSteps that require different
TraceContexts, or the AdapterStep might even require a `TraceContext`
that the Step doesn't have. The easy solution was to implement
AdapterSteps the same way as in the previous integration API, that
is, by adding the necessary fields to the AdapterStep structs. I think
deleting `TraceContext` from the interface might make sense, but I am
not sure if there is a better way to do this.
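To make the `Into<[[u8;N];2]>` / `From<[[u8;N];1]>` bounds concrete, here is a sketch of a core step that is generic over how an adapter hands it reads and accepts writes. `add_step`, `Rv32Reads` and `Rv32Write` are hypothetical; the step itself is a byte-wise little-endian wrapping add with carry, which for `N = 4` matches RISC-V `ADD`.

```rust
/// Hypothetical core step: adds two little-endian N-byte integers (wrapping),
/// generic over the adapter's read and write types.
fn add_step<R, W, const N: usize>(reads: R) -> W
where
    R: Into<[[u8; N]; 2]>,
    W: From<[[u8; N]; 1]>,
{
    let [a, b] = reads.into();
    let mut out = [0u8; N];
    let mut carry = 0u16;
    for i in 0..N {
        let sum = a[i] as u16 + b[i] as u16 + carry;
        out[i] = sum as u8; // keep the low 8 bits
        carry = sum >> 8;
    }
    W::from([out])
}

/// Hypothetical adapter-side read type: two register values.
struct Rv32Reads {
    rs1: [u8; 4],
    rs2: [u8; 4],
}

impl From<Rv32Reads> for [[u8; 4]; 2] {
    fn from(r: Rv32Reads) -> Self {
        [r.rs1, r.rs2]
    }
}

/// Hypothetical adapter-side write type: one register value.
#[derive(Debug, PartialEq)]
struct Rv32Write([u8; 4]);

impl From<[[u8; 4]; 1]> for Rv32Write {
    fn from([w]: [[u8; 4]; 1]) -> Self {
        Rv32Write(w)
    }
}
```

Because every type trivially converts into itself, plain arrays satisfy the bounds too, so the same step works with both a bare-array adapter and a struct-based one.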

Important Note: the tests don't run right now because a lot of the
read/write operations are done in address space 2 with block size 32 but
currently only block size 4 is supported by the memory.

Resolves INT-3980
Resolves INT-3801.

- Added memory access adapters. To improve:
  * Allocate the trace buffer once before filling it, instead of pushing
    to a `Vec` as is done now.
  * Maybe call `get_f` less often (although I don't know how to avoid
    it in general).
- Added volatile and persistent boundary chips tracegen.
- Added merkle chip tracegen as described
  [here](https://docs.google.com/document/d/12cH7ZYRFWHgflpPzOILb7bg5XExdyWOL4vwrQ9HFGkQ/edit?tab=t.0#heading=h.hrg0oexxgu9).
  To improve:
  * Parallelize at least something.
  * Maybe support passing this struct between segments.
- `VmChipTestBuilder` now has `::default_persistent`, so all tests in
  `extensions/rv32im/circuit` pass with both the volatile and the persistent
  memory interface.
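The first "to improve" item (allocate the trace once instead of pushing into a `Vec`) could look like the sketch below. `Access`, its fields, `WIDTH`, and zero-padding to a power-of-two height are illustrative assumptions, not the actual memory access adapter layout.

```rust
/// Hypothetical memory-access record; the fields are illustrative only.
struct Access {
    timestamp: u32,
    address: u32,
    value: u32,
}

/// Number of columns per row in this toy layout.
const WIDTH: usize = 3;

/// Allocates the row-major trace once (padded with zero rows to a power-of-two
/// height) and fills it in place, with no per-row `Vec::push`.
fn fill_trace(accesses: &[Access]) -> Vec<u32> {
    let height = accesses.len().next_power_of_two();
    let mut trace = vec![0u32; height * WIDTH];
    for (row, access) in trace.chunks_exact_mut(WIDTH).zip(accesses) {
        row[0] = access.timestamp;
        row[1] = access.address;
        row[2] = access.value;
    }
    trace
}
```

Since rows are disjoint `&mut` chunks, the same loop could later be parallelized with rayon's `par_chunks_exact_mut`, which also touches the "parallelize at least something" item.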
`cargo` complains that `uuid` has a conflicting checksum.
I used to create new blocks incorrectly when `align >
initial_block_size`; now I hopefully do it right.
Also added persistent base ALU tests (although nothing changed for the
persistent case), and added to all of them a dummy access that used to fail.

codspeed-hq bot commented May 15, 2025

CodSpeed Instrumentation Performance Report

Merging #1567 will not alter performance

Comparing feat/new-execution (e9bc13d) with feat/new-execution (cc62f86)

Summary

✅ 10 untouched benchmarks

Golovanov399 and others added 3 commits May 15, 2025 10:37
This resolves INT-4012 by not using the memory controller's memory in E1
execution.
Implemented e1 and e3 for the `VecHeapTwoReads` and `eq_mod` rv32 adapters.
Implemented e1 and e3 for mod-builder. Updated the `algebra` and `ecc` extensions accordingly.
Deleted all the pairing chips.
All the tests run successfully. Also added back the address space 4 loadstore tests.

Resolves INT-3914
- add codspeed walltime measurement job
- tweak execution benchmarks to be heavier and more representative

codspeed-hq bot commented May 15, 2025

CodSpeed Walltime Performance Report

Merging #1567 will not alter performance

Comparing feat/new-execution (e9bc13d) with feat/new-execution (cc62f86)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 10 untouched benchmarks

…sts) (#1659)

This resolves INT-3913.

As a _side effect_, this removes the `GuestMemory` trait: it is now a struct
with an underlying `AddressMap<PAGE_SIZE>` (I didn't make `type GuestMemory = ...`
because the generically named `read` and `write` methods would be too vague
for `AddressMap`). `VmStateMut` is still generic over `MEM`, though.
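A rough sketch of `GuestMemory` as a concrete struct over a paged address map, with `VmStateMut` kept generic over the memory type. The page table layout (a sparse `HashMap` of boxed pages per address space), the 4 KiB page size, and all method names are assumptions for illustration, not openvm's real `AddressMap`.

```rust
use std::collections::HashMap;

const PAGE_SIZE: usize = 1 << 12;

/// Hypothetical paged address map: a sparse page table per address space;
/// untouched bytes read as zero.
struct AddressMap<const PAGE: usize> {
    spaces: Vec<HashMap<u32, Box<[u8; PAGE]>>>,
}

impl<const PAGE: usize> AddressMap<PAGE> {
    fn new(num_spaces: usize) -> Self {
        Self { spaces: (0..num_spaces).map(|_| HashMap::new()).collect() }
    }
    fn get(&self, space: usize, addr: u32) -> u8 {
        let (page, offset) = (addr / PAGE as u32, (addr % PAGE as u32) as usize);
        self.spaces[space].get(&page).map_or(0, |p| p[offset])
    }
    fn set(&mut self, space: usize, addr: u32, value: u8) {
        let (page, offset) = (addr / PAGE as u32, (addr % PAGE as u32) as usize);
        // Pages are allocated lazily on first write.
        self.spaces[space].entry(page).or_insert_with(|| Box::new([0u8; PAGE]))[offset] = value;
    }
}

/// `GuestMemory` as a concrete struct over the address map (not a trait).
struct GuestMemory {
    memory: AddressMap<PAGE_SIZE>,
}

impl GuestMemory {
    fn new(num_spaces: usize) -> Self {
        Self { memory: AddressMap::new(num_spaces) }
    }
    /// Reads `N` consecutive bytes from `space` starting at `ptr`.
    fn read<const N: usize>(&self, space: usize, ptr: u32) -> [u8; N] {
        std::array::from_fn(|i| self.memory.get(space, ptr + i as u32))
    }
    /// Writes `N` consecutive bytes to `space` starting at `ptr`.
    fn write<const N: usize>(&mut self, space: usize, ptr: u32, data: [u8; N]) {
        for (i, byte) in data.into_iter().enumerate() {
            self.memory.set(space, ptr + i as u32, byte);
        }
    }
}

/// Executor state, generic over the memory type.
struct VmStateMut<'a, MEM> {
    pc: &'a mut u32,
    memory: &'a mut MEM,
}
```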

I didn't fully implement `TraceStep` and `StepExecutorE1` for the
phantom chip because the chip is relatively easy and I'm not sure it
would be better expressible in terms of `NewVmChipWrapper`.

`PhantomSubExecutor` also changed a little (now accepts `u32` instead of
`F`, for example, and also `GuestMemory` instead of what it needed
before).
arayikhalatyan and others added 15 commits May 19, 2025 23:50
Implemented e1 and e3 for the sha2 and keccak extensions. I feel like the trace
generation of sha2 is more readable now.
Also implemented an arbitrary-length read function (used in both e1
executions).
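An arbitrary-length read on top of a memory that only allows aligned 4-byte accesses might look like the following sketch. `WordMemory` and `read_bytes` are hypothetical; the sketch assumes the start pointer is 4-byte aligned and simply discards the unused tail of the last word.

```rust
/// Hypothetical word-granular memory: only 4-byte-aligned 4-byte reads.
struct WordMemory {
    words: Vec<[u8; 4]>,
}

impl WordMemory {
    fn read_word(&self, ptr: u32) -> [u8; 4] {
        assert_eq!(ptr % 4, 0, "unaligned word read");
        self.words[(ptr / 4) as usize]
    }
}

/// Reads `len` bytes starting at the 4-byte-aligned `ptr` using only aligned
/// word reads; the final word may be only partially used.
fn read_bytes(mem: &WordMemory, ptr: u32, len: usize) -> Vec<u8> {
    assert_eq!(ptr % 4, 0, "start pointer must be 4-byte aligned");
    let mut out = Vec::with_capacity(len);
    let mut addr = ptr;
    while out.len() < len {
        let word = mem.read_word(addr);
        let take = (len - out.len()).min(4);
        out.extend_from_slice(&word[..take]);
        addr += 4;
    }
    out
}
```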

I think eventually we should reimplement Plonky3's trace generation for
Keccak so we don't have to allocate a separate trace matrix for the perm
cols.

TODOs:

- port the tests of keccak to the new framework
- spend some time (not much) on optimizing the keccak trace gen 

Resolves INT-3966
Resolves INT-3921
- Added an `execute_metered` function that keeps track of
`trace_heights` during execution and suspends execution if the trace
height, number of cells, or number of interactions goes above a fixed
threshold.
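The metering idea can be sketched as follows, under a simplified and hypothetical cost model: every instruction adds exactly one row to one chip, and each chip has a fixed width and a fixed number of interactions per row. `SegmentLimits`, `ChipShape` and this `execute_metered` signature are illustrative only.

```rust
/// Hypothetical fixed per-segment thresholds.
struct SegmentLimits {
    max_trace_height: usize,
    max_cells: usize,
    max_interactions: usize,
}

/// Hypothetical cost model: each instruction a chip executes adds one row of
/// `width` cells carrying `interactions_per_row` interactions.
struct ChipShape {
    width: usize,
    interactions_per_row: usize,
}

/// Executes `program[start..]` (each entry is the index of the chip handling
/// that instruction), updating `trace_heights`. Returns the index of the first
/// instruction not executed: execution is suspended right after the instruction
/// that pushes a metric above its limit, or runs to `program.len()`.
fn execute_metered(
    program: &[usize],
    start: usize,
    chips: &[ChipShape],
    limits: &SegmentLimits,
    trace_heights: &mut [usize],
) -> usize {
    // Running totals, seeded from whatever heights the segment already has.
    let mut cells: usize = trace_heights.iter().zip(chips).map(|(h, c)| h * c.width).sum();
    let mut interactions: usize = trace_heights
        .iter()
        .zip(chips)
        .map(|(h, c)| h * c.interactions_per_row)
        .sum();
    for (offset, &chip) in program[start..].iter().enumerate() {
        trace_heights[chip] += 1;
        cells += chips[chip].width;
        interactions += chips[chip].interactions_per_row;
        // Only the touched chip's height changed, so only it can newly exceed.
        if trace_heights[chip] > limits.max_trace_height
            || cells > limits.max_cells
            || interactions > limits.max_interactions
        {
            return start + offset + 1;
        }
    }
    program.len()
}
```

Keeping running totals instead of re-summing all chips keeps the per-instruction metering cost constant in this sketch.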

Resolves INT-3752
We removed support for memory reads/writes that aren't 4-byte aligned
(except RISC-V loadh, loadb), so now the keccak guest binding must
handle the misaligned-input cases.
Fixes misaligned calls to sha256.
Moved Aligned_buf to openvm platform so both `sha2` and `keccak` can access it.
All the integration tests should pass now. Merged with the native PR so
that some of the tests compile; I think for simplicity this can be
merged after the native PR.
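A guest-side binding could handle misaligned inputs roughly as sketched here: pass the input through unchanged when it is already 4-byte aligned, and otherwise copy it into a `u32`-backed (hence 4-byte aligned) temporary buffer. `with_aligned_input` is a hypothetical helper using `std`'s `Vec`; it is not the actual `Aligned_buf` implementation.

```rust
/// Calls `f` with a 4-byte-aligned view of `input`, copying into a temporary
/// `u32`-backed buffer only when `input` is misaligned.
fn with_aligned_input<R>(input: &[u8], f: impl FnOnce(&[u8]) -> R) -> R {
    if input.as_ptr() as usize % 4 == 0 {
        return f(input);
    }
    // A Vec<u32> allocation is 4-byte aligned and holds ceil(len / 4) words.
    let mut buf = vec![0u32; (input.len() + 3) / 4];
    // SAFETY: `buf` owns at least `input.len()` bytes, and u8 has alignment 1.
    let bytes =
        unsafe { std::slice::from_raw_parts_mut(buf.as_mut_ptr() as *mut u8, input.len()) };
    bytes.copy_from_slice(input);
    f(&*bytes)
}
```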

---------

Co-authored-by: Ayush Shukla <[email protected]>
Co-authored-by: Jonathan Wang <[email protected]>
…ctions (#1677)

- this targets `feat/new-exec-native-ext` since it was built on top
of it
Fixed TODOs related to porting the public values chip and native adapter.
I'm guessing these are used in recursion.

---------

Co-authored-by: Arayi Khalatyan <[email protected]>
Bumped to v1.0.1 (same as main)
- added a missing pc increment step in `ModularIsEqualStep`
- total cells = max_cells_per_chip * total_chips
Rewrite native Poseidon2 chip for execution/tracegen.

---------

Co-authored-by: Alexander Golovanov <[email protected]>


We sometimes call `set_initial_memory` even with the `Volatile` interface
chip. I'm not sure that alone is good design, but inside this
function we reset the tracing memory to use 8 as `initial_block_size`
and only then checked that the memory is empty in the volatile case, by
which point the initial block size had already been overwritten. This change fixes it.
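The ordering fix can be illustrated with a reduced, hypothetical version of the tracing memory: the volatile check runs first and returns before any state is mutated, so `initial_block_size` is only reset to 8 for the persistent case. `MemoryInterface`, the `image` representation, and the field set are assumptions, not the real `TracingMemory`.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum MemoryInterface {
    Volatile,
    Persistent,
}

/// Hypothetical stand-in for the tracing memory, reduced to the fields the fix touches.
struct TracingMemory {
    interface: MemoryInterface,
    initial_block_size: usize,
    image: Vec<(u32, u8)>,
}

impl TracingMemory {
    fn set_initial_memory(&mut self, image: Vec<(u32, u8)>) {
        // Check the volatile case first and return before mutating anything,
        // so `initial_block_size` is never overwritten for volatile memory.
        if self.interface == MemoryInterface::Volatile {
            assert!(image.is_empty(), "volatile memory must start empty");
            return;
        }
        self.initial_block_size = 8;
        self.image = image;
    }
}
```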

| group | app.proof_time_ms | app.cycles | app.cells_used | leaf.proof_time_ms | leaf.cycles | leaf.cells_used |
| --- | --- | --- | --- | --- | --- | --- |
| verify_fibair | (-97 [-8.6%]) 1,035 | 334,067 | (-1223534 [-6.9%]) 16,452,228 | - | - | - |
| fibonacci | (-295 [-11.9%]) 2,176 | 1,500,277 | 50,578,543 | - | - | - |
| regex | (-672 [-9.1%]) 6,705 | 4,165,432 | (-3513918 [-2.1%]) 162,997,234 | - | - | - |
| ecrecover | (+70 [+5.0%]) 1,458 | 289,547 | (-1180984 [-8.2%]) 13,289,202 | - | - | - |
| pairing | (-199 [-4.4%]) 4,360 | 1,820,436 | (-16223094 [-16.9%]) 79,609,313 | - | - | - |

Commit: e9bc13d

Benchmark Workflow

yi-sun added the `run-benchmark` (triggers benchmark workflows on the PR) and `run-benchmark-e2e` labels May 24, 2025
7 participants