Improving the Issie Fast Simulation #534

tomcl · 2025-04-06T15:27:34Z

tomcl
Apr 6, 2025
Maintainer

Currently every component (except Input and Output) is turned into a FastComponent and run to generate output data from input in the fast Simulator.

We want simulation to be as fast as possible, but the big limitation is that over-large simulations use too much space. Electron has a limited (4GB) heap and also has strange limits on numeric typed array space, maybe 8GB or 16GB.

Simulation space is determined by:

- Number of fast component outputs. Each component output carries with it a fair amount of boilerplate in the heap (IOArray, Driver, etc.), say 100 bytes.
  - Note that inputs are much cheaper because they link directly to output arrays: the cost is maybe 4 bytes per input.
- Simulation data storage.
  - Total data used is (number of outputs) * (number of clock cycles stored) * 4 bytes (for outputs <= 32 bits).
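
As a rough worked example of the space estimate above, the sketch below adds the per-output and per-input overheads to the stored sample data. The function name and the example design size are hypothetical; the 100-byte, 4-byte and 4-bytes-per-sample figures are the approximate costs quoted above, not measured values.

```python
def estimate_simulation_bytes(num_outputs, num_inputs, cycles_stored,
                              bytes_per_output_overhead=100,
                              bytes_per_input=4,
                              bytes_per_sample=4):
    """Rough heap estimate from the approximate per-item costs in the text."""
    # Fixed boilerplate per output (IOArray, Driver, etc.) plus cheap input links.
    overhead = (num_outputs * bytes_per_output_overhead
                + num_inputs * bytes_per_input)
    # One 32-bit sample per output per stored clock cycle.
    data = num_outputs * cycles_stored * bytes_per_sample
    return overhead + data


# Hypothetical design: 1,000,000 outputs and 2,000,000 inputs.
for window in (2000, 20):
    total = estimate_simulation_bytes(1_000_000, 2_000_000, window)
    print(f"window={window}: {total / 2**30:.2f} GiB")
```

With these assumed numbers the sample data dominates at a 2000-cycle window, while at a 20-cycle window the per-output overhead becomes the larger term, which is why both the number of outputs and the number of cycles stored matter.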

Two ways to optimise the simulation are therefore:

- Reduce the number of outputs
- Reduce the number of clock cycles stored

Reducing clock cycles stored

Sparse data retention: using checkpoints to reduce clock cycles stored

For performant use of the waveform simulator we want to be able to go backwards and forwards on a long simulation. Keeping only a current window of samples means that going forward is OK, but going backwards requires resimulation from the start. This is not reasonable.

A solution is to keep simulation data at checkpoints, e.g. every 100 cycles. Then, to view data at any time that has previously been simulated, you go back to the latest checkpoint before the data needed and simulate forward from there.

This leads to code that keeps:

- As now, all data as arrays in a window
- A new set of arrays storing data checkpoints

When the waveform display changes, new simulation will be done as needed whenever the clock cycles on screen go outside the window we keep.
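
The checkpoint idea can be sketched with a toy simulator. This is a minimal illustration, not Issie code: the class name, the single-integer state and the `step` function are all hypothetical stand-ins. In a real simulation the checkpoint would have to hold all clocked state (registers, memories), and the window arrays would be refilled rather than a single state returned.

```python
class CheckpointedSim:
    """Toy simulator: the whole state is one integer advanced by a pure step function."""

    def __init__(self, step, initial_state, interval=100):
        self.step = step
        self.interval = interval
        self.checkpoints = {0: initial_state}  # cycle number -> state at that cycle
        self.simulated_to = 0                  # furthest cycle simulated so far
        self._latest = initial_state           # state at cycle simulated_to

    def _advance_to(self, cycle):
        """Extend the simulation forward, saving a checkpoint every `interval` cycles."""
        while self.simulated_to < cycle:
            self._latest = self.step(self._latest)
            self.simulated_to += 1
            if self.simulated_to % self.interval == 0:
                self.checkpoints[self.simulated_to] = self._latest

    def state_at(self, cycle):
        """Return the state at `cycle` by re-simulating from the latest earlier checkpoint."""
        self._advance_to(cycle)
        base = (cycle // self.interval) * self.interval
        state = self.checkpoints[base]
        for _ in range(cycle - base):  # at most interval - 1 steps
            state = self.step(state)
        return state


def example_step(state):
    """Hypothetical next-state function standing in for one clock cycle."""
    return (state * 5 + 1) % 2**32


sim = CheckpointedSim(example_step, initial_state=0, interval=100)
forward = sim.state_at(1234)   # simulates forward, leaving checkpoints behind
backward = sim.state_at(250)   # goes back: restores cycle 200, re-runs 50 steps
print(forward, backward, len(sim.checkpoints))
```

Going backwards then costs at most one checkpoint interval of resimulation instead of a run from cycle 0, at the price of one stored state per interval.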

Altering default clock cycle storage windows

Currently the Step simulator has a default window of size 2000 cycles, which is too large for very large simulations.
We should be able to reduce the window to, say, 20 cycles if need be.
Similarly the waveform simulator has a configurable window with a minimum size of 200 clock cycles. That could, with some work to disable impossible zooming out, also be reduced to 20 cycles.

Improving circuit validation and width inference time

Currently the simulation is initialised with full-size but empty arrays as part of circuit validation whenever the editor is changed and a simulation tab is open. For large simulations this is laggy.
- We could defer allocation of arrays till simulation is actually needed.
- We could see if it is possible to validate circuits without creating FastComponents at all to keep editing large design files fast.
- We could rewrite the old circuit validation code to make it more performant and decouple it entirely from simulation.
  - Now that we have parameters, perhaps we need to do this anyway
  - We want circuit validation to be O(design-time sheets) not O(simulation sheets)
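
Deferring array allocation, the first option above, could look something like the sketch below. `LazyOutputArray` is a hypothetical name used only for illustration: validation would create the wrapper (cheap), and storage would only be allocated the first time the simulation touches the data.

```python
from array import array


class LazyOutputArray:
    """Records the required size up front but allocates storage on first access."""

    def __init__(self, num_cycles):
        self.num_cycles = num_cycles
        self._data = None  # nothing allocated during circuit validation

    @property
    def allocated(self):
        return self._data is not None

    @property
    def data(self):
        # Allocate a zero-filled array of unsigned ints only when first needed.
        if self._data is None:
            self._data = array("I", [0]) * self.num_cycles
        return self._data


out = LazyOutputArray(num_cycles=20)
print(out.allocated)   # no storage yet
out.data[3] = 7        # first access triggers allocation
print(out.allocated, len(out.data))
```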

Output compression: Reducing number of FastComponents.

The following transformation would reduce space requirements and increase speed by a significant constant factor (4? 10?).

Change FastComponent inputs so that each input specifies a bit width and a starting bit position within a field of data. The arrays currently used as outputs can then each contain multiple outputs (of different components).
Bus Split components are then no longer needed!
Sets of registers/flip-flops with identical enable can be grouped together into parallel chunks with a single 32-bit output and implemented as a single component.
Registers with different enables can also be implemented with a common output, but then the reduction logic must specify, for each enable, which input and output fields it controls.
Combinational logic blocks with up to (say) 8 input bits and 32 output bits can be grouped together and implemented as a single table lookup.
Performance comes from:
- fewer output arrays (space)
- fewer FastComponents to reduce per clock cycle, since each array lookup can implement multiple components (time)
- low-width outputs can be packed, multiple outputs per 32-bit word, into one array (space)
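
The bit-field addressing described above can be illustrated as follows. This is a minimal sketch with hypothetical helper names: several narrow outputs share one 32-bit word, and each consumer reads its input by start position and width, which is also what makes a separate Bus Split component unnecessary.

```python
WORD_MASK = 0xFFFFFFFF  # outputs are packed into 32-bit words


def extract_field(word, start_bit, width):
    """Read `width` bits starting at `start_bit` (bit 0 is least significant)."""
    return (word >> start_bit) & ((1 << width) - 1)


def insert_field(word, start_bit, width, value):
    """Return `word` with the given field overwritten by `value`."""
    mask = ((1 << width) - 1) << start_bit
    return ((word & ~mask) | ((value << start_bit) & mask)) & WORD_MASK


# Three outputs of different components packed into one word:
# an 8-bit bus at bit 0, a 1-bit flag at bit 8, a 4-bit bus at bit 9.
word = 0
word = insert_field(word, 0, 8, 0xAB)
word = insert_field(word, 8, 1, 1)
word = insert_field(word, 9, 4, 0x5)

# A consumer input is just (start_bit, width) into the shared word.
print(hex(extract_field(word, 0, 8)))   # the 8-bit bus
print(hex(extract_field(word, 4, 4)))   # upper nibble of that bus, no Bus Split needed
print(extract_field(word, 9, 4))        # the 4-bit bus
```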

Actual time performance of this model is unclear because it depends on details of the JIT implementation. The gain could be large. However, the space performance gains are large and definite.

Definition of a MegaComponent

A given small number of inputs, each with its linked output and its own Width and startBitPos
A given small number of outputs, each with a data array.
A Component Definition, e.g. MultiRegister, MultiRegisterE, CLB, N-M-Mux, N-M-Demux, NbitOp, GenRAM, RAM, ROM.
- make components as general as possible so we have only a few
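
One possible shape for such a component is sketched below, using the CLB (table lookup) case. The class and field names are hypothetical, and plain Python lists stand in for the typed arrays: each input is a link to an output array plus a start bit and width, and the reduction concatenates the input fields into a table index.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InputField:
    source: List[int]  # the output array this input links to (one word per cycle)
    start_bit: int
    width: int


@dataclass
class ClbMegaComponent:
    inputs: List[InputField]  # total input width assumed small (say <= 8 bits)
    table: List[int]          # 2**(total input bits) packed 32-bit output words
    output: List[int]         # output array, one packed word per cycle

    def reduce(self, cycle):
        """Concatenate the input fields into an index and look up the output word."""
        index, shift = 0, 0
        for f in self.inputs:
            value = (f.source[cycle] >> f.start_bit) & ((1 << f.width) - 1)
            index |= value << shift
            shift += f.width
        self.output[cycle] = self.table[index]


def build_example_table():
    """Three gates on inputs a, b in one lookup: bit0 = AND, bit1 = XOR, bit2 = OR."""
    table = []
    for index in range(4):
        a, b = index & 1, (index >> 1) & 1
        table.append((a & b) | ((a ^ b) << 1) | ((a | b) << 2))
    return table


driver = [0b00, 0b01, 0b10, 0b11]  # a is bit 0, b is bit 1, over four cycles
clb = ClbMegaComponent(
    inputs=[InputField(driver, 0, 1), InputField(driver, 1, 1)],
    table=build_example_table(),
    output=[0] * len(driver),
)
for cycle in range(len(driver)):
    clb.reduce(cycle)
print(clb.output)
```

Here three gates that would each have been a FastComponent with its own output array collapse into one lookup writing one array.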

Output compression algorithm is defined by:

Algorithm to group together related FastComponents into a smaller number of MegaComponents.
Definition of MegaComponent types. Hope to have only a few types, each parametrisable.
Defining parameters and input connections for each MegaComponent so result is equivalent to given set of FastComponents.
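
As one small piece of the grouping step, the sketch below groups registers by their enable signal and packs each group into 32-bit words. It uses a simple next-fit packing chosen for illustration only; the function name, the tuple layout and the packing strategy are assumptions, not a proposed design.

```python
from collections import defaultdict


def group_registers(registers, word_bits=32):
    """Group (name, width, enable) registers into MultiRegister-style chunks.

    Returns a list of dicts {"enable": ..., "fields": [(name, start_bit, width)]},
    where every chunk shares one enable and fits in one `word_bits`-wide output.
    """
    by_enable = defaultdict(list)
    for name, width, enable in registers:
        if width > word_bits:
            raise ValueError(f"{name} is wider than {word_bits} bits")
        by_enable[enable].append((name, width))

    groups = []
    for enable, regs in by_enable.items():
        fields, used = [], 0
        for name, width in regs:
            if used + width > word_bits:  # current word is full: start a new chunk
                groups.append({"enable": enable, "fields": fields})
                fields, used = [], 0
            fields.append((name, used, width))
            used += width
        if fields:
            groups.append({"enable": enable, "fields": fields})
    return groups


example = [
    ("r0", 16, "en_a"), ("r1", 8, "en_a"), ("r2", 16, "en_a"),
    ("f0", 1, "en_b"), ("f1", 1, "en_b"),
]
for group in group_registers(example):
    print(group)
```

In this example five registers become three chunks, each with a single output array. A fuller algorithm would also need to rewrite every consumer's input to the new (array, start bit, width) triple so that the result stays equivalent to the original set of FastComponents.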
