Currently every component (except Input and Output) is turned into a FastComponent and run to generate output data from input in the fast Simulator.
We want simulation to be as fast as possible, but the main limitation is that very large simulations use too much space. Electron has a limited (4GB) heap and has also shown strange limits on numeric typed array space, perhaps 8GB or 16GB.
Simulation space is determined by:
Number of fast component outputs. Each component output carries a fair amount of boilerplate on the heap (IOArray, Driver, etc.), say 100 bytes.
Note that inputs are much cheaper because they link directly to output arrays - the cost is maybe 4 bytes per input.
Simulation data storage.
Total data used is (number of outputs) * (number of clock cycles stored) * 4 bytes (for outputs <= 32 bits).
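The space formula above can be turned into a quick estimate. This is a minimal sketch, not code from the simulator: `estimateSimulationBytes` and the constant names are hypothetical, and the 100-byte and 4-byte overheads are the rough guesses given above.

```typescript
// Hypothetical helper: rough simulation memory estimate in bytes.
// The overhead constants are the guesses from the text, not measurements.
const BYTES_PER_OUTPUT_OVERHEAD = 100; // IOArray, Driver, etc.
const BYTES_PER_INPUT = 4;             // link to an existing output array
const BYTES_PER_SAMPLE = 4;            // one 32-bit word per output per cycle

function estimateSimulationBytes(
  numOutputs: number,
  numInputs: number,
  cyclesStored: number
): number {
  const structure =
    numOutputs * BYTES_PER_OUTPUT_OVERHEAD + numInputs * BYTES_PER_INPUT;
  const data = numOutputs * cyclesStored * BYTES_PER_SAMPLE;
  return structure + data;
}

// One million outputs and two million inputs, with 2000 versus 20 cycles stored.
const bytesWith2000Cycles = estimateSimulationBytes(1e6, 2e6, 2000);
const bytesWith20Cycles = estimateSimulationBytes(1e6, 2e6, 20);
```

Under these assumptions the stored data dominates: about 8.1 GB with 2000 cycles stored, against under 0.2 GB with 20.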
Two ways to optimise the simulation are therefore:
Reduce the number of outputs
Reduce the number of clock cycles stored
Reducing clock cycles stored
Sparse data retention: using checkpoints to reduce clock cycles stored
For performant use of the waveform simulator we want to be able to go backwards and forwards on a long simulation. Keeping only a current window of samples means that forward is ok, but backwards requires resimulation from start. This is not reasonable.
A solution is to keep simulation data at checkpoints, e.g. every 100 cycles. Then, to view data at any time that has previously been simulated, you go back to the latest checkpoint before the data needed and simulate forward from there.
This leads to code that keeps:
As now, all data as arrays in a window
A new set of arrays storing data checkpoints
When the waveform display changes, new simulation will be done as needed whenever the displayed clock cycles go outside the window we keep.
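The checkpoint idea can be sketched as follows. This is an illustration under stated assumptions, not the simulator's actual code: `CheckpointStore`, `makeStore`, `stateAt` and the toy `toyStep` circuit are all hypothetical names, and the whole circuit state is assumed to fit in one `Uint32Array` that the step function copies rather than mutates.

```typescript
// Hypothetical sketch: all names are illustrative, not the simulator's API.
type SimState = Uint32Array;

interface CheckpointStore {
  interval: number;                 // cycles between checkpoints
  step: (s: SimState) => SimState;  // must return a NEW array, never mutate
  checkpoints: SimState[];          // checkpoints[k] = state at cycle k * interval
}

function makeStore(
  initial: SimState,
  step: (s: SimState) => SimState,
  interval: number
): CheckpointStore {
  return { interval, step, checkpoints: [initial.slice()] };
}

// Returns the state at `cycle`. Callers must treat the result as read-only,
// because it may be a stored checkpoint array.
function stateAt(store: CheckpointStore, cycle: number): SimState {
  const k = Math.floor(cycle / store.interval);
  // Extend the checkpoints forward if this cycle has never been simulated.
  while (store.checkpoints.length <= k) {
    let cur = store.checkpoints[store.checkpoints.length - 1];
    for (let i = 0; i < store.interval; i++) cur = store.step(cur);
    store.checkpoints.push(cur);
  }
  // Resimulate from the latest checkpoint at or before the requested cycle.
  let s = store.checkpoints[k];
  for (let c = k * store.interval; c < cycle; c++) s = store.step(s);
  return s;
}

// Toy circuit: a cycle counter plus a running sum of the counter.
const toyStep = (s: SimState): SimState => Uint32Array.of(s[0] + 1, s[1] + s[0]);
const store = makeStore(Uint32Array.of(0, 0), toyStep, 100);
const late = stateAt(store, 250);  // simulates forward, checkpointing at 100 and 200
const early = stateAt(store, 37);  // goes backwards: replays 37 steps from cycle 0
```

With this layout, going backwards costs at most `interval - 1` resimulated cycles, while storage grows with the number of checkpoints rather than the number of cycles.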
Altering default clock cycle storage windows
Currently the Step simulator has a default window of size 2000, which is too large for very large simulations.
We should be able to reduce the window to say 20 if need be.
Similarly the waveform simulator has a configurable window with a minimum size of 200 clocks. That could also be reduced to 20 cycles, with some work to disable impossible zooming out.
Improving circuit validation and width inference time
Currently the simulation is initialised with full-size but empty arrays as part of circuit validation whenever the editor is changed and a simulation tab is open. For large simulations this is laggy.
We could defer allocation of arrays till simulation is actually needed.
We could see if it is possible to validate circuits without creating FastComponents at all to keep editing large design files fast.
We could rewrite the old circuit validation code to make it more performant and decouple it entirely from simulation.
Now that we have parameters, perhaps we need to do this anyway.
We want circuit validation to be O(design-time sheets) not O(simulation sheets)
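Deferring array allocation, the first option above, could look like the sketch below. It assumes validation and width inference need only each output's width, never its data; `LazyOutput`, `makeOutput` and `getData` are hypothetical names, not existing simulator types.

```typescript
// Hypothetical output record whose data array is allocated on first use.
interface LazyOutput {
  width: number;            // bit width, all that validation is assumed to need
  cycles: number;           // window size to allocate when simulation starts
  data: Uint32Array | null; // null until the simulation first touches it
}

function makeOutput(width: number, cycles: number): LazyOutput {
  return { width, cycles, data: null }; // cheap: no array allocated yet
}

function getData(out: LazyOutput): Uint32Array {
  if (out.data === null) {
    out.data = new Uint32Array(out.cycles); // allocated only when needed
  }
  return out.data;
}

const lazyOut = makeOutput(8, 2000);
const allocatedBeforeUse = lazyOut.data !== null; // false: editing stays cheap
const lazyData = getData(lazyOut);                // first real use allocates
```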
Output compression: Reducing number of FastComponents.
The following transformation would reduce space requirements and increase speed by a significant constant factor (perhaps 4x or 10x).
Change FastComponent inputs so each input specifies a bit width and a starting bit position within a field of data. The arrays currently used as outputs can then each contain multiple outputs (of different components).
Bus Split components are now no longer needed!
Sets of registers/flip-flops with identical enable can be grouped together into parallel chunks with a single 32 bit output and implemented as a single component.
Registers with different enables can also be implemented with a common output; however, the reduction logic must then specify, for each enable, which inputs and output fields it controls.
Combinational logic blocks with up to (say) 8 input bits and 32 output bits can be grouped together and implemented as a single table lookup.
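The table-lookup grouping in the last item can be illustrated with a small sketch. It assumes the grouped logic has at most 8 input bits packed into one index and at most 32 output bits packed into one word; `buildClbTable` and the example logic (a 4-bit adder plus bitwise AND and XOR) are hypothetical.

```typescript
// Hypothetical sketch: precompute a combinational block as a lookup table.
function buildClbTable(
  inputBits: number,                 // assumed <= 8, so at most 256 entries
  logic: (inputs: number) => number  // packed inputs -> packed outputs
): Uint32Array {
  const table = new Uint32Array(1 << inputBits);
  for (let i = 0; i < table.length; i++) table[i] = logic(i) >>> 0;
  return table;
}

// Inputs: a in bits 0..3, b in bits 4..7.
// Outputs: a+b in bits 0..4, a AND b in bits 5..8, a XOR b in bits 9..12.
const clbTable = buildClbTable(8, (x) => {
  const a = x & 0xf;
  const b = (x >>> 4) & 0xf;
  return (a + b) | ((a & b) << 5) | ((a ^ b) << 9);
});

// One array lookup now evaluates three components at once (a = 3, b = 5).
const clbWord = clbTable[3 | (5 << 4)];
const clbSum = clbWord & 0x1f;         // 8
const clbAnd = (clbWord >>> 5) & 0xf;  // 1
const clbXor = (clbWord >>> 9) & 0xf;  // 6
```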
Performance comes from:
fewer output arrays (space)
fewer FastComponents to reduce per clock cycle since each array lookup can implement multiple components (time)
low width outputs can be packed, multiple outputs per 32 bit array word (space).
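Packing several low width outputs into one 32 bit word needs only shift and mask operations. The sketch below assumes each field is described by a width and a starting bit position; `FieldRef`, `readField` and `writeField` are hypothetical helper names.

```typescript
// Hypothetical field descriptor: where one output lives inside a 32-bit word.
interface FieldRef {
  width: number;       // 1..32
  startBitPos: number; // 0..31, with startBitPos + width <= 32
}

function fieldMask(width: number): number {
  // (1 << 32) wraps to 1 in JavaScript, so width 32 is handled separately.
  return width >= 32 ? 0xffffffff : (1 << width) - 1;
}

function readField(word: number, f: FieldRef): number {
  return ((word >>> f.startBitPos) & fieldMask(f.width)) >>> 0;
}

function writeField(word: number, f: FieldRef, value: number): number {
  const mask = fieldMask(f.width);
  const cleared = word & ~(mask << f.startBitPos);
  return (cleared | ((value & mask) << f.startBitPos)) >>> 0;
}

// Three outputs of different components sharing one word.
const fieldA: FieldRef = { width: 4, startBitPos: 0 };
const fieldB: FieldRef = { width: 1, startBitPos: 4 };
const fieldC: FieldRef = { width: 8, startBitPos: 5 };

let packedWord = 0;
packedWord = writeField(packedWord, fieldA, 0xa);
packedWord = writeField(packedWord, fieldB, 1);
packedWord = writeField(packedWord, fieldC, 0xc3);
```

Here three outputs that would otherwise need three arrays occupy 13 bits of a single word per clock cycle.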
Actual time performance of this model is unclear because it depends on details of the JIT implementation. The time gain could be large. However, space performance gains are large and definite.
Definition of a MegaComponent
A given small number of inputs, each with a linked output and its own Width and startBitPos
A given small number of outputs, each with a data array.
A Component Definition, e.g. MultiRegister, MultiRegisterE, CLB, N-M-Mux, N-M-Demux, NbitOp, GenRAM, RAM, ROM.
make components as general as possible so we have only a few
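One possible shape for this definition is sketched below. The type and field names (`MegaKind`, `MegaInput`, `MegaComponent`, `stepMultiRegister`) are assumptions for illustration, and the MultiRegister behaviour shown (packing all input fields, in order, into a single output word for the next cycle) is one plausible reading of the description rather than a settled design.

```typescript
// Hypothetical types: a sketch of the MegaComponent definition.
type MegaKind =
  | "MultiRegister" | "MultiRegisterE" | "CLB" | "N-M-Mux" | "N-M-Demux"
  | "NbitOp" | "GenRAM" | "RAM" | "ROM";

interface MegaInput {
  source: Uint32Array; // the linked output array, indexed by clock cycle
  width: number;       // bits taken from the source word
  startBitPos: number; // position of the field within the source word
}

interface MegaComponent {
  kind: MegaKind;
  inputs: MegaInput[];
  outputs: Uint32Array[]; // one data array per output
}

// Assumed MultiRegister step: clock every input field into output 0.
function stepMultiRegister(mc: MegaComponent, cycle: number): void {
  let word = 0;
  let pos = 0;
  for (const inp of mc.inputs) {
    const mask = inp.width >= 32 ? 0xffffffff : (1 << inp.width) - 1;
    const value = (inp.source[cycle] >>> inp.startBitPos) & mask;
    word |= value << pos;
    pos += inp.width; // total width is assumed to be <= 32
  }
  mc.outputs[0][cycle + 1] = word >>> 0;
}

// Two registers (4 bits and 2 bits) fed from fields of one source array.
const sourceArr = Uint32Array.of(0xb6, 0);
const multiReg: MegaComponent = {
  kind: "MultiRegister",
  inputs: [
    { source: sourceArr, width: 4, startBitPos: 4 },
    { source: sourceArr, width: 2, startBitPos: 1 },
  ],
  outputs: [new Uint32Array(2)],
};
stepMultiRegister(multiReg, 0);
```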
Output compression algorithm is defined by:
Algorithm to group together related FastComponents into a smaller number of MegaComponents.
Definition of MegaComponent types. Hope to have only a few types, each parametrisable.
Defining parameters and input connections for each MegaComponent so result is equivalent to given set of FastComponents.
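As an illustration of the grouping step, the sketch below packs registers with an identical enable into 32 bit chunks using a simple greedy pass. It is one possible approach under stated assumptions, not a proposed final algorithm: `RegisterInfo`, `RegisterGroup` and `groupRegisters` are hypothetical, and every register is assumed to be at most 32 bits wide.

```typescript
// Hypothetical description of one register before grouping.
interface RegisterInfo {
  id: string;
  width: number;         // assumed <= 32
  enable: string | null; // id of the enable signal, or null if none
}

interface RegisterGroup {
  enable: string | null;
  usedBits: number;
  members: { id: string; startBitPos: number; width: number }[];
}

// Greedy pass: append each register to the open group for its enable,
// starting a new group when the 32-bit word would overflow.
function groupRegisters(regs: RegisterInfo[]): RegisterGroup[] {
  const groups: RegisterGroup[] = [];
  const open = new Map<string, RegisterGroup>();
  for (const r of regs) {
    const key = r.enable === null ? "none" : "en:" + r.enable;
    let g = open.get(key);
    if (g === undefined || g.usedBits + r.width > 32) {
      g = { enable: r.enable, usedBits: 0, members: [] };
      groups.push(g);
      open.set(key, g);
    }
    g.members.push({ id: r.id, startBitPos: g.usedBits, width: r.width });
    g.usedBits += r.width;
  }
  return groups;
}

const regGroups = groupRegisters([
  { id: "r1", width: 8, enable: "A" },
  { id: "r2", width: 16, enable: "A" },
  { id: "r3", width: 1, enable: null },
  { id: "r4", width: 8, enable: "A" },
  { id: "r5", width: 4, enable: "B" },
]);
```

In this example five registers collapse into three groups, each of which would become one MultiRegister or MultiRegisterE with a single output array.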