Reduce par threads when latency penalty is known #1833
Labels: C: calyx-opt (Optimization or analysis pass), C: static-cleanup (Cleanup for Static Calyx Project), S: Available (Can be worked upon)
When tdcc compiles a par block, it allocates a new FSM for each thread. Each sub-program gets its own independent FSM so that the threads can make progress independently. However, sometimes a par runs a tiny group, such as upd_reg, alongside a long-running loop.
In this case, upd_reg takes 1 cycle and the loop may take thousands. Regardless, we still allocate a whole new FSM for upd_reg. It would be better to transform this into a seq instead and use exactly one FSM. The challenge is that, in general, we don't know how long a "simple looking" control program will take; after all, a loop is compiled into a group at some point.

Instead, we should use the newly added @promote_static(n) attribute to detect when a group (one that wasn't upgraded to a static island) takes a small fraction of the cycle time of the other threads, and sequence it with one of those threads instead. We can expose a compiler knob to decide what exactly that fraction should be, but the upshot is that this will let us reduce the number of FSMs we allocate.

It also occurs to me that the static inlining pass should annotate the generated group with a @promote_static(n) attribute so that this information can be used to reschedule par threads.
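A minimal Python sketch of the rescheduling heuristic described above, under several assumptions. A par thread is modeled as a hypothetical Thread record whose latency comes from a "promote_static" entry that mirrors @promote_static(n); threads without that entry have unknown latency and are left alone. The fraction parameter stands in for the proposed compiler knob, and threads are assumed to be independent, so running one before another is a legal schedule. Thread, reschedule_par, and fsm_count are illustrative names only and do not correspond to calyx-opt's actual IR or pass API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Thread:
    """One child of a par block (hypothetical model, not calyx-opt's IR)."""
    name: str
    # Control run by this thread, in order (group names only, for illustration).
    body: List[str]
    # Attributes on the thread's control; "promote_static" mirrors @promote_static(n).
    attrs: Dict[str, int] = field(default_factory=dict)

    @property
    def latency(self) -> Optional[int]:
        # Known latency in cycles, or None if there is no promote_static entry.
        return self.attrs.get("promote_static")


def fsm_count(threads: List[Thread]) -> int:
    # Per the issue, tdcc allocates one FSM per par thread.
    return len(threads)


def reschedule_par(threads: List[Thread], fraction: float) -> List[Thread]:
    """Sequence short threads into the longest known-latency thread.

    `fraction` is the hypothetical knob: the total latency added to the
    host thread may not exceed fraction * host latency.
    """
    known = [t for t in threads if t.latency is not None]
    if len(known) < 2:
        return threads  # Nothing to compare against; leave the par alone.
    host = max(known, key=lambda t: t.latency)
    budget = fraction * host.latency
    folded: List[Thread] = []
    penalty = 0
    # Greedily fold the shortest threads first while the penalty fits the budget.
    for t in sorted((t for t in known if t is not host), key=lambda t: t.latency):
        if penalty + t.latency > budget:
            break
        folded.append(t)
        penalty += t.latency
    if not folded:
        return threads
    folded_ids = {id(t) for t in folded}
    # Folded threads now run before the host's body, inside the host's FSM.
    new_host = Thread(
        name=host.name,
        body=[g for t in folded for g in t.body] + host.body,
        attrs={**host.attrs, "promote_static": host.latency + penalty},
    )
    result = []
    for t in threads:
        if id(t) in folded_ids:
            continue
        result.append(new_host if t is host else t)
    return result


if __name__ == "__main__":
    par = [
        Thread("t0", ["upd_reg"], {"promote_static": 1}),
        Thread("t1", ["loop_group"], {"promote_static": 1000}),
    ]
    new_par = reschedule_par(par, fraction=0.01)
    print(fsm_count(par), "->", fsm_count(new_par))  # 2 -> 1
    print(new_par[0].body, new_par[0].latency)  # ['upd_reg', 'loop_group'] 1001
```

Folding into the longest known-latency thread bounds the slowdown of the par to at most fraction of that thread's latency. A smarter pass could instead place a short thread into a thread with enough slack (for example, a 900-cycle thread running alongside a 1000-cycle one) and add no latency at all.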