Reduce `par` threads when latency penalty is known #1833

rachitnigam · 2023-12-27T00:02:06Z

When tdcc compiles a par block, it allocates a new FSM for each thread:

```
par { A; B; C; }
```

Each sub-program gets its own independent FSM so that the threads can make progress independently. However, we sometimes have programs that look like this:

```
par {
  while cond { B };
  upd_reg;
}
```

In this case, `upd_reg` takes 1 cycle while the loop may take thousands. Regardless, we still allocate a whole new FSM for `upd_reg`. It would be better to transform this `par` into a `seq` and use exactly one FSM, at the cost of a small latency penalty. The challenge is that, in general, we don't know how long a "simple looking" control program will take; after all, a loop is compiled into a group at some point.

Instead, we should use the newly added `@promote_static(n)` attribute to detect when a group (one that wasn't upgraded to a static island) takes only a small fraction of the cycle time of the other threads, and sequence it with one of those threads instead. We can expose a compiler knob to decide what exactly that fraction should be, but the upshot is that this will let us reduce the number of FSMs we allocate.
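A minimal sketch of that heuristic, written in Python rather than against the real `calyx-opt` IR. Everything here is hypothetical: the `Thread` model, the `merge_short_threads` helper, and the `fraction` parameter (standing in for the compiler knob). It assumes a thread's exact latency is known only when it carries `@promote_static(n)`, and that the long-running host thread (e.g. the `while` loop) comes with some latency estimate, since its real latency is not static.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Thread:
    """One child of a `par` block (hypothetical model, not Calyx's IR)."""
    name: str
    # Exact latency from a @promote_static(n) annotation, or None if unknown.
    static_latency: Optional[int]
    # Assumed latency estimate, used only to pick the host thread.
    estimate: Optional[int] = None
    # Control programs this thread runs in sequence (a `seq` body).
    seq: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.seq:
            self.seq = [self.name]
        if self.estimate is None:
            self.estimate = self.static_latency


def merge_short_threads(threads: List[Thread], fraction: float) -> List[Thread]:
    """Fold every thread whose static latency is at most `fraction` of the
    longest thread's estimated latency into that thread as a `seq`.
    The host thread is mutated in place; the remaining threads are returned."""
    known = [t for t in threads if t.estimate is not None]
    if not known:
        return threads
    host = max(known, key=lambda t: t.estimate)
    limit = fraction * host.estimate  # threshold fixed before any merging
    remaining = []
    for t in threads:
        is_short = (
            t is not host
            and t.static_latency is not None
            and t.static_latency <= limit
        )
        if is_short:
            # Run the short thread after the host: one fewer FSM, and the
            # host's latency grows by the short thread's static latency.
            host.seq.append(t.name)
            host.estimate += t.static_latency
        else:
            remaining.append(t)
    return remaining


# Usage on the example above, with an assumed 5000-cycle loop estimate.
threads = [Thread("while_loop", None, estimate=5000), Thread("upd_reg", 1)]
merged = merge_short_threads(threads, fraction=0.01)
print([t.seq for t in merged])  # [['while_loop', 'upd_reg']]
```

In this model the latency penalty is exactly the merged thread's static latency, which is why only threads with a known `@promote_static(n)` value are candidates for merging.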

It also occurs to me that the static inlining pass should annotate the generated group with a `@promote_static(n)` attribute so that this information can be used to reschedule `par` threads.
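For illustration, here is one way the `n` for that annotation could be derived from the static control program being inlined. This is a hypothetical Python model, not the static inlining pass itself; it assumes the usual composition rules for static control (a `seq` adds its children's latencies, a `par` takes the maximum, a `repeat` multiplies its body's latency by the count) and leaves out `static if` for brevity.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Enable:
    group: str
    latency: int  # latency of a static group


@dataclass
class Seq:
    stmts: List["Ctrl"]


@dataclass
class Par:
    stmts: List["Ctrl"]


@dataclass
class Repeat:
    count: int
    body: "Ctrl"


Ctrl = Union[Enable, Seq, Par, Repeat]


def latency(c: Ctrl) -> int:
    """Total cycles taken by a static control program."""
    if isinstance(c, Enable):
        return c.latency
    if isinstance(c, Seq):
        return sum(latency(s) for s in c.stmts)
    if isinstance(c, Par):
        return max((latency(s) for s in c.stmts), default=0)
    if isinstance(c, Repeat):
        return c.count * latency(c.body)
    raise TypeError(f"unsupported control node: {c!r}")


def promote_static_attr(c: Ctrl) -> str:
    """Attribute text the inlined group could carry (hypothetical helper)."""
    return f"@promote_static({latency(c)})"


prog = Seq([Enable("a", 1), Par([Enable("b", 2), Enable("c", 3)]),
            Repeat(4, Enable("d", 2))])
print(promote_static_attr(prog))  # @promote_static(12)
```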


rachitnigam · 2023-12-27T00:04:48Z

@calyxir/static-calyx this is another example of how the static extensions help the overall compiler pipeline. We should implement this for the camera-ready version and brag about the resource benefits we get.

rachitnigam added the `S: Available` and `C: calyx-opt` labels Dec 27, 2023

rachitnigam mentioned this issue in Feedback Directed Optimization #1834 (open) Dec 27, 2023

calebmkim added the `C: static-cleanup` label Jan 8, 2024
