-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inliner revamp #1813
Comments
Just as a reminder for why making inlining work more generally is a good idea, it makes all the other optimizations so much more powerful. For example, see the results from @calebmkim's paper: ![]() Fully inlining + resource sharing reduces usage by half! |
(Edited to a more tricky example) Consider the following example:
How would we inline these The problem with this scenario is that we need |
Yes, this is tricky! If we think about the kind of circuit that is generated when everything is compiled down, we'll end up muxing the inputs to the Bigger picture question: should Calyx allow groups to abstractly refer to the control signals that will eventually be used to implement the surrounding FSMs? Is there a nice and composable way of doing that? |
Thanks for sketching this up, @calebmkim. To take your discussion one step farther (envisioning what this would look like with the much-maligned "general
Is that right? Maybe this is exactly the same as your proposal, differing only in that I generated a Anyway, it's pretty weird to imagine how to "fake" this (skip the fictional "general An extremely dumb idea would be to create a 1-bit register for each invocation that holds 1 when the invocation is active… this would waste a cycle to set this register, but it would maybe be a stopgap????? |
Yes, you're right (so it was a bit misleading for me to compare it to an
Yes, we're on the same page. Your example is pretty much exactly how I would imagine inlining this example if we were to somehow suddenly support a generalized
This sounds like it should work-- but it seems like it would have to waste two cycles, since we would have to set the register to 0 after the invocation, right? Also, if we were to do this, we could use this trick to solve the problem for However, there's a separate question of whether we would even want to implement something like this, given this latency penalty. Here's one proposal: we could do some really simple check before doing any inlining: if there is only ever one set of invocation arguments for a given instance (and never any
"Faking" it sounds interesting but vague right now; I might spend some time thinking about what it would look like concretely--it may turn out that it's not possible. |
Yeah, you're absolutely right about wasting another cycle to set the special 1-bit register to 0. And it's a good idea to special-case the one-invocation case to avoid needing this… I fully support more thinking about how to "fake" the "general |
After thinking about it, the general Here is a summary of my thinking:
But it seems to me that this is basically the same thing as a generalized This leads to the question: what are the downsides of a generalized |
Synchronous Meeting SummaryWhy this problem matters
Solution to this problemThe solution we have come up with is to explore the general Next Steps
(Note that this entire problem is only for dynamic |
After looking through the codebase for a bit, I think a generalized
There are probably some other minor things that need to change, but I think it's definitely doable if we want to take a shot. |
Thanks for the analysis @calebmkim! We should also define some optimizations that shrink the scope of
Reducing the scope of |
Yep, this all makes sense-- it seems like optimizations to reduce the scope of
I'm not quite sure this would necessarily help generate simpler guards (it will probably make the program easier to analyze)---wouldn't the assignments in Another QuestionAnother question: would we want to support a One argument towards "no" is the following: we can just use a On the other hand, there is a weird asymmetry to this, and it's probably not too much work to just add a |
Maybe we need to redesign the data structures a bit so that we don't stratify |
The component inlining pass has a bunch of limitations:
invoke
-with
clausesThese come from the original implementation (#829) which wanted to reduce the amount of IR copying. Towards the end, the copying was needed anyways and so the current implementation makes the wrong trade-off.
We should revamp it to copy things as much as needed to support the two cases above. Specifically, we should make use of the
ir::Rewriter
infrastructure to compute how to rewrite the groups and the control program.@bcarlet mentioned the potential for implementing more heuristics to automate the decision of inlining more in the future as well. @andrewb1999 mentioned that one of the big missing optimizations in HLS tools is managing FSM sizes. Both of these are unlocked by reimplementing the inliner to be more general.
The text was updated successfully, but these errors were encountered: