While thinking about how to do spilling to physical registers, I realized that we could try a different strategy in the scheduler. Rather than splitting things into virtual and physical registers before the scheduler, we could combine them. The scheduler decides which values get spilled to registers, and all we need to do is guarantee that there are never more than 11 + 64 of them required (minus any physical registers we allocate ahead of time for cross-basic-block stuff). We still need to make sure that there are never more than 11 inputs at any time, but we can do that in the scheduler. I've implemented the beginning of this here: https://github.com/cwabbott0/mesa/tree/cwabbott-lima-2 but it's not done yet (I haven't implemented spilling to registers yet). It seems to do better on a simple test that just does `gl_Position = mat * aPos`, getting within 1 instruction of the blob compiler, but it doesn't do as well on kmscube due to the lack of spilling and the increased register pressure.
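To make the budget above concrete, here is a minimal sketch of the pressure check only, not the scheduler from the linked branch. It assumes a value's lifetime can be modeled as an inclusive `(start, end)` range of instruction indices, and it reads 11 as the limit on values held as inputs and 64 as the number of physical registers; that reading, and the names `check_budget`, `live_counts`, and `reserved_regs`, are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

MAX_INPUTS = 11      # "never more than 11 inputs" (figure taken from the post)
NUM_PHYS_REGS = 64   # the "64" in "11 + 64" (assumed to be physical registers)


def live_counts(intervals: List[Tuple[int, int]], num_instrs: int) -> List[int]:
    """Count how many values are live at each instruction index."""
    live = [0] * num_instrs
    for start, end in intervals:
        # Each value is live from its definition to its last use, inclusive.
        for i in range(start, end + 1):
            live[i] += 1
    return live


def check_budget(intervals: List[Tuple[int, int]], num_instrs: int,
                 reserved_regs: int = 0) -> Tuple[bool, List[int]]:
    """Return (fits, spills_needed_per_instruction).

    fits is True when the live count never exceeds
    11 + 64 - reserved_regs. spills_needed is how many values beyond the
    11 inputs would have to sit in registers at each instruction.
    """
    budget = MAX_INPUTS + NUM_PHYS_REGS - reserved_regs
    live = live_counts(intervals, num_instrs)
    spills = [max(0, n - MAX_INPUTS) for n in live]
    fits = all(n <= budget for n in live)
    return fits, spills


if __name__ == "__main__":
    # Twelve values live across three instructions: one more than 11 inputs.
    fits, spills = check_budget([(0, 2)] * 12, num_instrs=3)
    print(fits, spills)
```

In this model the scheduler would be free to pick which of the excess values go to registers; the check only says how many must, and whether the total still fits once registers reserved for cross-basic-block values are subtracted.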
Thanks for sharing the compiler work. I can't fully understand the new algorithm after a quick scan of the code and your comments, but I'll come back and try to understand it later.
It's nice to see your work on the mesa-lima compiler; I believe you're the best person for it. The GP compiler is definitely the module I've spent the most time on, but I didn't get a satisfying result until your algorithm came along.