Recombine FS inputs into vectors #56
Comments
Right, I also hit this problem after the 18.0 rebase. But because I wanted to focus on the kernel, I haven't solved it at the root. Now I'm doing the 18.1 rebase, so I will cover these NIR changes in a better way.
kmscube -M rgba fails on the lima-18.0 branch due to this issue with "ppir: ppir: regalloc fail"
In fact, even if we don't recombine scalars into vectors, ppir should not fail, only generate longer code. This "regalloc fail" really means that ppir needs to implement register spilling for when it runs out of registers.
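To make the spilling idea concrete, here is a minimal sketch of a linear-scan allocator that spills instead of failing when it runs out of registers. This is a textbook technique swapped in for illustration, not ppir's actual allocator, and `struct interval`, `linear_scan` and `MAX_INTERVALS` are hypothetical names. A spilled value gets `reg == -1`; a real implementation would then have to emit store and load temporary instructions around its uses.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INTERVALS 64

/* Hypothetical live interval of one value, in instruction indices. */
struct interval {
    int start, end;
    int reg;        /* assigned register, or -1 when spilled */
};

/* Linear scan over intervals sorted by increasing start.
 * Returns the number of intervals that had to be spilled. */
int linear_scan(struct interval *iv, size_t n, int num_regs)
{
    struct interval *active[MAX_INTERVALS];
    int reg_free[MAX_INTERVALS];
    size_t num_active = 0;
    int spilled = 0;

    assert(n <= MAX_INTERVALS && num_regs > 0 && num_regs <= MAX_INTERVALS);
    for (int r = 0; r < num_regs; r++)
        reg_free[r] = 1;

    for (size_t i = 0; i < n; i++) {
        /* Expire intervals that ended before the current one starts. */
        size_t kept = 0;
        for (size_t a = 0; a < num_active; a++) {
            if (active[a]->end < iv[i].start)
                reg_free[active[a]->reg] = 1;
            else
                active[kept++] = active[a];
        }
        num_active = kept;

        if ((int)num_active < num_regs) {
            /* A register is free: take the lowest-numbered one. */
            for (int r = 0; r < num_regs; r++) {
                if (reg_free[r]) {
                    reg_free[r] = 0;
                    iv[i].reg = r;
                    break;
                }
            }
            active[num_active++] = &iv[i];
            continue;
        }

        /* Out of registers: spill whichever interval ends last. */
        size_t victim = 0;
        for (size_t a = 1; a < num_active; a++)
            if (active[a]->end > active[victim]->end)
                victim = a;

        if (active[victim]->end > iv[i].end) {
            iv[i].reg = active[victim]->reg;  /* steal the register */
            active[victim]->reg = -1;         /* victim is spilled */
            active[victim] = &iv[i];
        } else {
            iv[i].reg = -1;                   /* current value is spilled */
        }
        spilled++;
    }
    return spilled;
}
```

With intervals [0,10], [1,3], [2,5], [4,6] and two registers, the long-lived first value is the one that gets spilled and the other three fit in registers, so the shader still compiles at the cost of extra memory traffic.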
As far as I understand, we need to reverse engineer how temporaries work first. I see store and load temporary instructions in the doc, but I don't understand where the temporaries are stored.
I guess it's here: Each PP can have a memory stack, which I guess is used to store temporaries.
There are 2 stack_address fields, one in lima_pp_frame_reg and another in drm_lima_m400_pp_frame/drm_lima_m450_pp_frame. I'm not sure what the difference between them is.
The lima_pp_frame_reg one is a dummy; the drm_lima_m400_pp_frame one is used, one for each PP.
@yuq how do you tell which one is the dummy?
In this function: LIMA_PP_FRAME & LIMA_PP_STACK are per PP, so the lima_pp_frame_reg will be set to a new value before the task starts.
@yuq, what about LIMA_PP_STACK_SIZE?
And why is LIMA_PP_STACK not used for mali450?
LIMA_PP_STACK is used by mali450 here: Just because bcast will set the same address on all PPs, so I have to set it. LIMA_PP_STACK_SIZE is the same for all PPs, so the lima_pp_frame_reg one
I'll have some time to work on lima again starting by the end of this week. If nobody is currently working on this, I could pick it up and work on it. From what I understand there are two issues,
Is that correct? Just to be sure though, are we already sure that we want (1) (as in the title of this issue), even with (2) implemented? Or is it something that we would need to benchmark?
I think you are right. 2 is needed anyway for correctness and 1 is for better performance. But 1 also needs to be done because I wrote ppir for vec4, not for scalar, so some refinement or recombination may be needed for correctness.
NIR splits outputs and inputs in nir_lower_io_to_scalar_early(), and unfortunately for FS this means an increased number of instructions and increased register pressure.
We need to recombine inputs from scalar to vectors.
See https://www.mail-archive.com/[email protected]/msg189216.html
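As a rough illustration of what recombining means here, the sketch below merges scalar input loads that read the same varying slot into one masked vector load. It does not use the NIR API; `struct scalar_load`, `struct vector_load`, `recombine_loads` and `MAX_VARYINGS` are hypothetical names, and an actual pass would also have to rewrite every use of the scalar results to read the matching vector component.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VARYINGS 16

/* Hypothetical scalar input load: one component of one varying slot. */
struct scalar_load {
    unsigned location;   /* varying slot index */
    unsigned component;  /* 0..3 for x, y, z, w */
};

/* Hypothetical vector load produced by recombination. */
struct vector_load {
    unsigned location;
    unsigned mask;       /* bit i is set when component i is read */
};

/* Merge scalar loads that share a location into one masked vector load.
 * out must have room for MAX_VARYINGS entries.
 * Returns the number of vector loads written to out. */
size_t recombine_loads(const struct scalar_load *in, size_t n,
                       struct vector_load *out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j;

        assert(in[i].component < 4);

        /* Look for a vector load that already covers this location. */
        for (j = 0; j < count; j++)
            if (out[j].location == in[i].location)
                break;

        if (j == count) {
            /* First load from this location: start a new vector load. */
            assert(count < MAX_VARYINGS);
            out[count].location = in[i].location;
            out[count].mask = 0;
            count++;
        }
        out[j].mask |= 1u << in[i].component;
    }
    return count;
}
```

For example, six scalar loads covering all four components of slot 0 and the first two components of slot 1 collapse into two vector loads, which is the reduction in instruction count and register pressure this issue is after.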