Allow noinline functions to be called with correct argument types. #3963
Signed-off-by: Tiotto, Ettore <[email protected]>
Can we resolve the problem in the rewrite_stack_ptr pass introduced in #3497 instead of in the common pass?
I tried the new pass and found that it doesn't help in this case. Consider this simple test case:
Here, if the [...] Essentially, when the last parameter of the kernel is not a [...]
I have made a copy of [...]
%0 = tt.load %arg0 : !tt.ptr<f32>
%1 = tt.load %arg1 : !tt.ptr<f32>
// CHECK: llvm.call spir_funccc @noinline_shared_fn__fp32_fp32_Pfp32__(%8, %17, %arg2, %arg3, %arg2)
tt.call @noinline_shared_fn__fp32_fp32_Pfp32__(%0, %1, %arg2) {allocation.offset = 0 : i32} : (f32, f32, !tt.ptr<f32>) -> ()
// CHECK: llvm.call spir_funccc @noinline_shared_fn(%8, %17, %arg2, %arg3, %arg2)
Is this a backend issue in converting the calling convention? Why do we duplicate %arg2 as the last parameter to the callee? It seems the callee doesn't require any global scratch space.
We reuse the upstream CallOpConversion after the pass. Its promoteOperands will append GlobalScratchPtr.
intel-xpu-backend-for-triton/lib/Conversion/TritonGPUToLLVM/ControlFlowOpToLLVM.cpp
Lines 103 to 104 in fa4cfa0
promotedOperands.push_back(LLVM::getGlobalScratchPtr(
    loc, rewriter, targetInfo, caller, opOffsetVal));
%arg2 is added because it is the last argument of the caller:
intel-xpu-backend-for-triton/include/triton/Conversion/TritonGPUToLLVM/Utility.h
Lines 518 to 520 in fa4cfa0
auto gmemBase = funcOp.getArgument(funcOp.getNumArguments() - 1);
if (!allocOffset) {
  return gmemBase;
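To make the interaction of the two quoted snippets concrete, here is a minimal toy model in plain C++. It is a sketch, not the backend code: `ToyFunc`, `toyGlobalScratchPtr`, and `toyPromoteOperands` are hypothetical names, arguments are modelled as strings, and the other operand appended during lowering (`%arg3` in the CHECK line) is ignored.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a function: only the names of its arguments (hypothetical type).
struct ToyFunc {
  std::vector<std::string> args;
};

// Models the quoted Utility.h logic when there is no allocation offset:
// whatever the caller's last argument is gets treated as the scratch base.
std::string toyGlobalScratchPtr(const ToyFunc &caller) {
  assert(!caller.args.empty() && "caller needs at least one argument");
  return caller.args.back();
}

// Models the quoted promoteOperands step: the scratch pointer is appended
// to the operands of the call.
std::vector<std::string> toyPromoteOperands(const ToyFunc &caller,
                                            std::vector<std::string> operands) {
  operands.push_back(toyGlobalScratchPtr(caller));
  return operands;
}
```

With a caller whose arguments are `%arg0`, `%arg1`, `%arg2`, the model appends `%arg2` to the call operands even though it is a user buffer, which is the duplication seen in the CHECK line.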
The lowering pattern should transform the kernel entry function's signature as well, right?
The global scratch space base address should be appended to the kernel entry function's arguments.
The triton::CallOp lowering pattern should use the global scratch base instead of reusing the user buffer that is passed as the last argument, %arg2 in this case.
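The suggestion above can be sketched with a small toy model in plain C++. Everything here is an assumption made for illustration: `SketchSignature`, `withScratchArg`, `scratchBase`, and the argument name `%global_scratch` are hypothetical and do not correspond to existing backend code, and the host-side change needed to actually pass such a pointer is not modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a function signature: argument names only.
struct SketchSignature {
  std::vector<std::string> args;
};

// Hypothetical rewrite: append a dedicated scratch-base argument
// to the kernel entry signature.
SketchSignature withScratchArg(SketchSignature sig) {
  sig.args.push_back("%global_scratch");
  return sig;
}

// "Last argument is the scratch base" lookup, as in the quoted Utility.h logic.
const std::string &scratchBase(const SketchSignature &sig) {
  assert(!sig.args.empty() && "signature must have at least one argument");
  return sig.args.back();
}
```

In this model the lookup returns the user buffer `%arg2` on the original signature and the dedicated `%global_scratch` argument only after the signature has been rewritten.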
> Is this a backend issue in converting the calling convention? Why do we duplicate %arg2 as the last parameter to the callee? It seems the callee doesn't require any global scratch space.
No, it is not related to the calling convention. We fix the calling convention in a separate transformation pattern (FixCallCConv).
> The lowering pattern should transform the kernel entry function's signature as well, right? The global scratch space base address should be appended to the kernel entry function's arguments. The triton::CallOp lowering pattern should use the global scratch base instead of reusing the user buffer that is passed as the last argument, %arg2 in this case.
I do not know which transformation pattern is supposed to change the kernel signature. The host would have to pass a pointer to the global scratch space as well. @ESI-SYD do you know which part of the code is supposed to append the pointer to the scratch space to the kernel?
I am putting this PR back into draft mode because the Triton FE/driver doesn't yet pass a pointer to the global scratch space to the kernel.
No, the global scratch space is not supported by Intel GPU yet. The #3612
This PR is currently blocked by:
#3974
#3612