An mxfp matmul in which both operand A and operand B are float4 fails with L0 error 0x78000011.
We should check whether incorrect Triton codegen is causing the IGC error; if not, we should report it to the IGC team.
On the other hand, tests for mxfp4 are very slow.
The mxfp format requires every 32 contiguous elements of an operand tensor to share 1 element of the scale tensor, and for mxfp4, every 2 fp4 elements are packed into 1 uint8 element. Because of this, it takes a series of bitcasts and layout conversions to make it work.
We can see tons of extract_value and insert_value instructions in the LLIR that unpack and pack these elements, but they should be eliminated.
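For reference, the data layout described above can be sketched in NumPy as a plain dequantization. This is a minimal illustration, not Triton's implementation: it assumes the OCP MX conventions (fp4 values in E2M1, scales as E8M0 exponents biased by 127) and assumes the low nibble of each uint8 holds the first (even-index) element. The helper names `unpack_fp4` and `dequant_mxfp4` are hypothetical.

```python
import numpy as np

# Assumed E2M1 code points (OCP MX): index = 4-bit code, sign in bit 3.
E2M1_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
                        -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
                       dtype=np.float32)

BLOCK = 32  # number of contiguous elements sharing one scale


def unpack_fp4(packed):
    """Hypothetical helper: unpack (..., K//2) uint8 into (..., K) float32.

    Assumes the low nibble is element 2*i and the high nibble is element 2*i+1.
    """
    packed = np.asarray(packed, dtype=np.uint8)
    lo = packed & 0x0F
    hi = packed >> 4
    # Interleave low/high nibbles along the last axis.
    codes = np.stack([lo, hi], axis=-1)
    codes = codes.reshape(*packed.shape[:-1], packed.shape[-1] * 2)
    return E2M1_VALUES[codes]


def dequant_mxfp4(packed, scales):
    """Hypothetical helper: packed is (..., K//2) uint8, scales is (..., K//32) uint8 E8M0."""
    values = unpack_fp4(packed)
    k = values.shape[-1]
    scales = np.asarray(scales, dtype=np.float32)
    assert k % BLOCK == 0 and scales.shape[-1] * BLOCK == k
    # E8M0 scale: 2^(e - 127); special value 0xFF (NaN) is not handled here.
    scale = np.exp2(scales - 127.0)
    # Broadcast each scale over its 32-element block.
    scale = np.repeat(scale, BLOCK, axis=-1)
    return values * scale
```

In the generated LLIR, the per-element nibble extraction and re-interleaving done here by `np.stack`/`reshape` is roughly the work that appears as long extract_value/insert_value sequences when it is expanded element by element.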
This part is tracked by #4062.
We should try to eliminate the extract_values and insert_values before reporting to IGC.
PR #4212 eliminates most of them, but not all (due to branching). I'm not sure whether this error is caused by the insert/extract_values.
I tried disabling LLVM optimization for this case; the error does not change.
Since the current output IR is too large to debug, we may need to resolve the insert/extract_values issue before investigating further.