[MXFP] Fp4 type on both A and B has L0 error 0x78000011 #4136

Open
LiyangLingIntel opened this issue May 8, 2025 · 3 comments

Comments


LiyangLingIntel commented May 8, 2025

An MXFP matmul in which both operand A and operand B are float4 fails with L0 error 0x78000011.
We should first check the Triton codegen: if incorrect codegen is what leads to the IGC error, we fix it on our side; otherwise we should report it to the IGC team.

Separately, the tests for mxfp4 are very slow.
The mxfp format requires every 32 contiguous elements in the operand tensor to match 1 element in the scale tensor, and for mxfp4 every 2 fp4 elements are packed into 1 uint8 element. Because of this, a series of bitcasts and layout conversions is needed to make it work.
We can see tons of extract_value and insert_value operations in the LLIR that unpack and pack these elements, but they should be eliminated.
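
To make the packing described above concrete, here is a minimal pure-Python sketch of how an mxfp4 operand could be unpacked and scaled on the host. It is not the Triton lowering. The helper names (`decode_fp4`, `unpack_fp4`, `dequantize_mxfp4`) are hypothetical, and the low-nibble-first ordering within each uint8 is an assumption. The E2M1 value table and the E8M0 scale bias of 127 follow the OCP MX format definition.

```python
# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code; bit 3 is the sign.
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 32  # number of contiguous elements that share one scale element


def decode_fp4(code):
    """Decode one 4-bit E2M1 code into a Python float."""
    magnitude = FP4_E2M1[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def unpack_fp4(packed):
    """Split each uint8 into two fp4 codes (assumption: low nibble first)."""
    codes = []
    for byte in packed:
        codes.append(byte & 0xF)
        codes.append(byte >> 4)
    return codes


def dequantize_mxfp4(packed, scales):
    """Apply one E8M0 scale (2 ** (s - 127)) to each block of 32 fp4 values."""
    codes = unpack_fp4(packed)
    if len(codes) != BLOCK * len(scales):
        raise ValueError("expected one scale per 32 unpacked elements")
    return [
        decode_fp4(code) * 2.0 ** (scales[i // BLOCK] - 127)
        for i, code in enumerate(codes)
    ]
```

For example, 16 packed bytes hold one block of 32 fp4 values and need exactly one scale byte. Every element goes through one unpack step and one scale lookup, which is the per-element work that shows up as extract_value/insert_value chains in the LLIR.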


LiyangLingIntel commented May 14, 2025

> Separately, the tests for mxfp4 are very slow.
> The mxfp format requires every 32 contiguous elements in the operand tensor to match 1 element in the scale tensor, and for mxfp4 every 2 fp4 elements are packed into 1 uint8 element. Because of this, a series of bitcasts and layout conversions is needed to make it work.
> We can see tons of extract_value and insert_value operations in the LLIR that unpack and pack these elements, but they should be eliminated.

This part is tracked by #4062.
We should try to eliminate the extract_values and insert_values before reporting to IGC.

@AndreyPavlenko

PR #4212 eliminates most of them, but not all (due to branching). I'm not sure whether this error is caused by the insert/extract_values.

@LiyangLingIntel

> PR #4212 eliminates most of them, but not all (due to branching). I'm not sure whether this error is caused by the insert/extract_values.

I tried disabling LLVM optimization for this case, and the error did not change.
Since the current output IR is too large to debug, we may need to resolve the insert/extract_values issue first and then investigate further.
