[MXFP] Fp4 type on both A and B has L0 error `0x78000011` #4136

LiyangLingIntel · 2025-05-08T08:27:06Z

An mxfp matmul in which both operand A and operand B are float4 fails with L0 error `0x78000011`.
We should first check whether incorrect Triton codegen is what triggers the IGC error; if it is not, we should report the problem to the IGC team.

On the other hand, the tests for mxfp4 are very slow.
The mxfp format requires every 32 contiguous elements in the operand tensor to share 1 element in the scale tensor, and for mxfp4 every 2 fp4 elements are packed into 1 uint8 element. Because of this, a series of bitcasts and layout conversions is needed to make it work.
We can see tons of extract_value and insert_value ops in the LLIR that unpack and pack these elements, but they should be eliminated.
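
To make the packing concrete, here is a minimal plain-Python sketch of the layout described above. It is an illustration only, not the Triton codegen. The assumptions, none of which are stated in this issue: fp4 elements are E2M1, the low nibble of each uint8 holds the first element, scales are E8M0 (a biased power-of-two exponent with bias 127), and the special NaN scale encoding is ignored. The names `decode_fp4`, `unpack_fp4`, and `dequantize_mxfp4` are hypothetical.

```python
# Magnitudes of an E2M1 value, indexed by its 3 low bits; bit 3 is the sign.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 32  # number of contiguous elements that share one scale


def decode_fp4(nibble):
    """Decode one 4-bit E2M1 value to a Python float."""
    sign = -1.0 if nibble & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[nibble & 0x7]


def unpack_fp4(packed):
    """Unpack uint8 values into fp4 elements, two per byte."""
    values = []
    for byte in packed:
        values.append(decode_fp4(byte & 0xF))  # assumption: low nibble first
        values.append(decode_fp4(byte >> 4))
    return values


def dequantize_mxfp4(packed, scales):
    """Apply one E8M0 scale (assumed bias 127) to each block of 32 elements."""
    values = unpack_fp4(packed)
    if len(values) != BLOCK * len(scales):
        raise ValueError("expected one scale per 32 unpacked elements")
    return [v * 2.0 ** (scales[i // BLOCK] - 127) for i, v in enumerate(values)]
```

In this sketch, 16 packed bytes and 1 scale byte describe 32 logical elements. The per-element unpack step in `unpack_fp4` is the kind of work that shows up as extract_value/insert_value pairs when it is lowered element by element.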


LiyangLingIntel · 2025-05-14T01:33:32Z

> On the other hand, the tests for mxfp4 are very slow.
> The mxfp format requires every 32 contiguous elements in the operand tensor to share 1 element in the scale tensor, and for mxfp4 every 2 fp4 elements are packed into 1 uint8 element. Because of this, a series of bitcasts and layout conversions is needed to make it work.
> We can see tons of extract_values and insert_values in the LLIR that unpack and pack these elements, but they should be eliminated.

This part is tracked by #4062.
We should try to eliminate the extract_values and insert_values before reporting to IGC.

AndreyPavlenko · 2025-05-15T02:05:09Z

PR #4212 eliminates most of them, but not all (due to branching). I'm not sure whether this error is caused by the insert/extract_values.
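
As a rough illustration of the general idea (not the implementation in #4212), the fold can be modeled on a toy IR in plain Python. The `Insert` and `Extract` classes and the `fold_extract` helper are hypothetical stand-ins for insertvalue, extractvalue, and the rewrite. The sketch walks the chain of inserts that feeds an extract and returns the inserted value when it is directly reachable. When the chain ends in something opaque, such as a value merged across branches, the extract has to stay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    """Toy model of `insertvalue agg, value, index`."""
    agg: object
    value: object
    index: int


@dataclass(frozen=True)
class Extract:
    """Toy model of `extractvalue agg, index`."""
    agg: object
    index: int


def fold_extract(node):
    """Return the inserted value if the insert chain reaches it, else an extract."""
    agg = node.agg
    while isinstance(agg, Insert):
        if agg.index == node.index:
            return agg.value  # the value is directly reachable, no extract needed
        agg = agg.agg  # this insert wrote a different field, keep walking
    # The chain ended at an opaque value (for example a phi), so extract from it.
    return Extract(agg, node.index)


# Usage: a struct built by two inserts, then read back.
struct = Insert(Insert("undef", "a", 0), "b", 1)
first = fold_extract(Extract(struct, 0))                # folds to "a"
kept = fold_extract(Extract(Insert("phi", "b", 1), 0))  # stays an extract from "phi"
```

The second usage line mirrors the branching case: the walk cannot see through the merged value, so an extract remains.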

LiyangLingIntel · 2025-05-19T08:24:08Z

> PR #4212 eliminates most of them, but not all (due to branching). I'm not sure whether this error is caused by the insert/extract_values.

I tried disabling LLVM optimization for this case, and the error did not change.
Since the current output IR is too large to debug, we may need to resolve the insert/extract_values issue first and then investigate further.

LiyangLingIntel mentioned this issue May 8, 2025

[TEST][MXFP] Skip not natively supported mxfp matmul tests #4117

Merged

vlad-penkin added dependencies: level_zero codegen: gemm labels May 11, 2025

vlad-penkin added this to the 3. [Triton] Language and Runtime milestone May 11, 2025

vlad-penkin assigned LiyangLingIntel May 12, 2025

AndreyPavlenko mentioned this issue May 15, 2025

Do not use extractvalue if the inserted value is directly reachable #4212

Draft
