[OneDNN] Zufang/onednn w4a16 int4 #49
base: main
Conversation
Signed-off-by: Zhu, Zufang <[email protected]>
Pull Request Overview
This PR implements OneDNN backend support for W4A16 int4 and FP8 quantized matrix multiplication operations. It migrates and consolidates quantization functionality from IPEX while maintaining a unified interface.
Key changes:
- Add an int4 W4A16 GEMM operation with support for symmetric/asymmetric quantization and group quantization (reference semantics sketched below)
- Add an FP8 W8A16 GEMM operation supporting multiple FP8 formats (e4m3fn, e5m2); a reference sketch follows the file summary below
- Refactor OneDNN type mappers and bias-handling utilities for better code reuse
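For reference, here is a minimal sketch of what the W4A16 int4 semantics amount to, assuming unpacked int4 weights, per-group scales/zero points, and fp16 activations. The function name, argument names, and tensor layout are illustrative assumptions, not the op's actual signature:

```python
import torch

def ref_w4a16_gemm(x, qweight, scales, zeros, group_size, bias=None):
    """Dequantize int4 weights per group, then run a plain matmul.

    Hypothetical reference path; shapes and layout are assumptions:
      x:       [M, K] fp16 activations
      qweight: [K, N] int4 values stored unpacked as int8 in [0, 15]
      scales:  [K // group_size, N] per-group scales
      zeros:   [K // group_size, N] zero points (constant 8 for symmetric mode)
    """
    K, _ = qweight.shape
    group_idx = torch.arange(K, device=x.device) // group_size
    # Broadcast each group's scale/zero point over its rows along K.
    w = (qweight.float() - zeros.float()[group_idx]) * scales.float()[group_idx]
    out = x.float() @ w
    if bias is not None:
        out = out + bias.float()
    return out.to(x.dtype)
```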
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
tests/test_int4_gemm_onednn.py | Test suite for int4 W4A16 GEMM with various quantization modes and activation ordering |
tests/test_fp8_gemm_onednn.py | Test suite for FP8 W8A16 GEMM with different data types and tensor layouts |
tests/register_ops.py | Python bindings for new int4 and FP8 GEMM operations |
csrc/xpu/torch_bindings.cpp | C++ torch binding registration for int4 GEMM operation |
csrc/xpu/ops.h | Function declaration for int4 GEMM operation |
csrc/xpu/onednn/onednn_matmul.cpp | Main implementation of FP8 and int4 GEMM operations with tensor validation |
csrc/xpu/onednn/onednn_ext.h | Refactored type mappers and bias utilities to support 3-tuple returns and consolidated bias handling |
csrc/xpu/onednn/int4_gemm_w4a16.h | OneDNN-specific implementation for int4 W4A16 matrix multiplication |
csrc/xpu/onednn/fp8_gemm_w8a16.h | Simplified FP8 W8A16 implementation using refactored bias utilities |
csrc/xpu/onednn/fp8_gemm_w8a16.cpp | Removed standalone FP8 implementation (consolidated into onednn_matmul.cpp) |
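For the FP8 W8A16 path exercised by tests/test_fp8_gemm_onednn.py, a comparable reference sketch (assumed names and layout, not the op's real signature) is to upcast the FP8 weights to the activation dtype, apply the weight scale, and run a standard matmul:

```python
import torch

def ref_w8a16_fp8_gemm(x, weight_fp8, weight_scale, bias=None):
    """Hypothetical reference path for FP8 W8A16 GEMM.

    Assumed shapes:
      x:            [M, K] fp16/bf16 activations
      weight_fp8:   [K, N] torch.float8_e4m3fn or torch.float8_e5m2 weights
      weight_scale: scalar (or broadcastable per-channel) dequantization scale
    """
    # Upcast FP8 weights to the activation dtype and rescale before the matmul.
    w = weight_fp8.to(x.dtype) * weight_scale
    out = x @ w
    if bias is not None:
        out = out + bias
    return out
```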
Force-pushed from ffc81ab to 64cc20f.
LGTM
import torch.nn as nn
...
class GPTQShuffle(nn.Module):
Will we use this on the vLLM side? I feel this should be renamed to GPTQUtils and should not inherit from nn.Module.
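To illustrate the suggestion, a minimal hypothetical sketch of the renamed helper without the nn.Module inheritance (the real class body is not visible in this diff, so the constructor arguments and method are placeholders):

```python
class GPTQUtils:
    """Plain helper class; no nn.Module inheritance or forward() needed."""

    def __init__(self, bits: int = 4, group_size: int = 128):
        # Hypothetical fields; the actual state depends on what
        # GPTQShuffle currently stores.
        self.bits = bits
        self.group_size = group_size

    def shuffle(self, qweight, g_idx):
        # Placeholder for the weight-reordering logic that GPTQShuffle
        # currently runs in its forward pass.
        raise NotImplementedError
```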
Got it.
Signed-off-by: Zhu, Zufang <[email protected]>
Force-pushed from af406ba to d2cf1fa.
Signed-off-by: Zhu, Zufang <[email protected]>
add onednn w4a16 gemm and ut
Please see the corresponding vLLM change in https://github.com/intel-sandbox/vllm-xpu/pull/362/files