[NPU] Add group norm support on NPU by orangeH25 · Pull Request #1144 · linkedin/Liger-Kernel

orangeH25 · 2026-03-12T11:52:24Z

Summary

This PR introduces a functional GroupNorm operator for Ascend NPU.

Key improvements:

Fixes the runtime error "grid should be less than 65536!" and the UB (Unified Buffer) overflow that occur when the original GPU-oriented Liger-Kernel GroupNorm implementation is executed on NPU.
Adjusts the kernel launch and tiling strategy to comply with Ascend NPU execution constraints.
Resolves numerical accuracy mismatches against the PyTorch reference outputs.
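
To illustrate the kind of launch and tiling change described above, here is a minimal pure-Python sketch. It is not the kernel from this PR. It only emulates the general idea under stated assumptions: the launch grid is capped below 65536 (the limit named in the error message), each emulated program walks several (batch, group) rows in a grid-stride loop, and each row is read in fixed-size tiles so that only a bounded buffer is needed at a time. The names `group_norm_rows`, `MAX_GRID`, and `block_size` are hypothetical, and the affine weight and bias of GroupNorm are omitted for brevity.

```python
import math

# Assumed launch limit, inferred from the error "grid should be less than 65536!".
MAX_GRID = 65535


def group_norm_rows(rows, eps=1e-6, max_grid=MAX_GRID, block_size=4):
    """Normalize each row (one batch/group pair) to zero mean and unit variance.

    Emulates a capped launch grid: at most `max_grid` programs run, and program
    `pid` handles rows pid, pid + grid, pid + 2 * grid, ... Each row is read in
    tiles of `block_size` elements (hypothetical stand-in for an on-chip buffer).
    """
    n_rows = len(rows)
    grid = min(n_rows, max_grid)
    out = [None] * n_rows
    for pid in range(grid):                  # one iteration per emulated program
        for r in range(pid, n_rows, grid):   # grid-stride loop over rows
            row = rows[r]
            n = len(row)
            # Pass 1: tiled accumulation of the sum and the sum of squares.
            total = 0.0
            total_sq = 0.0
            for start in range(0, n, block_size):
                tile = row[start:start + block_size]
                total += sum(tile)
                total_sq += sum(x * x for x in tile)
            mean = total / n
            var = max(total_sq / n - mean * mean, 0.0)
            rstd = 1.0 / math.sqrt(var + eps)
            # Pass 2: tiled normalization using the row statistics.
            normed = []
            for start in range(0, n, block_size):
                tile = row[start:start + block_size]
                normed.extend((x - mean) * rstd for x in tile)
            out[r] = normed
    return out
```

With this structure the number of programs no longer grows with batch size times number of groups, and the per-tile working set is bounded by `block_size` regardless of how large each group is. A real kernel would pick the tile size from the device's buffer capacity, which this sketch does not model.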

While the current implementation is still slower than the HuggingFace implementation in end-to-end benchmarks, it provides a stable and functional GroupNorm path for Ascend NPU.

This PR mainly focuses on correctness and NPU compatibility. Further kernel-level optimizations will be explored in follow-up work.

Testing Done

Hardware Type: Atlas 800I A2
Run make test to ensure correctness
Run make checkstyle to ensure code style
Run make test-convergence to ensure convergence

orangeH25 · 2026-03-16T09:12:56Z

Hi @Tcc0403, please take a look. Thanks!

[NPU] Add group norm support on NPU

4bd92e2

orangeH25 wants to merge 1 commit into linkedin:main from orangeH25:group-norm/1
