[NPU] Add group norm support on NPU #1144

Open
orangeH25 wants to merge 1 commit into linkedin:main from orangeH25:group-norm/1

@orangeH25

Summary

This PR introduces a functional GroupNorm operator for Ascend NPU.

Key improvements:

  • Fixes the runtime errors "grid should be less than 65536!" and "ub overflow" that occur when the original GPU-oriented liger-kernel GroupNorm implementation runs on NPU.
  • Adjusts the kernel launch and tiling strategy to comply with Ascend NPU execution constraints.
  • Resolves numerical accuracy mismatches against PyTorch reference outputs.
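
To illustrate the launch constraint behind the first two points, here is a minimal pure-Python sketch of one common way to keep the grid under the limit: cap the number of programs and let each program stride over several (batch, group) rows. It is not the kernel code in this PR. The names `MAX_GRID`, `plan_launch`, and `rows_for_program` are hypothetical, and the cap of 65535 is an assumption taken only from the error message above.

```python
import math

# Assumed cap: the error message only states that the grid must be < 65536.
MAX_GRID = 65535


def plan_launch(batch_size: int, num_groups: int, max_grid: int = MAX_GRID):
    """Return (grid, max_rows_per_program) for batch_size * num_groups rows."""
    total_rows = batch_size * num_groups
    # Never launch more programs than the cap, nor more than there are rows.
    grid = min(total_rows, max_grid)
    max_rows_per_program = math.ceil(total_rows / grid) if grid else 0
    return grid, max_rows_per_program


def rows_for_program(pid: int, grid: int, total_rows: int):
    """Rows handled by program `pid` when programs stride over rows by `grid`."""
    return range(pid, total_rows, grid)


# Example shape that would exceed the cap with one program per row.
grid, max_rows = plan_launch(batch_size=4096, num_groups=32)
first_rows = list(rows_for_program(0, grid, 4096 * 32))
```

With one program per row, this shape would need 131072 programs. With the cap, each program instead loops over at most `max_rows` rows, so the grid size no longer depends on the batch size.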

While the current implementation is still slower than the HuggingFace implementation in end-to-end benchmarks, it provides a stable and functional GroupNorm path for Ascend NPU.

This PR mainly focuses on correctness and NPU compatibility. Further kernel-level optimizations will be explored in follow-up work.

Testing Done

  • Hardware Type: Atlas 800I A2
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence
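
As an illustration of the kind of correctness check involved (not the repository's actual test suite), the pure-Python sketch below compares group statistics computed in fixed-size blocks against a straightforward reference. Processing a group in blocks is one way to bound how much data is resident at a time. All names here (`group_stats_reference`, `group_stats_tiled`, `group_norm_row`, `block_size`) are hypothetical, and the two-pass scheme is an assumption, not necessarily what the PR's kernel does.

```python
import math


def group_stats_reference(values):
    """Mean and biased variance over one group, computed in one piece."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def group_stats_tiled(values, block_size):
    """Two-pass mean/variance that reads only `block_size` elements at a time."""
    n = len(values)
    total = 0.0
    for start in range(0, n, block_size):
        total += sum(values[start:start + block_size])
    mean = total / n
    sq_dev = 0.0
    for start in range(0, n, block_size):
        sq_dev += sum((v - mean) ** 2 for v in values[start:start + block_size])
    return mean, sq_dev / n


def group_norm_row(values, eps=1e-6, block_size=4):
    """Normalize one group (no affine weight or bias in this sketch)."""
    mean, var = group_stats_tiled(values, block_size)
    rstd = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * rstd for v in values]


# Compare the tiled statistics against the reference within a tolerance.
data = [float(i) for i in range(10)]
ref_mean, ref_var = group_stats_reference(data)
tiled_mean, tiled_var = group_stats_tiled(data, block_size=4)
matches = math.isclose(ref_mean, tiled_mean, rel_tol=1e-9) and math.isclose(
    ref_var, tiled_var, rel_tol=1e-9
)
```

A block size that does not divide the group length (10 elements, blocks of 4) is used on purpose so the partial last block is exercised.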

@orangeH25

Hi @Tcc0403, please take a look. Thanks!