
Conversation

xiangze-arm

Add qkRmsNorm for Arm to support Qwen3 models.

This is a stacked PR. Code changes for adding qkRmsNorm are in the third commit 480ec99.
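For reference, Qwen3-style models apply RMSNorm to the query and key projections per attention head before RoPE. Below is a minimal sketch of what such a qkRmsNorm computes; the function name, signature, and layout are illustrative assumptions, not the actual kernel added in this PR.

```cpp
#include <cmath>
#include <cstddef>

// Per-head QK RMSNorm sketch: each head's q (or k) vector is normalized by its
// own RMS, then scaled by a learned per-dimension gamma shared across heads.
// Hypothetical signature for illustration only.
void qkRmsNormSketch(float* qk,           // [num_tokens, num_heads, head_dim], contiguous
                     const float* gamma,  // [head_dim] learned scale
                     size_t num_tokens,
                     size_t num_heads,
                     size_t head_dim,
                     float eps = 1e-6f) {
    for (size_t t = 0; t < num_tokens; ++t) {
        for (size_t h = 0; h < num_heads; ++h) {
            float* v = qk + (t * num_heads + h) * head_dim;
            // RMS over a single head's vector.
            float sum_sq = 0.f;
            for (size_t d = 0; d < head_dim; ++d) sum_sq += v[d] * v[d];
            const float inv_rms =
                1.f / std::sqrt(sum_sq / static_cast<float>(head_dim) + eps);
            for (size_t d = 0; d < head_dim; ++d) v[d] = v[d] * inv_rms * gamma[d];
        }
    }
}
```

The same routine is applied to both q and k, which is why a single strided kernel can serve both tensors.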

xiangze-arm and others added 3 commits August 19, 2025 11:08
- Implement flash attention for context attention
- Implement flash decoding for decoder self-attention (see the merge-step sketch after this commit message)
- Avoid KV cache assembly and use the blocked KV cache directly
- Compute GQA by groups of heads in flash decoding

Signed-off-by: Zhang Xiangze <[email protected]>
Co-authored-by: Ruifeng Wang <[email protected]>
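
Flash decoding parallelizes decode-time attention over the KV-cache sequence dimension: each split produces a partial output together with its running max logit and softmax denominator, and the partials are then combined with log-sum-exp rescaling. A minimal sketch of that merge step follows; the names and the scalar output are illustrative, not the kernel interface in this PR.

```cpp
#include <cmath>

// One KV-cache split's partial attention result. The output accumulator is
// shown as a single element for brevity; a real kernel keeps a head_dim vector.
struct Partial {
    float m;  // max attention logit seen in this split
    float l;  // sum of exp(logit - m) over this split
    float o;  // un-normalized weighted-value accumulator
};

// Merge two partials with the standard log-sum-exp rescaling.
// The final attention output is merged.o / merged.l.
Partial mergePartials(const Partial& a, const Partial& b) {
    const float m  = std::fmax(a.m, b.m);
    const float ca = std::exp(a.m - m);  // rescale factor for split a
    const float cb = std::exp(b.m - m);  // rescale factor for split b
    Partial out;
    out.m = m;
    out.l = a.l * ca + b.l * cb;
    out.o = a.o * ca + b.o * cb;
    return out;
}
```

Grouping the query heads that share a KV head (the GQA grouping mentioned above) lets one load of a blocked KV-cache tile serve the whole group of query heads.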
Add the following functions for the Arm device:
  - moeFfnLayer
  - mlaContextAttention
  - mlaAbsorbAttention
  - layernormWithStride (see the strided-norm sketch after this commit message)
  - mlaQKVGemm
  - slice
  - dispatch
Upgrade the torch version from 2.1.2 to 2.6.0 for the Arm backend
Add DeepSeek V2 Lite support
Add DeepSeek V3 support by packing FP8 weights to INT4 and computing with KleidiAI
Improve the performance of the gated activation op
Add an optimized MoE path for a8w4
Merge the shared expert into moeFfnLayer for DeepSeek V3
Optimize flash decoding by splitting the q dim for MLA absorb attention

Signed-off-by: Zhang Xiangze <[email protected]>
Co-authored-by: Tianyu Li <[email protected]>
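
Of the functions listed above, layernormWithStride is the easiest to sketch: it normalizes rows that live inside a larger buffer (for example the q or k slice of a fused QKV matrix), so consecutive rows sit `stride` elements apart rather than densely packed. The signature below is a hypothetical illustration, not the actual Arm-device API added in this commit.

```cpp
#include <cmath>
#include <cstddef>

// Strided layer norm sketch: normalize `rows` rows of `width` elements each,
// where rows are `stride` (>= width) elements apart in the underlying buffer.
void layernormWithStrideSketch(float* data,        // base pointer into the larger buffer
                               const float* gamma, // [width] learned scale
                               const float* beta,  // [width] learned bias
                               size_t rows,
                               size_t width,
                               size_t stride,
                               float eps = 1e-6f) {
    for (size_t r = 0; r < rows; ++r) {
        float* row = data + r * stride;
        // Mean and variance over the row's `width` elements only.
        float mean = 0.f;
        for (size_t i = 0; i < width; ++i) mean += row[i];
        mean /= static_cast<float>(width);
        float var = 0.f;
        for (size_t i = 0; i < width; ++i) {
            const float d = row[i] - mean;
            var += d * d;
        }
        var /= static_cast<float>(width);
        const float inv_std = 1.f / std::sqrt(var + eps);
        for (size_t i = 0; i < width; ++i)
            row[i] = (row[i] - mean) * inv_std * gamma[i] + beta[i];
    }
}
```

Passing the full QKV row length as `stride` and a slice offset as `data` lets the same kernel normalize q, k, or v in place without first slicing them into contiguous tensors.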
@netaddi
Collaborator

netaddi commented Sep 24, 2025

Hi @xiangze-arm, we have updated our development workflow and integrated our complete CI test pipeline into GitHub pull request actions. Could you please submit the pull request again (maybe all-in-one), so that the PR automatically triggers our CI pipeline?
Thanks for your contribution!
