
Conversation

xiangze-arm

Add the following functions for the Arm device:

  • moeFfnLayer
  • mlaContextAttention
  • mlaAbsorbAttention
  • layernormWithStride (sketched below)
  • mlaQKVGemm
  • slice
  • dispatch
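
Of these, layernormWithStride is the simplest to illustrate. The following is a minimal scalar sketch, assuming the op normalizes rows of `hidden_size` elements that sit `stride` floats apart in a larger buffer; the name, signature, and scalar loops are hypothetical, not the PR's actual interface (the real kernel would presumably be vectorized for Arm).

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical sketch of a strided layernorm: each of `rows` rows has
// `hidden_size` valid elements, but consecutive rows are `stride` floats
// apart (stride >= hidden_size), e.g. when normalizing a view into a
// larger packed buffer. Not the actual layernormWithStride signature.
void layernorm_with_stride(float* out, const float* in,
                           const float* gamma, const float* beta,
                           std::size_t rows, std::size_t hidden_size,
                           std::size_t stride, float eps = 1e-5f) {
    for (std::size_t r = 0; r < rows; ++r) {
        const float* x = in + r * stride;
        float* y = out + r * stride;

        // Mean and variance over the hidden dimension only.
        float mean = 0.f;
        for (std::size_t i = 0; i < hidden_size; ++i) mean += x[i];
        mean /= hidden_size;

        float var = 0.f;
        for (std::size_t i = 0; i < hidden_size; ++i) {
            const float d = x[i] - mean;
            var += d * d;
        }
        var /= hidden_size;

        const float inv_std = 1.f / std::sqrt(var + eps);
        for (std::size_t i = 0; i < hidden_size; ++i) {
            y[i] = (x[i] - mean) * inv_std * gamma[i] + beta[i];
        }
    }
}
```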

Upgrade the torch version from 2.1.2 to 2.6.0 for the Arm backend
Add DeepSeek V2 Lite support
Add DeepSeek V3 support by packing FP8 weights into INT4 and computing with KleidiAI (see the packing sketch after this list)
Improve the performance of the gated activation op
Add an optimized MoE path for a8w4 (8-bit activations, 4-bit weights)
Merge the shared expert into moeFfnLayer for DeepSeek V3
Optimize flash decoding by splitting the q dim for MLA absorb attention
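
For the DeepSeek V3 a8w4 path, the FP8 checkpoint weights have to be repacked into the INT4 blocks that KleidiAI's int4 matmul kernels consume. The sketch below is only a rough illustration of that conversion, not the PR's code and not KleidiAI's actual packing layout: it decodes FP8 E4M3 bytes to float, then re-quantizes each group of 32 values to symmetric signed INT4 with one scale per group, two nibbles per byte. Folding DeepSeek's original per-block FP8 weight scales into the new group scales is omitted here, and every name below is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Decode an FP8 E4M3 (e4m3fn) byte: 1 sign bit, 4 exponent bits (bias 7),
// 3 mantissa bits; exponent 15 with mantissa 7 is NaN, there is no infinity.
float fp8_e4m3_to_float(uint8_t v) {
    const int sign = (v >> 7) & 1;
    const int exp  = (v >> 3) & 0xF;
    const int man  = v & 0x7;
    float f;
    if (exp == 0) {
        f = std::ldexp(static_cast<float>(man), -9);   // subnormal: (man/8) * 2^-6
    } else if (exp == 15 && man == 7) {
        f = std::numeric_limits<float>::quiet_NaN();
    } else {
        f = std::ldexp(1.0f + man / 8.0f, exp - 7);    // normal: (1 + man/8) * 2^(exp-7)
    }
    return sign ? -f : f;
}

// Quantize one group of `group_size` floats (group_size even) to symmetric
// signed INT4 in [-7, 7] with a single scale, packing two nibbles per byte.
void quantize_group_to_int4(const float* w, std::size_t group_size,
                            uint8_t* packed, float& scale) {
    float amax = 0.f;
    for (std::size_t i = 0; i < group_size; ++i) amax = std::max(amax, std::fabs(w[i]));
    scale = amax > 0.f ? amax / 7.f : 1.f;

    auto q = [&](float x) {
        int v = static_cast<int>(std::lround(x / scale));
        v = std::min(7, std::max(-7, v));
        return static_cast<uint8_t>(v & 0xF);          // two's-complement nibble
    };
    for (std::size_t i = 0; i < group_size; i += 2) {
        packed[i / 2] = static_cast<uint8_t>(q(w[i]) | (q(w[i + 1]) << 4));
    }
}

// Repack one row of FP8 weights (n a multiple of group_size) into INT4 groups.
void repack_fp8_row_to_int4(const uint8_t* fp8_row, std::size_t n,
                            uint8_t* packed, float* scales,
                            std::size_t group_size = 32) {
    std::vector<float> tmp(group_size);
    for (std::size_t g = 0; g * group_size < n; ++g) {
        for (std::size_t i = 0; i < group_size; ++i)
            tmp[i] = fp8_e4m3_to_float(fp8_row[g * group_size + i]);
        quantize_group_to_int4(tmp.data(), group_size,
                               packed + g * group_size / 2, scales[g]);
    }
}
```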

This is a stacked PR based on #142 because flash attention/flash decoding are used in the MLA implementation. The code changes for DeepSeek support are in the second commit, 335f3ab.

xiangze-arm and others added 2 commits August 19, 2025 11:08
- Implement flash attention for context attention
- Implement flash decoding for decoder self attention (the partial-result merge is sketched after this commit message)
- Avoid KV cache assembly and use the blocked KV cache directly
- Compute GQA by groups of heads in flash decoding

Signed-off-by: Zhang Xiangze <[email protected]>
Co-authored-by: Ruifeng Wang <[email protected]>
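
Flash decoding splits the key/value sequence into chunks, computes attention over each chunk independently, and then merges the per-chunk partial results. The merge is sketched below in scalar C++ for a single query and head: each chunk contributes an un-normalized weighted value sum together with its local softmax max and sum, and the partials are rescaled to the global max before the final normalization. Names and layout are hypothetical; the real kernels work on the blocked KV cache and handle a whole group of GQA query heads per KV head at once.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One chunk's partial result for a single (query, head) pair:
//   m   = max attention score inside the chunk
//   sum = sum_j exp(score_j - m) over the chunk's keys
//   acc = sum_j exp(score_j - m) * V_j   (un-normalized, length head_dim)
struct PartialAttn {
    float m;
    float sum;
    std::vector<float> acc;
};

// Numerically stable merge: rescale each chunk's partial sums from its local
// max to the global max, accumulate, then normalize once at the end.
std::vector<float> combine_partials(const std::vector<PartialAttn>& parts,
                                    std::size_t head_dim) {
    float m_global = -std::numeric_limits<float>::infinity();
    for (const auto& p : parts) m_global = std::max(m_global, p.m);

    std::vector<float> out(head_dim, 0.f);
    float sum_global = 0.f;
    for (const auto& p : parts) {
        const float alpha = std::exp(p.m - m_global);  // rescale factor for this chunk
        sum_global += alpha * p.sum;
        for (std::size_t d = 0; d < head_dim; ++d) out[d] += alpha * p.acc[d];
    }
    for (std::size_t d = 0; d < head_dim; ++d) out[d] /= sum_global;
    return out;
}
```
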
Add the following functions for the Arm device:
  - moeFfnLayer
  - mlaContextAttention
  - mlaAbsorbAttention
  - layernormWithStride
  - mlaQKVGemm
  - slice
  - dispatch
Upgrade the torch version from 2.1.2 to 2.6.0 for the Arm backend
Add DeepSeek V2 Lite support
Add DeepSeek V3 support by packing FP8 weights into INT4 and computing with KleidiAI
Improve the performance of the gated activation op
Add an optimized MoE path for a8w4
Merge the shared expert into moeFfnLayer for DeepSeek V3
Optimize flash decoding by splitting the q dim for MLA absorb attention

Signed-off-by: Zhang Xiangze <[email protected]>
Co-authored-by: Tianyu Li <[email protected]>