Conversation

xiangze-arm

- Implement flash attention for context attention (see the prefill sketch below)
- Implement flash decoding for decoder self-attention (see the decoding sketch below)
- Avoid cache assembly and use the blocked KV cache directly
- Compute GQA by group of heads in flash decoding

Signed-off-by: Zhang Xiangze <[email protected]>
Co-authored-by: Ruifeng Wang <[email protected]>
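
For readers unfamiliar with the technique, here is a minimal NumPy sketch of the online-softmax tiling that flash attention applies to context (prefill) attention. It is illustrative only: the function and variable names are assumptions, causal masking is omitted, and the PR's actual kernel targets Arm hardware, not NumPy.

```python
import numpy as np

def flash_attention(q, k, v, block=64):
    """Tiled attention with online softmax (single head, no masking).

    q: (Lq, d); k, v: (Lk, d). K/V are consumed block by block, so the
    full Lq x Lk score matrix is never materialized.
    """
    Lq, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((Lq, d))
    m = np.full(Lq, -np.inf)   # running row-wise max of the scores
    l = np.zeros(Lq)           # running softmax denominator
    for start in range(0, k.shape[0], block):
        kb = k[start:start + block]            # (b, d)
        vb = v[start:start + block]
        scores = (q @ kb.T) * scale            # (Lq, b)
        m_new = np.maximum(m, scores.max(axis=1))
        p = np.exp(scores - m_new[:, None])    # block weights, rescaled
        corr = np.exp(m - m_new)               # rescale old accumulators
        l = l * corr + p.sum(axis=1)
        out = out * corr[:, None] + p @ vb
        m = m_new
    return out / l[:, None]

# Sanity check against the naive reference implementation.
rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, 128, 64))
s = q @ k.T / np.sqrt(64)
ref = np.exp(s - s.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ v
assert np.allclose(flash_attention(q, k, v), ref)
```

The key property is that the Lq x Lk score matrix is never materialized: each K/V block only updates a running max `m` and denominator `l`, so working memory stays proportional to Lq x d regardless of context length.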
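Likewise, a hedged sketch of flash decoding reading a blocked (paged) KV cache in place, with GQA query heads handled one KV-head group at a time. The cache layout, `block_table`, and all names here are assumptions for illustration; a real kernel would run the per-block loop in parallel and fuse the combine step.

```python
import numpy as np

def flash_decode_gqa(q, k_cache, v_cache, block_table, seq_len, num_kv_heads):
    """Flash-decoding sketch over a blocked (paged) KV cache with GQA.

    q:        (num_q_heads, d) query for the single new token.
    k_cache:  (num_blocks, block_size, num_kv_heads, d) paged keys.
    v_cache:  same layout for values.
    block_table[i] maps logical block i of this sequence to a physical
    block, so attention reads the cache in place: no contiguous copy.

    Each KV block is an independent split: per-split (max, sum, acc)
    statistics are produced (in parallel, in a real kernel) and merged
    at the end. Query heads sharing a KV head are processed together,
    so each K/V block is loaded once per group.
    """
    num_q_heads, d = q.shape
    block_size = k_cache.shape[1]
    group = num_q_heads // num_kv_heads
    scale = 1.0 / np.sqrt(d)
    out = np.empty((num_q_heads, d))
    n_blocks = (seq_len + block_size - 1) // block_size

    for kv_h in range(num_kv_heads):
        qg = q[kv_h * group:(kv_h + 1) * group]           # (group, d)
        parts = []                                         # per-split stats
        for logical in range(n_blocks):                    # parallel in practice
            phys = block_table[logical]
            n = min(block_size, seq_len - logical * block_size)
            kb = k_cache[phys, :n, kv_h]                   # (n, d)
            vb = v_cache[phys, :n, kv_h]
            s = (qg @ kb.T) * scale                        # (group, n)
            m = s.max(axis=1)                              # (group,)
            p = np.exp(s - m[:, None])
            parts.append((m, p.sum(axis=1), p @ vb))
        # Merge splits with a numerically stable log-sum-exp reduction.
        m_all = np.max([m for m, _, _ in parts], axis=0)   # (group,)
        l_all = np.zeros(group)
        acc = np.zeros((group, d))
        for m, l, a in parts:
            w = np.exp(m - m_all)
            l_all += w * l
            acc += w[:, None] * a
        out[kv_h * group:(kv_h + 1) * group] = acc / l_all[:, None]
    return out

# Example: 8 query heads over 2 KV heads, 100 cached tokens in 16-token blocks.
rng = np.random.default_rng(0)
num_blocks, block_size, num_kv_heads, d = 12, 16, 2, 64
k_cache = rng.standard_normal((num_blocks, block_size, num_kv_heads, d))
v_cache = rng.standard_normal((num_blocks, block_size, num_kv_heads, d))
block_table = [3, 7, 0, 9, 5, 1, 11]   # 7 logical blocks cover 100 tokens
q = rng.standard_normal((8, d))
out = flash_decode_gqa(q, k_cache, v_cache, block_table,
                       seq_len=100, num_kv_heads=num_kv_heads)
```

Because every query head in a group attends through the same KV head, grouping them amortizes each K/V block load across the group, and the `block_table` indirection removes the need to assemble a contiguous cache copy before attention.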
CLAassistant commented Aug 20, 2025

CLA assistant check: all committers have signed the CLA.
