feat: optimizations for dense models on ROCM/AMD #204

DorianZi · 2025-10-10T02:09:39Z

No description provided.

CLAassistant · 2025-10-10T02:09:46Z

All committers have signed the CLA.

3rdparty/aiter/aiter-flash_attn.patch

3rdparty/aiter/rtp-llm.patch

patched_repo.bzl

rtp_llm/utils/swizzle_utils.py

rtp_llm/model_loader/attn_weight.py

rtp_llm/cpp/rocm/hipblasMMWrapper.cc

rtp_llm/cpp/kernels/unfused_attention_kernels.h

LLLLKKKK · 2025-10-13T08:40:06Z

Smoke tests need to be added.

DorianZi · 2025-10-15T08:02:56Z

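> Smoke tests need to be added.

Done.

Added tests for swizzle and fp8 attention to open_merge/204.
The other optimizations (norm, attention, rotary embedding, etc.) are already enabled by default, so the existing smoke tests cover them.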

3rdparty/aiter/aiter.patch

rtp_llm/cpp/cache/KVCacheAllocator.cc

rtp_llm/device/device_impl.py

rtp_llm/models_py/modules/rocm/fmha.py

rtp_llm/cpp/cache/KVCacheAllocator.cc

rtp_llm/cpp/devices/utils/DebugUtils.cc

rtp_llm/cpp/core/DeviceTypes.h

DorianZi requested a review from LLLLKKKK October 10, 2025 02:09

DorianZi changed the title ~~optimizations for dense models on ROCM/AMD~~ feat: optimizations for dense models on ROCM/AMD Oct 10, 2025

LLLLKKKK requested changes Oct 10, 2025

3rdparty/aiter/aiter-flash_attn.patch (outdated, resolved)

3rdparty/aiter/rtp-llm.patch (outdated, resolved)

patched_repo.bzl (outdated, resolved)

LLLLKKKK reviewed Oct 10, 2025

rtp_llm/utils/swizzle_utils.py (outdated, resolved)

LLLLKKKK requested changes Oct 10, 2025

rtp_llm/model_loader/attn_weight.py (outdated, resolved)

amd-yilizhao force-pushed the develop/qwen3-rocm-main_more_opt branch from e4c77f2 to 688ee2d Compare October 13, 2025 05:59

LLLLKKKK requested changes Oct 13, 2025

rtp_llm/cpp/rocm/hipblasMMWrapper.cc (outdated, resolved)

rtp_llm/cpp/rocm/hipblasMMWrapper.cc (outdated, resolved)

rtp_llm/cpp/rocm/hipblasMMWrapper.cc (resolved)

rtp_llm/cpp/kernels/unfused_attention_kernels.h (resolved)

DorianZi force-pushed the develop/qwen3-rocm-main_more_opt branch 4 times, most recently from 4544c0e to eea02b6 Compare October 15, 2025 07:58

LLLLKKKK mentioned this pull request Oct 15, 2025

refacor: delete aiter whl patch and add a8w8_gemm for bert #216

Closed

LLLLKKKK requested changes Oct 15, 2025

3rdparty/aiter/aiter.patch (resolved)

rtp_llm/cpp/cache/KVCacheAllocator.cc (outdated, resolved)

rtp_llm/device/device_impl.py (outdated, resolved)

DorianZi force-pushed the develop/qwen3-rocm-main_more_opt branch 2 times, most recently from 90118ef to 6d3697a Compare October 16, 2025 05:46

liaocz force-pushed the develop/qwen3-rocm-main_more_opt branch 3 times, most recently from f2d3c6c to 78652a1 Compare October 16, 2025 07:52

DorianZi and others added 9 commits October 16, 2025 17:15

feat: rocm amd optimizations

45fcca5

feat: open_source rocm amd optimizations

e8fcbaf

refactor rocm fp8

c4ea45e

fix: rocm remove libstdc++ preload in .bazelrc

fc21d7b

refactor RotaryEmbedding and swizzle

5972cf0

refactor:move ck&swizzle shuffle to device_impl

b83b5af

refactor: aiter patch

c4c0ebf

refactor: aiter patch in open_source

f63cef1

refactor: through environment variables, PA-ASM or hipfy Page Attention can be used.

0390b98

DorianZi force-pushed the develop/qwen3-rocm-main_more_opt branch from 78652a1 to 0390b98 Compare October 16, 2025 11:39

yhalpha reviewed Oct 17, 2025

rtp_llm/models_py/modules/rocm/fmha.py (resolved)

remove #if defined(ENABLE_FP8) || defined(USING_ROCM) in KVCacheAllocator.cc

7ad5892

LLLLKKKK enabled auto-merge (rebase) October 20, 2025 04:41

LLLLKKKK requested changes Oct 20, 2025

rtp_llm/cpp/cache/KVCacheAllocator.cc (outdated, resolved)

LLLLKKKK requested changes Oct 20, 2025

rtp_llm/cpp/devices/utils/DebugUtils.cc (outdated, resolved)

rtp_llm/cpp/devices/utils/DebugUtils.cc (outdated, resolved)

fix: remove modifications made for debugging

b89e9dd

LLLLKKKK requested changes Oct 21, 2025

rtp_llm/cpp/core/DeviceTypes.h (outdated, resolved)

hangy-amd added 2 commits October 22, 2025 11:31

remove DeviceTypes.h in KVCacheAllocator.cc

118d702

remove DeviceTypes.h

6083468

hangy-amd force-pushed the develop/qwen3-rocm-main_more_opt branch from 2520022 to 6083468 Compare October 22, 2025 04:08

feat:enable fp8 hip_PA

3fc638f

LLLLKKKK approved these changes Oct 23, 2025

LLLLKKKK merged commit 08ad962 into main Oct 23, 2025
7 of 10 checks passed

11 participants
