
DorianZi
Collaborator

No description provided.

@DorianZi DorianZi requested a review from LLLLKKKK October 10, 2025 02:09
@CLAassistant

CLAassistant commented Oct 10, 2025

CLA assistant check
All committers have signed the CLA.

@DorianZi DorianZi changed the title optimizations for dense models on ROCM/AMD feat: optimizations for dense models on ROCM/AMD Oct 10, 2025
@amd-yilizhao amd-yilizhao force-pushed the develop/qwen3-rocm-main_more_opt branch from e4c77f2 to 688ee2d Compare October 13, 2025 05:59
@LLLLKKKK
Collaborator

Smoke tests need to be added.

@DorianZi DorianZi force-pushed the develop/qwen3-rocm-main_more_opt branch 4 times, most recently from 4544c0e to eea02b6 Compare October 15, 2025 07:58
@DorianZi
Collaborator Author

Smoke tests need to be added.
Done.

  1. Added tests for swizzle and FP8 attention to open_merge/204.
  2. The other optimizations (norm, attention, rotary embedding, etc.) are enabled by default, so the existing smoke tests already cover them.

}

#ifdef ENABLE_FP8
#if defined(ENABLE_FP8) || defined(USING_ROCM)
Collaborator

Let's keep this simple: just use an enum here, so that neither the CUDA nor the ROCm headers need to be visible. That is simpler.

Collaborator

I didn't quite follow. Could you explain a bit more?

Collaborator

Just don't use NV or ROCm types such as __nv_fp8_e4m3, and drop the newly added DeviceTypes.h.
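A minimal sketch of the enum approach suggested here, assuming a shared header that is included by both the gcc-compiled .cc files and the device-compiled .cu files. The names `DataType`, `elementSize`, and `bufferBytes` are hypothetical, not this project's actual API; the point is that the shared header identifies each type by an enum value and never names a vendor type such as `__nv_fp8_e4m3`.

```cpp
// Hypothetical shared header: no CUDA or ROCm includes, so plain gcc can compile it.
#include <cstddef>
#include <cstdint>

// Hypothetical vendor-neutral type tags (not the project's real enum).
enum class DataType : std::uint8_t {
    kFp32,
    kFp16,
    kBf16,
    kFp8E4M3,
};

// Host code can still reason about element sizes without seeing vendor types.
constexpr std::size_t elementSize(DataType type) {
    switch (type) {
        case DataType::kFp32:
            return 4;
        case DataType::kFp16:
        case DataType::kBf16:
            return 2;
        case DataType::kFp8E4M3:
            return 1;
    }
    return 0;  // unreachable for valid enumerators
}

// Byte size of a buffer holding `count` elements of `type`.
constexpr std::size_t bufferBytes(DataType type, std::size_t count) {
    return elementSize(type) * count;
}
```

Under this approach, only the .cu files would map `DataType::kFp8E4M3` to `__nv_fp8_e4m3` or the corresponding ROCm type, for example inside a kernel-dispatch switch.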

Collaborator

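"#if defined(ENABLE_FP8) || defined(USING_ROCM)" has already been removed. I'd suggest keeping DeviceTypes.h: its purpose is to expose the ROCm and CUDA fp8, bf16, and similar types under a single set of type names. These definitions cannot go into Types.h, because Types.h is included by both .cc and .cu files, and on ROCm the two are built with different compilers: the .cc files are compiled with gcc, which cannot find the ROCm fp8 headers. One option is to compile both the .cc and .cu files with hipcc, but that would require changes in many places. If we want to do that, I suggest merging this branch first and then opening a dedicated branch for it.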
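A minimal sketch of the DeviceTypes.h idea described above, assuming the header is included only from .cu files built with nvcc or hipcc. The alias names `fp8_e4m3_t` and `bf16_t` and the `USING_CUDA` macro are hypothetical; `USING_ROCM` is the macro that appears in this PR. The ROCm branch assumes a ROCm release whose `<hip/hip_fp8.h>` provides `__hip_fp8_e4m3_fnuz` (the FNUZ E4M3 variant). The final fallback branch exists only so the sketch also compiles with a plain host compiler.

```cpp
// Hypothetical DeviceTypes.h: include only from device-compiled (.cu) files.
#if defined(USING_ROCM)
// hipcc can see these headers; the gcc that builds the .cc files cannot.
#include <hip/hip_bf16.h>
#include <hip/hip_fp8.h>
using fp8_e4m3_t = __hip_fp8_e4m3_fnuz;  // assumption: FNUZ E4M3 variant
using bf16_t = __hip_bfloat16;
#elif defined(USING_CUDA)  // hypothetical macro name
#include <cuda_bf16.h>
#include <cuda_fp8.h>
using fp8_e4m3_t = __nv_fp8_e4m3;
using bf16_t = __nv_bfloat16;
#else
// Fallback for a plain host build of this sketch only: raw storage, no arithmetic.
#include <cstdint>
struct fp8_e4m3_t {
    std::uint8_t bits;
};
struct bf16_t {
    std::uint16_t bits;
};
#endif

// Code written against the aliases compiles unchanged for either vendor.
static_assert(sizeof(fp8_e4m3_t) == 1, "fp8 E4M3 must occupy one byte");
static_assert(sizeof(bf16_t) == 2, "bf16 must occupy two bytes");
```

The trade-off versus the enum approach is that any header mentioning these aliases inherits the vendor includes, which is why this file has to stay out of Types.h as long as the .cc files are compiled with gcc.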

@DorianZi DorianZi force-pushed the develop/qwen3-rocm-main_more_opt branch 2 times, most recently from 90118ef to 6d3697a Compare October 16, 2025 05:46
@liaocz liaocz force-pushed the develop/qwen3-rocm-main_more_opt branch 3 times, most recently from f2d3c6c to 78652a1 Compare October 16, 2025 07:52
@DorianZi DorianZi force-pushed the develop/qwen3-rocm-main_more_opt branch from 78652a1 to 0390b98 Compare October 16, 2025 11:39
kv_cache_dtype = "auto"
key_cache_reshaped = key_cache.permute(0, 1, 3, 2)
value_cache_reshaped = value_cache.permute(0, 1, 3, 2)
# key_cache_reshaped = key_cache.permute(0, 1, 3, 2)

Why is the reshape no longer needed here?

Collaborator

The aiter interface used by this branch has changed compared with the version committed to main a few months ago.

Labels

None yet

Projects

None yet

10 participants