feat: optimizations for dense models on ROCM/AMD #204
base: main
e4c77f2 to 688ee2d
Smoke tests need to be added.
4544c0e to eea02b6
}

#ifdef ENABLE_FP8
#if defined(ENABLE_FP8) || defined(USING_ROCM)
Keep this simple: just use an enum here, so that neither the CUDA nor the ROCm headers need to be visible.
I don't quite follow. Could you explain a bit more?
Just avoid NVIDIA- or ROCm-specific types such as __nv_fp8_e4m3, and remove the newly added DeviceTypes.h.
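A minimal sketch of what the enum suggestion could look like, assuming a shared header that carries only a dtype tag. The names `DataType` and `elementSize` are hypothetical and do not come from the PR; the point is only that nothing here needs a CUDA or ROCm header, so gcc-compiled .cc files can include it.

```cpp
// Hypothetical sketch; names are illustrative and not taken from the PR.
#include <cassert>
#include <cstddef>
#include <cstdint>

// Vendor-neutral dtype tag. It names the format without using
// __nv_fp8_e4m3 or any ROCm type, so no vendor header is required.
enum class DataType : std::uint8_t {
    FP32,
    FP16,
    BF16,
    FP8_E4M3,
};

// Element size in bytes, derived from the tag alone.
inline std::size_t elementSize(DataType t) {
    switch (t) {
        case DataType::FP32:     return 4;
        case DataType::FP16:     return 2;
        case DataType::BF16:     return 2;
        case DataType::FP8_E4M3: return 1;
    }
    assert(false && "unknown DataType");
    return 0;
}
```

Host code would pass the tag around, and only device-side code would map it to a concrete vendor type.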
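The "#if defined(ENABLE_FP8) || defined(USING_ROCM)" line has already been removed. I would suggest keeping DeviceTypes.h: its purpose is to expose the ROCm and CUDA fp8, bf16, and similar types under a single type name. That definition cannot go into Types.h, because Types.h is included by both .cc and .cu files. On the ROCm side, .cc and .cu files are built with different compilers, and gcc, which compiles the .cc files, cannot find the ROCm fp8 headers. One option is to compile both .cc and .cu files with hipcc, but that would require changes in many places. If we want to go that way, I suggest doing it in a dedicated branch after this one is merged.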
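The header split argued for above can be sketched roughly as follows. This is not the PR's actual DeviceTypes.h: the alias `device_fp8_t` and the `USING_CUDA` macro are hypothetical, and the choice of `__hip_fp8_e4m3_fnuz` from `<hip/hip_fp8.h>` is an assumption about which ROCm type would be used. The host-only fallback exists only so the sketch compiles without either toolkit.

```cpp
// DeviceTypes.h (hypothetical sketch): meant to be included only from
// device-side (.cu) sources, never from gcc-compiled .cc files.
#include <cassert>
#include <cstdint>

#if defined(USING_ROCM)
// Assumed ROCm header and type; plain gcc cannot find this header.
#include <hip/hip_fp8.h>
using device_fp8_t = __hip_fp8_e4m3_fnuz;
#elif defined(USING_CUDA)  // hypothetical macro, not from the PR
#include <cuda_fp8.h>
using device_fp8_t = __nv_fp8_e4m3;
#else
// Host-only placeholder so this sketch builds without a GPU toolkit.
struct device_fp8_t {
    std::uint8_t bits;
};
#endif

// Whichever backend is selected, an FP8 value must occupy one byte.
static_assert(sizeof(device_fp8_t) == 1, "FP8 must be one byte wide");

// Storage needed for n FP8 elements, independent of the backend.
inline std::size_t fp8Bytes(std::size_t n) {
    assert(n <= SIZE_MAX / sizeof(device_fp8_t));
    return n * sizeof(device_fp8_t);
}
```

Kernels would then be written against `device_fp8_t`, while Types.h stays free of vendor includes.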
90118ef to 6d3697a
f2d3c6c to 78652a1
78652a1 to 0390b98
kv_cache_dtype = "auto"
key_cache_reshaped = key_cache.permute(0, 1, 3, 2)
value_cache_reshaped = value_cache.permute(0, 1, 3, 2)
# key_cache_reshaped = key_cache.permute(0, 1, 3, 2)
Why is the reshape no longer needed here?
Compared with what was committed to main a few months ago, the aiter interface has changed in the current branch,
No description provided.