Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2095
Optimizations to the embedding forward pass for MI350:
1. Use vec4 instead of vec2 in the embedding VBE forward kernel.
2. Because a ROCm wavefront has 64 threads, optimize subwarp handling in the embedding forward v2 kernel when the embedding dim is between 32 and 64.
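The arithmetic behind both items can be sketched on the host side. This is only an illustration under stated assumptions, not the kernel code from this change: the helper names (`lanes_per_row`, `subwarp_size`, `rows_per_wavefront`) are hypothetical, and the sketch assumes each lane loads `vec_width` contiguous elements and that subwarps are rounded up to a power of two.

```cpp
// Hypothetical helper (not an FBGEMM function): lanes needed to cover one
// embedding row when each lane loads `vec_width` elements per access.
int lanes_per_row(int embedding_dim, int vec_width) {
    return (embedding_dim + vec_width - 1) / vec_width;  // ceiling division
}

// Hypothetical helper: smallest power-of-two subwarp that holds one row.
// Assumption: subwarp sizes are powers of two so they tile a wavefront evenly.
int subwarp_size(int embedding_dim, int vec_width) {
    int lanes = lanes_per_row(embedding_dim, vec_width);
    int size = 1;
    while (size < lanes) {
        size *= 2;
    }
    return size;
}

// Hypothetical helper: rows one warp/wavefront can process concurrently
// if it is split into subwarps of the size computed above.
int rows_per_wavefront(int wavefront_size, int embedding_dim, int vec_width) {
    return wavefront_size / subwarp_size(embedding_dim, vec_width);
}
```

Under these assumptions, a row with embedding dim 64 needs 32 lanes with vec2 but only 16 with vec4, so a 64-thread wavefront could cover 4 such rows at once instead of 2, whereas a 32-thread warp using vec2 would cover only 1.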
Pull Request resolved: #5064
Reviewed By: q10
Differential Revision: D85701691
Pulled By: spcyppt
fbshipit-source-id: 72f491414f50e53038a4b02f3d555967d34740a7