Shared memory bank conflict problem #281
Unanswered
FindDefinition asked this question in Q&A
Hmm. Does CUTLASS itself have this problem, or only your customized kernel? You can use inline PTX to enforce 128-bit shared memory stores. Just implement
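A minimal sketch of the inline-PTX approach, assuming CUDA 11+ (for `__cvta_generic_to_shared`) and a 16-byte-aligned `__shared__` buffer. `st_shared_v4_u32`, `is_aligned_16` and `demo_st_shared_v4` are hypothetical names chosen for illustration; CUTLASS has its own helpers, which are not reproduced here.

```cuda
#include <cstdint>

// Hypothetical helper: st.shared.v4.u32 requires the shared-memory address
// to be 16-byte aligned; this checks a byte offset against that rule.
__host__ __device__ constexpr bool is_aligned_16(std::uint32_t byte_offset) {
  return (byte_offset & 15u) == 0u;
}

// Store four 32-bit registers with one 128-bit st.shared.v4.u32.
// Precondition: smem_ptr points into __shared__ memory and is 16-byte aligned;
// a misaligned address faults at run time.
__device__ inline void st_shared_v4_u32(void* smem_ptr, std::uint32_t x, std::uint32_t y,
                                        std::uint32_t z, std::uint32_t w) {
  // Convert the generic pointer to a 32-bit shared-window address (CUDA 11+).
  const unsigned addr = static_cast<unsigned>(__cvta_generic_to_shared(smem_ptr));
  asm volatile("st.shared.v4.u32 [%0], {%1, %2, %3, %4};\n"
               :
               : "r"(addr), "r"(x), "r"(y), "r"(z), "r"(w)
               : "memory");
}

// Usage sketch (hypothetical kernel): launch with 32 threads; each thread
// writes 16 contiguous bytes at byte offset 16 * t, which is always aligned.
__global__ void demo_st_shared_v4(std::uint32_t* out) {
  __shared__ __align__(16) std::uint32_t tile[4 * 32];
  const unsigned t = threadIdx.x;
  st_shared_v4_u32(&tile[4 * t], t, t + 1, t + 2, t + 3);
  __syncthreads();
  for (int i = 0; i < 4; ++i) out[4 * t + i] = tile[4 * t + i];
}
```

The `"memory"` clobber keeps the compiler from reordering other shared-memory accesses across the asm statement, since it cannot see what the PTX writes.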
Any update? Has the issue been fixed?
I'm debugging a Turing GEMM kernel:
TileShape: 128x256x32
WarpTileShape: 64x64x32
layout: ttt
dtypes: all cutlass::half_t
tensor op: 1688 (16x8x8 mma)
CUDA: 11.3
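For reference, the configuration above implies a few per-threadblock numbers. This is a compile-time sketch assuming the usual CUTLASS convention (warp count = threadblock tile / warp tile in each dimension) and that "1688" is the 16x8x8 `mma.sync` shape; `GemmConfig` and the helper functions are hypothetical, not CUTLASS types.

```cuda
// Illustrative description of the tile configuration (not a CUTLASS type).
struct GemmConfig {
  int tb_m, tb_n, tb_k;        // threadblock tile: 128x256x32
  int warp_m, warp_n, warp_k;  // warp tile: 64x64x32
  int mma_m, mma_n, mma_k;     // Turing tensor-op instruction: 16x8x8
  int elem_bytes;              // cutlass::half_t is 2 bytes
};

constexpr GemmConfig kCfg{128, 256, 32, 64, 64, 32, 16, 8, 8, 2};

// (128/64) * (256/64) * (32/32) = 2 * 4 * 1 = 8 warps, i.e. 256 threads.
constexpr int warps_per_block(const GemmConfig& c) {
  return (c.tb_m / c.warp_m) * (c.tb_n / c.warp_n) * (c.tb_k / c.warp_k);
}

// mma.sync instructions per warp per threadblock k-tile:
// (64/16) * (64/8) * (32/8) = 4 * 8 * 4 = 128.
constexpr int mmas_per_warp_per_ktile(const GemmConfig& c) {
  return (c.warp_m / c.mma_m) * (c.warp_n / c.mma_n) * (c.warp_k / c.mma_k);
}

// Shared memory per pipeline stage: A tile (128x32) plus B tile (32x256).
// (128*32 + 32*256) * 2 bytes = (4096 + 8192) * 2 = 24576 bytes (24 KiB).
constexpr int smem_bytes_per_stage(const GemmConfig& c) {
  return (c.tb_m * c.tb_k + c.tb_k * c.tb_n) * c.elem_bytes;
}

static_assert(warps_per_block(kCfg) == 8, "8 warps per threadblock");
static_assert(mmas_per_warp_per_ktile(kCfg) == 128, "128 mma.sync per warp per k-tile");
static_assert(smem_bytes_per_stage(kCfg) == 24576, "24 KiB per stage");
```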
When I profile this kernel in Nsight Compute, it reports many shared memory bank conflicts.
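To see why splitting a 128-bit store into 32-bit stores can show up as bank conflicts, here is a host-side sketch of a simplified bank model. Assumptions (not Nsight Compute's exact algorithm): 32 banks of 4 bytes; 4-, 8- and 16-byte accesses are processed in phases of 32, 16 and 8 threads; within a phase, each distinct 32-bit word that lands in an already-used bank costs one more wavefront. `count_wavefronts` and `strided_addresses` are hypothetical names.

```cuda
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Wavefronts (serialized passes) for one warp-wide shared-memory access.
int count_wavefronts(const std::vector<std::uint32_t>& byte_addr, int bytes_per_thread) {
  const int threads_per_phase = 128 / bytes_per_thread;  // 32, 16 or 8 threads
  const int num_threads = static_cast<int>(byte_addr.size());
  int total = 0;
  for (int p = 0; p < num_threads; p += threads_per_phase) {
    std::set<std::uint32_t> words_in_bank[32];
    for (int t = p; t < std::min(p + threads_per_phase, num_threads); ++t) {
      for (int b = 0; b < bytes_per_thread; b += 4) {
        const std::uint32_t word = (byte_addr[t] + b) / 4;
        words_in_bank[word % 32].insert(word);  // same word twice: broadcast, no conflict
      }
    }
    std::size_t worst = 0;  // the most-loaded bank sets the cost of this phase
    for (const auto& s : words_in_bank) worst = std::max(worst, s.size());
    total += static_cast<int>(worst);
  }
  return total;
}

// Thread t of a 32-thread warp accesses byte offset t * stride_bytes.
std::vector<std::uint32_t> strided_addresses(std::uint32_t stride_bytes) {
  std::vector<std::uint32_t> addr(32);
  for (std::uint32_t t = 0; t < 32; ++t) addr[t] = t * stride_bytes;
  return addr;
}
```

Under this model, if each thread writes its 16 bytes as four `st.shared.u32` stores (stride 16 bytes), each store takes 4 wavefronts, 16 in total, while one `st.shared.v4.u32` per thread takes 4 (one per 8-thread phase). The extra passes in the split version are what a profiler counts as bank conflicts.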

After debugging, I found two issues:
1. The PTX contains st.shared.u32 instead of st.shared.v4.u32. Here is the problem:
How can I force the compiler to generate st.shared.v4.u32? Should I use inline asm? Does st.shared.v4.u32 require the address to be aligned?
GEMM code:
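On the alignment question: the 16-byte alignment requirement applies to the shared-memory address, while the four source operands are ordinary 32-bit registers with no alignment of their own. A sketch of the non-asm route, assuming the compiler can see that the destination is `__shared__` and typed as `uint4`; in that case it usually emits one `st.shared.v4.u32` (check `nvcc -ptx` output, or `STS.128` in `cuobjdump -sass`), but this is not guaranteed. For `cutlass::half_t` data, a `uint4` view is only valid at element offsets that are multiples of 8 (8 halves = 16 bytes). `copy_tile_vec4` is a hypothetical kernel.

```cuda
#include <cuda_runtime.h>  // uint4 (nvcc includes this implicitly)

// uint4 is declared 16-byte aligned, so a uint4* is assumed aligned by the compiler.
static_assert(sizeof(uint4) == 16 && alignof(uint4) == 16, "uint4 is a 16-byte aligned type");

// Hypothetical kernel: launch with blockDim.x == 256. Each thread moves one
// 16-byte chunk global -> shared with a 128-bit load and a 128-bit store.
__global__ void copy_tile_vec4(const uint4* __restrict__ gmem, uint4* out) {
  __shared__ uint4 smem[256];  // uint4 array: every element is 16-byte aligned
  const unsigned t = threadIdx.x;
  const unsigned g = blockIdx.x * blockDim.x + t;
  uint4 v = gmem[g];  // expected: ld.global.v4.u32
  smem[t] = v;        // expected: st.shared.v4.u32
  __syncthreads();
  out[g] = smem[(t + 1) % blockDim.x];  // read a neighbour's chunk
}
```

If the store is instead built up through a `half_t*` or `uint32_t*` whose alignment the compiler cannot prove, it falls back to narrower stores, which is one common way `st.shared.u32` appears where a vector store was intended.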