compiler inserts arrive and wait for each GMMA instruction #1645

DeMoriarty · 2024-07-19T19:00:24Z


In a GEMM kernel that I'm modifying, I noticed that each HGMMA instruction is being waited upon immediately:

      ...
      WARPGROUP.ARRIVE 
      IADD3 R151, R152, 0x402, RZ 
      HGMMA.64x128x16.F16 R56, gdesc[UR8], R56, gsb0 
      WARPGROUP.DEPBAR.LE gsb0, 0x0
      ...
      WARPGROUP.ARRIVE 
      IADD3 R151, R152, 0x404, RZ 
      IADD3 R152, R152, 0x406, RZ 
      HGMMA.64x128x16.F16 R56, gdesc[UR8], R56, gsb0 
      WARPGROUP.DEPBAR.LE gsb0, 0x0
      ...
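
To check how widespread this pattern is in a larger dump (for example, the output of `cuobjdump -sass`), a small host-side C++ helper can count how many HGMMA instructions are followed directly by a zero-count `WARPGROUP.DEPBAR.LE`. This is only a heuristic sketch: the names `GmmaWaitStats` and `count_serialized_gmma` are made up for illustration, and it assumes one instruction per line, with any `/* ... */` address or encoding comments safely ignorable.

```cpp
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>

// Tally of how HGMMA instructions are followed in a SASS listing.
struct GmmaWaitStats {
  std::size_t hgmma_total = 0;         // HGMMA instructions seen
  std::size_t waited_immediately = 0;  // HGMMAs whose next instruction is DEPBAR.LE ..., 0x0
};

// Removes /* ... */ comments (addresses, encodings) from one line of a dump.
inline std::string strip_block_comments(std::string line) {
  std::size_t open;
  while ((open = line.find("/*")) != std::string::npos) {
    std::size_t close = line.find("*/", open + 2);
    if (close == std::string::npos) { line.erase(open); break; }
    line.erase(open, close + 2 - open);
  }
  return line;
}

// True if `line` is a WARPGROUP.DEPBAR.LE whose count operand parses to zero.
inline bool is_zero_depbar(const std::string& line) {
  std::size_t pos = line.find("WARPGROUP.DEPBAR.LE");
  if (pos == std::string::npos) return false;
  std::size_t comma = line.find(',', pos);
  if (comma == std::string::npos) return false;
  const char* begin = line.c_str() + comma + 1;
  char* end = nullptr;
  unsigned long count = std::strtoul(begin, &end, 0);  // base 0 accepts the 0x prefix
  return end != begin && count == 0;
}

// Scans the listing line by line; blank and comment-only lines are skipped.
inline GmmaWaitStats count_serialized_gmma(const std::string& sass) {
  GmmaWaitStats stats;
  std::istringstream in(sass);
  std::string raw;
  bool prev_was_hgmma = false;
  while (std::getline(in, raw)) {
    std::string line = strip_block_comments(raw);
    if (line.find_first_not_of(" \t\r;") == std::string::npos) continue;
    if (prev_was_hgmma && is_zero_depbar(line)) ++stats.waited_immediately;
    prev_was_hgmma = line.find("HGMMA") != std::string::npos;
    if (prev_was_hgmma) ++stats.hgmma_total;
  }
  return stats;
}
```

If `waited_immediately` equals `hgmma_total`, every GMMA is being waited on individually, as in the listing above; for a batch that is committed and waited on once, only the last HGMMA of each batch should be counted. The compiler may schedule unrelated instructions between an HGMMA and its wait, so a low count does not by itself prove the batch was kept intact.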

But in the CUDA source code, the HGMMA instructions are committed in batches:

      warpgroup_fence_operand(accum);
      warpgroup_arrive();
      CUTLASS_PRAGMA_UNROLL
      for (int k_block = 0; k_block < size<2>(tCrA); ++k_block) {
        cute::gemm(tiled_mma, tCrA(_,_,k_block,read_stage), tCrB(_,_,k_block,read_stage), accum);
        tiled_mma.accumulate_ = GMMA::ScaleOut::One;
      }
      warpgroup_commit_batch();
      warpgroup_wait<0>();

What might be causing the compiler to insert these DEPBAR.LE & ARRIVE?

thakkarV · 2024-07-19T19:33:07Z

Collaborator

When you compile this kernel, you must be getting some warnings from ptxas about serialization of the WGMMA instructions. What do they say?

13 replies

DeMoriarty Jul 24, 2024
Author

tried with 12.3.0, still the same :/

thakkarV Jul 24, 2024
Collaborator

wgmma.mma_async instructions are serialized due to non wgmma instructions defining accumulator registers of a wgmma between start and end of the pipeline stage in the function

This warning suggests that something in your C++ code is touching the A and/or accumulator registers of the MMA between the operand fence before the MMA and the wait group and operand fence after it. (This would be a bug in your source code.)

If not in C++, it is likely in the generated PTX. (This would be an NVVM bug.)

If not in PTX, it likely happens during compilation to SASS. (This would be a ptxas bug.)

Without more info, like the C++ or PTX source, I am not sure I can help more, but I encourage you to file CUDA bugs against the appropriate modules.
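
For reference, here is a minimal sketch of the fenced region described above, assuming the CuTe headers from CUTLASS and an SM90 GMMA `TiledMma`. The wrapper function `mma_k_tile` and its template parameter names are hypothetical; the calls inside it are the ones from the original snippet, plus the trailing operand fence, which that snippet does not show. It is not taken from the kernel in question.

```cuda
#include <cute/tensor.hpp>
#include <cute/atom/mma_atom.hpp>
#include <cute/arch/mma_sm90_gmma.hpp>

// Hypothetical helper: issues one batch of GMMAs over the K blocks of a stage.
template <class TiledMma, class TensorA, class TensorB, class TensorC>
CUTE_DEVICE void mma_k_tile(TiledMma& tiled_mma,
                            TensorA const& tCrA,
                            TensorB const& tCrB,
                            TensorC& accum,
                            int read_stage) {
  using namespace cute;

  // Any ordinary (non-wgmma) writes to accum, such as clearing or scaling it,
  // belong before this fence.
  warpgroup_fence_operand(accum);
  warpgroup_arrive();

  CUTE_UNROLL
  for (int k_block = 0; k_block < size<2>(tCrA); ++k_block) {
    // Only wgmma instructions should define accum inside this region.
    cute::gemm(tiled_mma, tCrA(_,_,k_block,read_stage), tCrB(_,_,k_block,read_stage), accum);
    tiled_mma.accumulate_ = GMMA::ScaleOut::One;
  }

  warpgroup_commit_batch();
  warpgroup_wait<0>();
  // Ordinary reads and writes of accum are expected only after this fence.
  warpgroup_fence_operand(accum);
}
```

When auditing a modified kernel, one thing to look for is any statement between the first fence and the last one that assigns to `accum` (or to registers holding the A operand) outside of `cute::gemm`.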

DeMoriarty Jul 24, 2024
Author

Thanks a lot for your help! Do you think any of the compiler flags could be causing this?

--expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -forward-unknown-to-host-compiler -std=c++17 -O3 -Wunused -Xcompiler=-Wconversion -Xcompiler=-fPIC -Xcompiler=-fno-strict-aliasing --expt-relaxed-constexpr -DNDEBUG --use_fast_math --expt-extended-lambda -lineinfo -res-usage -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -gencode arch=compute_90a,code=sm_90a -DCUTLASS_DEBUG_TRACE_LEVEL=0 '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -D_GLIBCXX_USE_CXX11_ABI=0

thakkarV Jul 24, 2024
Collaborator

Try removing -fPIC

jiex-liu Feb 26, 2025

Has this issue been resolved? What are some other reasons this can happen if I did not get the (C7515) warning?
