Contributor

@milesvant milesvant commented Dec 3, 2025

The MMA warp category writes the TMEM address to shared memory, and that write is made visible to the epilogue/correction warp categories only implicitly, through the intermediate barriers between the warps that synchronize the MMA output. In the 0 KV tile case there is no such barrier with the epilogue, so the write might not be visible. The epilogue warp can then read undefined data from shared memory for the TMEM address, and the tcgen05::dealloc may fail.

This patch attempts to fix the issue by allocating the TMEM only when a persistent CTA processes at least 1 KV tile. If there is at least one KV tile, the TMEM address should be correctly synchronized, even if the final work tile processed has 0 KV tiles.
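The control flow described above can be sketched as a small host-side C++ model. This is not the kernel code from the patch: `TmemModel` and `run_persistent_cta` are hypothetical names, and the model assumes the dealloc is skipped when nothing was allocated, which the description does not spell out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side model of the TMEM lifetime; not a CUTLASS API.
struct TmemModel {
    bool allocated = false;
    int alloc_calls = 0;
    int dealloc_calls = 0;

    void alloc()   { assert(!allocated); allocated = true;  ++alloc_calls; }
    void dealloc() { assert(allocated);  allocated = false; ++dealloc_calls; }
};

// Walks the work tiles assigned to one persistent CTA. Each entry is the
// number of KV tiles in that work tile (0 means there is nothing to compute).
// Returns how many work tiles actually ran the main loop.
inline int run_persistent_cta(const std::vector<int>& kv_tiles_per_work_tile,
                              TmemModel& tmem) {
    int processed = 0;
    for (int kv_tiles : kv_tiles_per_work_tile) {
        if (kv_tiles == 0) {
            continue;  // empty work tile: skip before touching TMEM
        }
        if (!tmem.allocated) {
            tmem.alloc();  // first non-empty work tile triggers the allocation
        }
        ++processed;
    }
    // Assumption: the dealloc is only issued if an allocation happened.
    if (tmem.allocated) {
        tmem.dealloc();
    }
    return processed;
}
```

In this model, a CTA whose work tiles all have 0 KV tiles never allocates and never deallocates, while a CTA with a mix of empty and non-empty work tiles allocates exactly once.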

Another potential solution is to perform the dealloc in the MMA warp category and add extra synchronization from the TMEM consumers signaling that their usage is complete.
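A rough host-side C++ model of that alternative, for illustration only. `TmemOwner` and its methods are hypothetical names; in a kernel the counter would presumably be replaced by a hardware barrier, and the release/acquire pair here only stands in for that synchronization.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical host-side model; names are illustrative, not CUTLASS APIs.
class TmemOwner {
public:
    explicit TmemOwner(int num_consumers) : pending_(num_consumers) {
        assert(num_consumers >= 0);
    }

    // Called by each TMEM consumer (e.g. epilogue, correction) once it has
    // finished using TMEM. Release ordering publishes its earlier accesses.
    void consumer_done() { pending_.fetch_sub(1, std::memory_order_release); }

    // Called by the allocating side (the MMA warp in this model).
    // Deallocates only after every consumer has signaled completion.
    // Returns true if the dealloc happened on this call.
    bool try_dealloc() {
        if (deallocated_) return false;
        if (pending_.load(std::memory_order_acquire) != 0) return false;
        deallocated_ = true;
        return true;
    }

    bool deallocated() const { return deallocated_; }

private:
    std::atomic<int> pending_;   // consumers that have not yet signaled
    bool deallocated_ = false;   // touched only by the allocating side
};
```

Because the warp that allocated also deallocates in this model, the TMEM address never has to be read back from shared memory by a different warp, at the cost of one extra signal from each consumer.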

P.S.
The PTX ISA manual states:

When .cta_group::1 is specified, one warp from the CTA must perform the allocation and de-allocation.

In my opinion this is ambiguously phrased: it could mean either that the same warp must perform both the alloc and the dealloc (which is not the case here), or that different single warps may perform the alloc and the dealloc for a given TMEM allocation.

continue;
}

if (!allocated) {
Contributor Author

@milesvant milesvant Dec 3, 2025

Reorders this TMEM allocation with the next continue statement

@manishucsd
Contributor

@hwu36, @IonThruster, @v0i0, @richardmcai, should compute-sanitizer be able to catch such issues? Does CUTLASS CI run compute-sanitizer to rule out such issues?

@v0i0
Contributor

v0i0 commented Dec 5, 2025

IIRC synccheck can find this (due to the smem read/write); that's how I recently found something like this.
