In the NamedBarrier implementation here, the number passed is NumMmaThreads(256) + NumThreadsPerWarp(32).
I searched for FwdNamedBarriers::ValueEmpty and found it only in the epilogue's store function. The value 32 seems related to the producer's 32 threads, but I couldn't locate any explicit use of it in the producer. Could someone clarify the rationale behind this?
Ah, I still don't quite understand. My current understanding of the named barrier mechanism is that sync makes the calling threads wait, while arrive only signals arrival without waiting. The barrier releases only once the required number of threads have arrived. In this case, does that mean 256+32 threads must arrive before the barrier can proceed? (The consumer has only 256 threads, so reaching 256+32 from the consumer alone seems impossible.)
I noticed that you mentioned a sync being executed by a single warp. I can see how there could be several separate arrive calls, but how could there be separate sync calls?
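To make the counting rule in this question concrete, here is a minimal host-side sketch in plain C++. It is a model, not the CUTLASS NamedBarrier API and not the kernel's actual code: `NamedBarrierModel`, `simulate`, and `kWarpSize` are hypothetical names, each std::thread stands in for one warp, and the split between which warps call arrive and which call sync is an assumption for illustration. The model follows the PTX rule that both `bar.arrive` and `bar.sync` on the same barrier count toward the expected thread count, so separate sync calls from different warps all add to one shared total.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

constexpr int kWarpSize = 32;  // threads contributed by each modeled warp

// Host-side model of a single named barrier (hypothetical, for illustration).
class NamedBarrierModel {
public:
    explicit NamedBarrierModel(int expected_threads) : expected_(expected_threads) {}

    // Like bar.arrive: count this warp's threads and return without waiting.
    void arrive(int threads) {
        std::lock_guard<std::mutex> lock(mutex_);
        add(threads);
    }

    // Like bar.sync: count this warp's threads, then wait for the phase to complete.
    void sync(int threads) {
        std::unique_lock<std::mutex> lock(mutex_);
        const int my_phase = phase_;
        add(threads);
        cv_.wait(lock, [&] { return phase_ != my_phase; });
    }

    int completed_phases() {
        std::lock_guard<std::mutex> lock(mutex_);
        return phase_;
    }

private:
    // Caller must hold mutex_. Completes the phase once the expected count is reached.
    void add(int threads) {
        count_ += threads;
        if (count_ == expected_) {
            count_ = 0;
            ++phase_;
            cv_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    const int expected_;
    int count_ = 0;
    int phase_ = 0;
};

// Runs one barrier phase with the given mix of arriving and syncing warps.
// Returns the number of completed phases (1 if the barrier released).
int simulate(int arriving_warps, int syncing_warps) {
    NamedBarrierModel barrier((arriving_warps + syncing_warps) * kWarpSize);
    std::vector<std::thread> warps;
    for (int i = 0; i < syncing_warps; ++i) {
        warps.emplace_back([&barrier] { barrier.sync(kWarpSize); });
    }
    for (int i = 0; i < arriving_warps; ++i) {
        warps.emplace_back([&barrier] { barrier.arrive(kWarpSize); });
    }
    for (auto& warp : warps) {
        warp.join();
    }
    return barrier.completed_phases();
}
```

Calling `simulate(8, 1)` models eight warps (256 threads) that arrive and one warp (32 threads) that syncs, for an expected count of 256 + 32 = 288. Under this model the 288 total does not have to come from the consumer alone, and a mix such as `simulate(4, 5)` also completes, because every sync call counts as an arrival too.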