Description
In the current Replay Table (RTAB) design, whenever refill_update_rtab goes high, the dir_unavailable dependency is cleared for all RTAB entries. Entries that were waiting on the dir_unavail dependency are then replayed in round-robin order. This behavior is incorrect: a refill should replay only one request, i.e., release the dir_unavail dependency for a single replay entry. If all entries are replayed and there are not enough free ways in the directory set (because the refill comes late), every replay except the first is rolled back, wasting cycles by blocking core requests (which may or may not target a different set!).
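The failure mode above can be illustrated with a toy behavioral model in Python. This is only a sketch, not the RTL: the function name `replay_all_on_refill`, the entry labels, and the single `free_ways` counter are assumptions made for illustration, and all waiting entries are assumed to target the same set.

```python
def replay_all_on_refill(waiting_entries, free_ways):
    """Toy model of the current behavior (assumed, simplified).

    One refill clears the dependency of every waiting entry, and the
    entries are then replayed one after another. A replay succeeds only
    if a way is still free in the set; otherwise it is rolled back.
    """
    succeeded = []
    rolled_back = []
    for entry in waiting_entries:
        if free_ways > 0:
            free_ways -= 1           # the replayed miss takes the free way
            succeeded.append(entry)
        else:
            rolled_back.append(entry)  # wasted replay: no way available
    return succeeded, rolled_back


# Four entries wait on the same set, but the refill frees only one way.
succeeded, rolled_back = replay_all_on_refill(["e0", "e1", "e2", "e3"], free_ways=1)
```

In this model three of the four replays are wasted, and each wasted replay occupies the pipeline for cycles that a core request could have used.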
A solution I have experimented with is to use a grant mask for the dir_unavailable dependency vector and to update this mask every time a refill comes back, i.e., use refill_i as the arbiter ready signal. This improves performance significantly when many read misses target the same cache set and many of them end up in the RTAB waiting for refills.
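A minimal Python sketch of the one-release-per-refill idea follows. It is a behavioral approximation, not the experimented RTL: the class `DirUnavailGrant` and its methods are hypothetical names, the round-robin pointer stands in for the grant mask, and set matching between the refill and the waiting entries is deliberately ignored.

```python
class DirUnavailGrant:
    """Round-robin release of the directory-unavailable dependency.

    Hypothetical model: each refill releases at most one waiting entry,
    starting the search just after the entry released last time.
    """

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.waiting = [False] * num_entries
        self.last_granted = num_entries - 1  # first search starts at entry 0

    def mark_waiting(self, idx):
        """Record that entry idx is blocked on the dependency."""
        self.waiting[idx] = True

    def on_refill(self):
        """Release one waiting entry, or return None if none is waiting."""
        for offset in range(1, self.num_entries + 1):
            idx = (self.last_granted + offset) % self.num_entries
            if self.waiting[idx]:
                self.waiting[idx] = False
                self.last_granted = idx
                return idx
        return None


rtab = DirUnavailGrant(num_entries=4)
for i in (0, 2, 3):
    rtab.mark_waiting(i)
released = [rtab.on_refill() for _ in range(4)]  # one release per refill
```

Each call to `on_refill` plays the role of one refill acting as the arbiter ready signal: it advances the grant by exactly one entry, so no replay is issued without a way to back it.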
However, the true bottleneck here is the pre-allocation of a cache way upon a read miss. The cache must wait for a refill to come back before it can put more misses for that set into the MSHR. Because a set has only a limited number of ways, e.g., 4, misses are processed in batches of 4. When more requests than the number of ways come in, the additional misses have to wait for a refill to come back so that they can be replayed. This severely limits throughput when many requests target the same set, e.g., threads within a warp on a GPU.
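As a rough back-of-the-envelope illustration of this limit, the number of refill round trips needed for misses on one set can be estimated as below. This is a coarse sketch under stated assumptions: the function name `min_refill_rounds` is hypothetical, every miss is assumed to target a distinct line in the same set, and the warp size of 32 is only an example value, not taken from the design.

```python
def min_refill_rounds(num_misses, num_ways):
    """Estimate how many batches of misses one set must go through.

    Assumes at most num_ways misses can be outstanding on the set at a
    time, so the misses are served in ceil(num_misses / num_ways) batches.
    """
    if num_ways <= 0:
        raise ValueError("num_ways must be positive")
    return (num_misses + num_ways - 1) // num_ways  # integer ceiling


# Example (assumed numbers): 32 threads of a warp miss on one 4-way set.
rounds = min_refill_rounds(num_misses=32, num_ways=4)
```

With these example numbers the set goes through 8 batches, so the total time is on the order of 8 refill latencies, whereas an MSHR that did not depend on way pre-allocation could overlap all 32 misses.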