Replies: 2 comments
-
Hi @arkhadem, please let me know if this helps.

Why does this happen?
In gem5, even when accessing uncacheable memory such as MMIO registers, the default behavior is still to send these accesses through the cache hierarchy, where the lookup latency is incurred. gem5's default cache model assumes that every memory access, including uncacheable ones, is handled by the caches. This keeps the memory hierarchy model uniform, but it can lead to unnecessary overhead in cases like yours, where the memory is known to be uncacheable.
What can you do to optimize this?
To avoid the unnecessary cache lookups and the associated latency for uncacheable MMIO accesses, you can bypass the cache for uncacheable regions. Here are a few ways to address this:
- Modify the cache code so that uncacheable accesses skip the tag lookup and are routed directly toward the memory controller (a sketch follows this list).
- Configure your system's memory map so that MMIO accesses do not go through the cache hierarchy at all.
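For the cache-code option, here is a minimal sketch of the kind of change meant, assuming a Cache::access() shaped like recent upstream gem5; member names such as tags and lookupLatency, and the exact signature, vary across versions, so treat this as an illustration rather than a drop-in patch:

```cpp
// Sketch of a modified Cache::access() (src/mem/cache/cache.cc in
// upstream gem5; adapt names and structure to your version).
bool
Cache::access(PacketPtr pkt, CacheBlk *&blk, Cycles &lat,
              PacketList &writebacks)
{
    if (pkt->req->isUncacheable()) {
        assert(pkt->isRequest());

        // No block is ever allocated for uncacheable data, so there is
        // nothing to find in the tag array. If cacheable and uncacheable
        // accesses to the same range could ever mix in your workload,
        // keep the original flush-and-invalidate of any stale block here.
        blk = nullptr;

        // Charge a minimal pass-through latency instead of the full
        // lookupLatency (this could also be exposed as a new parameter).
        lat = Cycles(1);

        // Return "miss" so the request is forwarded towards memory.
        return false;
    }

    // Cacheable accesses take the normal path.
    return BaseCache::access(pkt, blk, lat, writebacks);
}
```

The same change would apply at every cache level the access traverses; whether Cycles(1) or Cycles(0) is the right number depends on how much of the cache's internal forwarding delay you still want to model.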
Additional Considerations
Conclusion
In summary, uncacheable accesses such as MMIO should ideally bypass the cache lookup to avoid unnecessary latency. You can achieve this by modifying the cache code to route uncacheable accesses directly to the memory controller, or by configuring your system's memory map so that MMIO accesses do not go through the cache hierarchy at all. This will make the simulation of your MMIO accelerator more efficient and avoid paying the cache lookup latency for accesses that do not need it.
-
Hi @ivanaamit, thanks for your response. As you said, the current implementation assumes that all caches are responsible for the entire memory address space.
I cannot imagine any situation in which an uncacheable access (to an address range marked uncacheable in the page table) would be cached. In other words, the following lookup would always return nothing if the address range is uncacheable:
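(Paraphrasing the uncacheable path in the upstream cache code; the exact line and arguments differ between gem5 versions.)

```cpp
// Tag lookup performed even for an uncacheable request (paraphrased from
// gem5's Cache::access()). For a region that is only ever accessed
// uncacheably, this never finds a valid block.
CacheBlk *old_blk = tags->findBlock(pkt->getAddr(), pkt->isSecure());
```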
The data may be in the MSHRs or the write buffer, but not in the tag array, so there is no need to pay the tag lookup latency in any situation. Please correct me if I'm wrong. I am more specifically talking about

Best,
-
Hi Everyone,
The current cache implementation performs a tag lookup for any uncacheable access and charges lookupLatency for it. I am accessing an MMIO accelerator through uncacheable physical addresses, so there will be no cacheable access to this memory region at any point in the simulation. Why should I still pay this much latency at every cache level when accessing the uncacheable MMIO registers?
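For reference, this is the behavior I mean, paraphrased from the upstream cache code (the exact code differs between gem5 versions):

```cpp
// Uncacheable path in gem5's Cache::access() (paraphrased; see
// src/mem/cache/cache.cc in your version for the exact code).
if (pkt->req->isUncacheable()) {
    // The tags are still searched, to flush and invalidate any stale block.
    CacheBlk *old_blk = tags->findBlock(pkt->getAddr(), pkt->isSecure());
    if (old_blk && old_blk->isValid()) {
        evictBlock(old_blk, writebacks);
    }

    blk = nullptr;
    // The full lookupLatency is charged even though nothing useful can hit.
    lat = lookupLatency;
    // Forwarded towards memory as a miss.
    return false;
}
```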
Best,