Replies: 2 comments
-
Hi @arkhadem, please let me know if this helps.

Why does this happen?
In gem5, even when accessing uncacheable memory such as MMIO registers, the default behavior is still to send these accesses through the cache hierarchy, where the lookup latency is incurred. gem5's default cache model assumes that every memory access, including uncacheable ones, is handled by the caches. This keeps the memory hierarchy model uniform, but it can lead to unnecessary overhead in cases like yours, where the memory is known to be uncacheable.
What can you do to optimize this?
To avoid the unnecessary cache lookups and the associated latency for uncacheable MMIO accesses, you can bypass the cache for uncacheable regions. Here are a few ways to address this:
- Modify the cache code so that uncacheable accesses skip the tag lookup and are routed directly toward the memory controller (a sketch follows this list).
- Configure your system's memory map so that MMIO accesses do not go through the cache hierarchy at all.
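For the cache-code option, here is a minimal sketch of the kind of change meant, assuming a Cache::access() shaped like recent upstream gem5; member names such as tags and lookupLatency, and the exact signature, vary across versions, so treat this as an illustration rather than a drop-in patch:

```cpp
// Sketch of a modified Cache::access() (src/mem/cache/cache.cc in
// upstream gem5; adapt names and structure to your version).
bool
Cache::access(PacketPtr pkt, CacheBlk *&blk, Cycles &lat,
              PacketList &writebacks)
{
    if (pkt->req->isUncacheable()) {
        assert(pkt->isRequest());

        // No block is ever allocated for uncacheable data, so there is
        // nothing to find in the tag array. If cacheable and uncacheable
        // accesses to the same range could ever mix in your workload,
        // keep the original flush-and-invalidate of any stale block here.
        blk = nullptr;

        // Charge a minimal pass-through latency instead of the full
        // lookupLatency (this could also be exposed as a new parameter).
        lat = Cycles(1);

        // Return "miss" so the request is forwarded towards memory.
        return false;
    }

    // Cacheable accesses take the normal path.
    return BaseCache::access(pkt, blk, lat, writebacks);
}
```

The same change would apply at every cache level the access traverses; whether Cycles(1) or Cycles(0) is the right number depends on how much of the cache's internal forwarding delay you still want to model.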
Additional Considerations
Conclusion
In summary, uncacheable accesses such as MMIO should ideally bypass the cache lookup to avoid unnecessary latency. You can achieve this by modifying the cache code to route uncacheable accesses directly to the memory controller, or by configuring your system's memory map so that MMIO accesses do not go through the cache hierarchy at all. This will make the simulation of your MMIO accelerator more efficient and avoid paying the cache lookup latency for accesses that do not need it.
-
Hi @ivanaamit, thanks for your response. As you said, the current implementation assumes that all caches are responsible for the entire memory address space.
I cannot imagine any situation in which an uncacheable access (to an address range marked uncacheable in the page table) would be cached. In other words, the following lookup would always return nothing if the address range is uncacheable:
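(Paraphrasing the uncacheable path in the upstream cache code; the exact line and arguments differ between gem5 versions.)

```cpp
// Tag lookup performed even for an uncacheable request (paraphrased from
// gem5's Cache::access()). For a region that is only ever accessed
// uncacheably, this never finds a valid block.
CacheBlk *old_blk = tags->findBlock(pkt->getAddr(), pkt->isSecure());
```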
The data may be in the MSHRs or the write buffer, but not in the tag array, so there is no need to pay the tag lookup latency in any situation. Please correct me if I'm wrong. I am more specifically talking about

Best,
-
Hi Everyone,
The current cache implementation performs a tag lookup for any uncacheable access and charges lookupLatency for it. I am accessing an MMIO accelerator through uncacheable physical addresses, so there will be no cacheable access to this memory region at any point in the simulation. Why should I still pay this much latency at every cache level when accessing the uncacheable MMIO registers?
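For reference, this is the behavior I mean, paraphrased from the upstream cache code (the exact code differs between gem5 versions):

```cpp
// Uncacheable path in gem5's Cache::access() (paraphrased; see
// src/mem/cache/cache.cc in your version for the exact code).
if (pkt->req->isUncacheable()) {
    // The tags are still searched, to flush and invalidate any stale block.
    CacheBlk *old_blk = tags->findBlock(pkt->getAddr(), pkt->isSecure());
    if (old_blk && old_blk->isValid()) {
        evictBlock(old_blk, writebacks);
    }

    blk = nullptr;
    // The full lookupLatency is charged even though nothing useful can hit.
    lat = lookupLatency;
    // Forwarded towards memory as a miss.
    return false;
}
```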
Best,