[Grammar] 12-7 Case Study - LLC sensitivity.md

dendibakh · Aug 10, 2024 · 68dc88f · 68dc88f
1 parent 0427b22
commit 68dc88f
Showing 1 changed file with 2 additions and 2 deletions.
diff --git a/chapters/12-Other-Tuning-Areas/12-7 Case Study - LLC sensitivity.md b/chapters/12-Other-Tuning-Areas/12-7 Case Study - LLC sensitivity.md
@@ -86,7 +86,7 @@ Similarly, the memory read bandwidth allocated to a thread can be limited. This
 
 ### Metrics {.unlisted .unnumbered}
 
-The ultimate metric for quantifying the performance of an application is execution time. To analyze the impact of the memory hierarchy on system performance, we will also use the following three metrics: 1) CPI, cycles per instruction[^6], 2) DMPKI, demand misses in the LLC per thousand instructions, and 3) MPKI, total misses (demand + prefetch) in the LLC per thousand instructions. While CPI has a direct correlation with performance of an application, DMPKI and MPKI do not necessarily impact performance. Table @tbl:metrics shows the formulas used to calculate each metric from specific hardware counters. Detailed description for each of the counters is available in AMD's Processor Programming Reference [@amd_ppr].
+The ultimate metric for quantifying the performance of an application is execution time. To analyze the impact of the memory hierarchy on system performance, we will also use the following three metrics: 1) CPI, cycles per instruction[^6], 2) DMPKI, demand misses in the LLC per thousand instructions, and 3) MPKI, total misses (demand + prefetch) in the LLC per thousand instructions. While CPI has a direct correlation with the performance of an application, DMPKI and MPKI do not necessarily impact performance. Table @tbl:metrics shows the formulas used to calculate each metric from specific hardware counters. Detailed description for each of the counters is available in AMD's Processor Programming Reference [@amd_ppr].
 
 ------   ----------------------------------------------------------------------------
 Metric                                     Formula                   
@@ -128,7 +128,7 @@ In contrast, `503.bwaves` and `554.roms` don't make use of all available LLC spa
 
 Let's now analyze the MPKI graph, that combines both LLC demand misses and prefetch requests. First of all, we can see that the MPKI values are always much higher than the DMPKI values. That is, most of the blocks are loaded from memory into the on-chip hierarchy by the prefetcher. This behavior is due to the fact that the prefetcher is efficient in preloading the private caches with the data to be used, thus eliminating most of the demand misses.
 
-For `503.bwaves`, we observe that MPKI remains roughly at the same level, similar to CPI and DMPKI charts. There is likely not much data reuse in the benchmark and/or the memory traffic is very low. The `520.omnetpp` workload behaves as we identified earlier: MPKI decreases as the available space increases. However, for `554.roms`, the MPKI chart shows a large drop in total misses as the available space increases while CPI and DMPKI remained unchanged. In this case, there is data reuse in the benchmark, but it is not consequential to the performance. The prefetcher can bring required data ahead of time, eliminating demand misses, regardless of the available space in the LLC. However, as the available space decreases, the probability that the prefetcher will not find the blocks in the LLC and will have to load them from memory increases. So, giving more LLC capacity to this type of application does not directly benefit their performance, but it does benefit the system since it reduces memory traffic.
+For `503.bwaves`, we observe that MPKI remains roughly at the same level, similar to CPI and DMPKI charts. There is likely not much data reuse in the benchmark and/or the memory traffic is very low. The `520.omnetpp` workload behaves as we identified earlier: MPKI decreases as the available space increases. However, for `554.roms`, the MPKI chart shows a large drop in total misses as the available space increases while CPI and DMPKI remain unchanged. In this case, there is data reuse in the benchmark, but it is not consequential to the performance. The prefetcher can bring required data ahead of time, eliminating demand misses, regardless of the available space in the LLC. However, as the available space decreases, the probability that the prefetcher will not find the blocks in the LLC and will have to load them from memory increases. So, giving more LLC capacity to this type of application does not directly benefit their performance, but it does benefit the system since it reduces memory traffic.
 
 By looking at the CPI and DMPKI, we initially thought that `554.roms` is insensitive to the LLC size. But by analyzing the MPKI chart, we need to reconsider our statement and conclude that `554.roms` is sensitive to LLC size as well, and, therefore, it seems a good idea not to limit its available LLC space so as not to increase memory bandwidth consumption. Higher bandwidth consumption may increase memory access latency, which in turn implies a performance degradation of other applications running on the system [@Balancer2023].