diff --git a/chapters/4-Terminology-And-Metrics/4-10 Memory Latency and Bandwidth.md b/chapters/4-Terminology-And-Metrics/4-10 Memory Latency and Bandwidth.md
index 49605a78d9..086698bc13 100644
--- a/chapters/4-Terminology-And-Metrics/4-10 Memory Latency and Bandwidth.md
+++ b/chapters/4-Terminology-And-Metrics/4-10 Memory Latency and Bandwidth.md
@@ -2,7 +2,7 @@
 
 Inefficient memory accesses are often a dominant performance bottleneck in modern environments. Thus, how quickly a processor can fetch data from the memory subsystem is a critical factor in determining application performance. There are two aspects of memory performance: 1) how fast a CPU can fetch a single byte from memory (latency), and 2) how many bytes it can fetch per second (bandwidth). Both are important in various scenarios, we will look at a few examples later. In this section, we will focus on measuring peak performance of the memory subsystem components.
 
-One of the tools that can become helpful on x86 platforms is Intel Memory Latency Checker (MLC),[^1] which is available for free on Windows and Linux. MLC can measure cache and memory latency and bandwidth using different access patterns and under load. On ARM-based systems there is no similar tool, however, users can download and build memory latency and bandwidth benchmarks from sources. Example of such projects are [lmbench](https://sourceforge.net/projects/lmbench/)[^2] and [Stream](https://github.com/jeffhammond/STREAM)[^3].
+One of the tools that can become helpful on x86 platforms is Intel Memory Latency Checker (MLC),[^1] which is available for free on Windows and Linux. MLC can measure cache and memory latency and bandwidth using different access patterns and under load. On ARM-based systems there is no similar tool, however, users can download and build memory latency and bandwidth benchmarks from sources. Example of such projects are [lmbench](https://sourceforge.net/projects/lmbench/)[^2], [bandwidth](https://zsmith.co/bandwidth.php)[^4] and [Stream](https://github.com/jeffhammond/STREAM)[^3].
 
 We will only focus on a subset of metrics, namely idle read latency and read bandwidth. Let's start with the read latency. Idle means that while we do the measurements, the system is idle. This will give us the minimum time required to fetch data from memory system components, but when the system is loaded by other "memory-hungry" applications, this latency increases as there may be more queueing for resources at various points. MLC measures idle latency by doing dependent loads (aka pointer chasing). A measuring thread allocates a buffer and initializes it such that each cache line (64-byte) is pointing to another line. By appropriately sizing the buffer, we can ensure that almost all the loads are hitting in certain level of cache or memory.
 
@@ -58,4 +58,5 @@ Knowledge of the primary characteristics of a machine is fundamental to assessin
 
 [^1]: Intel MLC tool - [https://www.intel.com/content/www/us/en/download/736633/intel-memory-latency-checker-intel-mlc.html](https://www.intel.com/content/www/us/en/download/736633/intel-memory-latency-checker-intel-mlc.html)
 [^2]: lmbench - [https://sourceforge.net/projects/lmbench](https://sourceforge.net/projects/lmbench)
-[^3]: Stream - [https://github.com/jeffhammond/STREAM](https://github.com/jeffhammond/STREAM)
\ No newline at end of file
+[^3]: Stream - [https://github.com/jeffhammond/STREAM](https://github.com/jeffhammond/STREAM)
+[^4]: Memory bandwidth benchmark by Zack Smith - [https://zsmith.co/bandwidth.php](https://zsmith.co/bandwidth.php)
\ No newline at end of file
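
Reviewer note: the context paragraph in the first hunk describes how MLC measures idle latency with dependent loads (pointer chasing). A minimal sketch of that buffer setup follows, in Python for readability. It is not MLC's implementation: `build_chain`, `chase`, and the use of Sattolo's shuffle are assumptions made for illustration, and list indices stand in for 64-byte cache lines. Interpreter overhead in Python swamps real memory latency, so the sketch shows only the dependency structure, not a measurement.

```python
import random

CACHE_LINE_BYTES = 64  # cache line size stated in the chapter text


def build_chain(buffer_bytes, seed=0):
    """Return next_line, where next_line[i] is the line that line i points to.

    Hypothetical helper: Sattolo's algorithm produces a permutation with a
    single cycle, so a walk from any line visits every line exactly once
    before returning to its starting point.
    """
    num_lines = buffer_bytes // CACHE_LINE_BYTES
    rng = random.Random(seed)
    next_line = list(range(num_lines))
    for i in range(num_lines - 1, 0, -1):
        j = rng.randrange(i)  # j < i, never i itself: this forces one cycle
        next_line[i], next_line[j] = next_line[j], next_line[i]
    return next_line


def chase(next_line, steps, start=0):
    """Follow the chain; each lookup depends on the result of the previous one."""
    idx = start
    for _ in range(steps):
        idx = next_line[idx]  # dependent load: the next index is unknown until now
    return idx


if __name__ == "__main__":
    chain = build_chain(32 * 1024)  # 32 KiB of lines, an assumed L1-sized buffer
    # One full lap over the chain ends where it began.
    print(chase(chain, len(chain)) == 0)
```

Choosing `buffer_bytes` below the capacity of a given cache level keeps the whole chain resident in that level, which is the sizing idea the paragraph describes. A real measurement would time the `chase` loop in C or assembly and divide the elapsed time by `steps`.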