Prettify images
dendibakh committed Aug 23, 2024
1 parent 8d2d561 commit f565753
Showing 13 changed files with 2 additions and 2 deletions.
@@ -85,7 +85,7 @@ The first step to mitigate split loads/stores in this assignment is to align the
However, it's not enough to only align the starting offset of a matrix. Consider an example of a `9x9` matrix of `float` values shown in Figure @fig:MemAlignment. If a cache line is 64 bytes, it can store 16 `float` values. When using AVX2 instructions, the program will load/store 8 elements (256 bits) at a time. In each row, the first eight elements will be processed in a SIMD way, while the last element will be processed in a scalar way by the loop remainder. The second vector load/store (elements 10-17) crosses the cache line boundary, as do many subsequent vector loads/stores. The problem highlighted in Figure @fig:MemAlignment affects any matrix whose number of columns is not a multiple of 8 (for AVX2 vectorization). SSE and ARM Neon vectorization require 16-byte alignment; AVX-512 requires 64-byte alignment.
-![Split loads/stores inside a 9x9 matrix when using AVX2 vectorization. The split memory access is highlighted in gray.](../../img/memory-access-opts/MemAlignment.png){#fig:MemAlignment width=80%}
+![Split loads/stores inside a 9x9 matrix when using AVX2 vectorization. The split memory access is highlighted in yellow.](../../img/memory-access-opts/MemAlignment.png){#fig:MemAlignment width=80%}
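The split accesses described above can be counted with a few lines of arithmetic. The following is a minimal sketch, not code from the book: `countSplitLoads` is a hypothetical helper, and it assumes a row-major `float` matrix whose base address is cache-line aligned, 4-byte `float` values, and a vectorized loop that issues one full-width load per 8 columns and leaves the remainder to scalar code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts vector loads that straddle a cache line
// boundary while a row-major float matrix is traversed row by row.
// Assumes the matrix base address is aligned to cacheLineBytes.
std::size_t countSplitLoads(std::size_t rows, std::size_t cols,
                            std::size_t strideElems,
                            std::size_t vecElems = 8,          // AVX2: 8 floats
                            std::size_t cacheLineBytes = 64) { // assumed line size
  assert(strideElems >= cols && vecElems > 0 && cacheLineBytes > 0);
  const std::size_t vecBytes = vecElems * sizeof(float);
  std::size_t splits = 0;
  for (std::size_t r = 0; r < rows; ++r) {
    // Only full vectors are loaded; leftover columns go to the scalar remainder.
    for (std::size_t c = 0; c + vecElems <= cols; c += vecElems) {
      const std::size_t firstByte = (r * strideElems + c) * sizeof(float);
      const std::size_t lastByte = firstByte + vecBytes - 1;
      // A load is split when its first and last bytes fall in different lines.
      if (firstByte / cacheLineBytes != lastByte / cacheLineBytes) ++splits;
    }
  }
  return splits;
}
```

Under these assumptions, `countSplitLoads(9, 9, 9)` reports how many of the nine per-row vector loads are split in the tightly packed `9x9` layout, and passing a stride of 16 elements models a layout where every row starts on a cache line boundary.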
So, in addition to aligning the starting offset, each row of the matrix should be aligned as well. For example, in Figure @fig:MemAlignment, this can be achieved by inserting seven dummy columns into the matrix, effectively making it a `9x16` matrix. This will align the second row (elements 10-18) at the offset `0x40`. Similarly, we need to align all other rows. The dummy columns will not be processed by the algorithm, but they will ensure that the actual data is aligned at the cache line boundary. In our testing, the performance impact of this change was up to 30%, depending on the matrix size and the platform configuration.
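The row padding described above might be sketched as follows. This is an illustration under stated assumptions, not the book's implementation: `paddedCols`, `allocPaddedMatrix`, and `isCacheLineAligned` are hypothetical names, the cache line size is assumed to be 64 bytes, and `std::aligned_alloc` requires C++17 and is not available with every toolchain (MSVC, for example, does not provide it).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

constexpr std::size_t kCacheLineBytes = 64;  // assumption: 64-byte cache lines

// Rounds the column count up so that each row occupies a whole number of
// cache lines, e.g. 9 columns of float become 16.
constexpr std::size_t paddedCols(std::size_t cols) {
  constexpr std::size_t floatsPerLine = kCacheLineBytes / sizeof(float);
  return (cols + floatsPerLine - 1) / floatsPerLine * floatsPerLine;
}

// Allocates rows x paddedCols(cols) floats starting at a cache line boundary.
// The padding columns are never read by the algorithm. Release with std::free.
inline float* allocPaddedMatrix(std::size_t rows, std::size_t cols) {
  assert(rows > 0 && cols > 0);
  // The size is a multiple of the alignment, as std::aligned_alloc requires.
  const std::size_t bytes = rows * paddedCols(cols) * sizeof(float);
  return static_cast<float*>(std::aligned_alloc(kCacheLineBytes, bytes));
}

// Returns true if p points to the start of a cache line.
inline bool isCacheLineAligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % kCacheLineBytes == 0;
}
```

With this layout, element `(r, c)` lives at `m[r * paddedCols(cols) + c]`, so the first element of every row sits at a multiple of `0x40` bytes from the start of the buffer. The cost is the extra memory for the dummy columns.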
2 changes: 1 addition & 1 deletion chapters/4-Terminology-And-Metrics/4-5 Pipeline Slot.md
@@ -2,7 +2,7 @@

## Pipeline Slot {#sec:PipelineSlot}

-Another important metric that some performance tools use is the concept of a *pipeline slot*. A pipeline slot represents the hardware resources needed to process one $\mu$op. Figure @fig:PipelineSlot demonstrates the execution pipeline of a CPU that has 4 allocation slots every cycle. That means that the core can assign execution resources (renamed source and destination registers, execution port, ROB entries, etc.) to 4 new $\mu$ops every cycle. Such a processor is usually called a *4-wide machine*. During six consecutive cycles on the diagram, only half of the available slots were utilized. From a microarchitecture perspective, the efficiency of executing such code is only 50%.
+Another important metric that some performance tools use is the concept of a *pipeline slot*. A pipeline slot represents the hardware resources needed to process one $\mu$op. Figure @fig:PipelineSlot demonstrates the execution pipeline of a CPU that has 4 allocation slots every cycle. That means that the core can assign execution resources (renamed source and destination registers, execution port, ROB entries, etc.) to 4 new $\mu$ops every cycle. Such a processor is usually called a *4-wide machine*. During six consecutive cycles on the diagram, only half of the available slots were utilized (highlighted in yellow). From a microarchitecture perspective, the efficiency of executing such code is only 50%.

![Pipeline diagram of a 4-wide CPU.](../../img/terms-and-metrics/PipelineSlot.jpg){#fig:PipelineSlot width=40% }
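
The 50% figure in the paragraph above follows from dividing the slots that were filled by the slots that were available. A minimal sketch of that arithmetic, where `slotUtilization` is a hypothetical helper and the inputs are taken as given rather than read from hardware counters:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: fraction of allocation slots that received a uop.
// width is the number of slots available per cycle (4 for a 4-wide machine).
double slotUtilization(std::uint64_t usedSlots, std::uint64_t cycles,
                       unsigned width = 4) {
  assert(cycles > 0 && width > 0);
  const std::uint64_t totalSlots = cycles * width;
  assert(usedSlots <= totalSlots);
  return static_cast<double>(usedSlots) / static_cast<double>(totalSlots);
}
```

For the diagram's six cycles on a 4-wide machine there are 24 slots in total, so half of them being utilized corresponds to `slotUtilization(12, 6, 4)`.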

Binary file modified img/memory-access-opts/AvoidPadding.png
Binary file modified img/memory-access-opts/MemAlignment.png
Binary file modified img/memory-access-opts/SWmemprefetch1.png
Binary file modified img/memory-access-opts/SWmemprefetch2.png
Binary file modified img/memory-access-opts/SplitLoads.png
Binary file modified img/terms-and-metrics/MemBandwidthAndLatenciesDiagram.png
Binary file modified img/terms-and-metrics/PipelineSlot.jpg
Binary file modified img/uarch/DRAM_channel_interleaving.png
Binary file modified img/uarch/DRAM_channels.png
Binary file modified img/uarch/DRAM_ranks.png
Binary file modified img/uarch/SMT.png
