Links
Miscellaneous links that I'll probably want to read again in the future.
Cache behavior for all recent Intel CPUs at uops.info.
DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks: a description of techniques for reverse engineering the physical -> DRAM mapping, with results for many systems including Skylake DDR4. It uses a timing approach to find pairs of addresses that have a bank/row conflict. There is an associated github repo with the RE tool.
Ulrich Drepper's What Every Programmer Should Know About Memory
Sandy Bridge physical address to DRAM mapping well described. Note that Ivy Bridge is apparently more complex in the channel mapping.
Method for reverse engineering the physical-DRAM mapping describes how to determine the DRAM bank mappings by searching for bank collisions via timing.
Physical Address Decoding in Intel Xeon v3/v4 CPUs: A Supplemental Datasheet describes the physical to DRAM mapping down to rank granularity (socket, channel, rank, but not finer). They use a combination of a linear DRAM mapping and the normal interleaved mapping for the same region of DRAM, and then write a token using one mapping and search for it with the other, allowing exact determination of the interleaving function w/o any dependence on timing.
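As a rough illustration of the timing approach mentioned in the DRAMA and bank-collision entries above, here is a minimal sketch that groups addresses into banks by looking for slow pairs. Everything in it is an assumption made for illustration: the XOR masks in `BANK_MASKS`, the `ROW_SHIFT` value, the latency numbers and the threshold are made up, and `simulated_latency` stands in for a real measurement (which would time alternating uncached accesses to the two addresses). It is not the method or code of any of the linked tools.

```python
import random

# Hypothetical bank function: each bank bit is the parity (XOR) of a set of
# physical address bits. These masks are invented for this example.
BANK_MASKS = [(1 << 13) | (1 << 17), (1 << 14) | (1 << 18), (1 << 15) | (1 << 19)]
ROW_SHIFT = 18  # assumption: address bits >= 18 select the DRAM row


def bank_of(addr):
    """Bank index under the made-up XOR mapping above."""
    return sum((bin(addr & m).count("1") & 1) << i for i, m in enumerate(BANK_MASKS))


def simulated_latency(a, b):
    """Stand-in for a timing measurement: same bank + different row is slow."""
    same_bank = bank_of(a) == bank_of(b)
    different_row = (a >> ROW_SHIFT) != (b >> ROW_SHIFT)
    return 300 if same_bank and different_row else 200  # arbitrary units


def make_addresses(n, seed=0):
    """n addresses that all lie in distinct rows, so every same-bank pair conflicts."""
    rng = random.Random(seed)
    rows = rng.sample(range(1 << 12), n)
    return [(row << ROW_SHIFT) | (rng.getrandbits(ROW_SHIFT) & ~0x3F) for row in rows]


def cluster_by_conflict(addrs, latency, threshold):
    """Put each address in the first cluster whose representative it conflicts with."""
    clusters = []
    for a in addrs:
        for c in clusters:
            if latency(c[0], a) > threshold:
                c.append(a)
                break
        else:
            clusters.append([a])
    return clusters


addrs = make_addresses(256, seed=1)
clusters = cluster_by_conflict(addrs, simulated_latency, threshold=250)
print(len(clusters), "clusters found")
```

Once the clusters are known, the mapping itself can be recovered by looking for XORs of address bits that are constant within each cluster. On real hardware the timings are noisy, so each pair would need repeated measurements and a threshold chosen from the observed latency histogram.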
Detailed reverse engineering of the LLC/LC ring bus using contention.
Description of how to map offcore traffic counters to specific locations on the die, and a bit about the types of busses that are involved:
Description of the physical address to slice mapping (see references) and a strategy to have slice-locality in the L3:
https://people.kth.se/~farshin/documents/slice-aware-eurosys19.pdf
Good description of the uncore and QPI links, including LLC. For Westmere but probably still relevant in many ways:
Reverse engineering of the intermediate paging caches: Reverse Engineering Hardware Page Table Caches Using Side-Channel Attacks on the MMU and this similar set of slides: https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-van_schaik.pdf.
John McCalpin's description of how coherency works in KNL and Skylake-SP is excellent:
Topology and Cache Coherence in Knights Landing and Skylake Xeon Processors
A list of all sorts of links to coherency and caching info on Intel processors.
The thesis Combining static and dynamic approaches to model loop performance in HPC has lots of good stuff in the appendices A and B, including methodologies for measuring the little-known load matrix size.
BTB and branch throughput measurement on Cloudflare blog
Skylake voltage thread on notebook review.
Underclock readme, including links to three tools that do the heavy lifting for you.
Power and energy use.
Energy Efficiency Features of the Intel Skylake-SP Processor and Their Impact on Performance shows that 512-bit xor power consumption depends significantly on the number of bits which are flipped and also less strongly on the number of 1 bits in the output.
DGEMM performance is data-dependent describes how matrix multiplication performance varies based on the element values, with constant values having the lowest energy (possibly due to fewer bit flips in the FMA and associated circuitry).
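To make the two quantities from the Skylake-SP entry above concrete (bits flipped between successive results, and 1 bits in the output), here is a small sketch that counts both for a sequence of 512-bit values held as Python integers. The helper names `popcount` and `toggle_stats` are hypothetical, and the sketch only counts bits; it says nothing about how many watts a given count costs.

```python
def popcount(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")


def toggle_stats(outputs):
    """Return (bits flipped between consecutive outputs, total 1 bits in outputs)."""
    flips = sum(popcount(a ^ b) for a, b in zip(outputs, outputs[1:]))
    ones = sum(popcount(v) for v in outputs)
    return flips, ones


ALL_ONES = (1 << 512) - 1

# Alternating all-zeros / all-ones results: every bit flips on each step.
print(toggle_stats([0, ALL_ONES, 0]))                 # (1024, 512)
# Constant all-ones results: many 1 bits, but nothing ever flips.
print(toggle_stats([ALL_ONES, ALL_ONES, ALL_ONES]))   # (0, 1536)
```

The two example sequences separate the effects: the first maximizes flips, the second maximizes 1 bits with zero flips, which is the kind of contrast needed to tell the two dependencies apart.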
John McCalpin on the costs of rdtsc and rdtscp on Intel Forum.
Another forum post with more rdtsc details and some kernel module to measure performance.
The only thing I've found with any type of explanation of the PEBS shadow effect.
Intel, AMD, Graviton cloud CPU share.
All manuals (PPR for perf counters)