Don't crash on Linux machines with L4 cache #28

dfyz · 2024-05-27T23:57:44Z

I recently found out that this is a thing when trying to run a candle program (which depends on gemm) on this machine:

# grep 'model name' /proc/cpuinfo
model name      : Intel(R) Core(TM) i7-4770R CPU @ 3.20GHz
...
# cat /sys/devices/system/cpu/cpu*/cache/index4/level
4
...
# lscpu -B -C=type,level,ways,coherency-size,one-size
TYPE        LEVEL WAYS COHERENCY-SIZE  ONE-SIZE
Data            1    8             64     32768
Instruction     1    8             64     32768
Unified         2    8             64    262144
Unified         3   12             64   6291456
Unified         4   16             64 134217728

The Linux-specific code path that probes cache sizes via lscpu and sysfs assumes that level can't be greater than 3, so without this PR anything using gemm crashes like this:

index out of bounds: the len is 3 but the index is 3

This PR fixes the crash by adding the same guard that already exists in the generic x86 cache size probing code.
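For readers unfamiliar with the failure mode, here is a minimal sketch of the kind of guard described above. It is not the code from this PR: `CacheInfo` and `record_cache` are hypothetical names made up for illustration, and the only assumption carried over from the description is that the cache table has exactly three slots.

```rust
/// Cache description as read from sysfs or lscpu.
/// Hypothetical type for this sketch, not gemm's actual definition.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct CacheInfo {
    pub associativity: usize,
    pub cache_bytes: usize,
    pub cache_line_bytes: usize,
}

/// Stores `info` in the slot for `level` (1-based) and returns `true`.
/// Levels outside 1..=3 are ignored and `false` is returned, instead of
/// indexing past the end of the three-slot table.
pub fn record_cache(slots: &mut [CacheInfo; 3], level: usize, info: CacheInfo) -> bool {
    if !(1..=3).contains(&level) {
        // An L4 cache (level == 4) ends up here and is skipped.
        return false;
    }
    slots[level - 1] = info;
    true
}
```

With this guard, the level 4 entry from the `lscpu` output above is dropped silently and the L1 to L3 slots are left untouched.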

(an interesting theoretical question is whether it is possible to somehow exploit this gigantic 128 MiB cache instead of ignoring it)

sarah-quinones · 2024-05-28T05:50:12Z

an alternative approach that would make use of the cache is doing something like let level = Ord::min(level, 3)
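A rough sketch of what that clamp could look like, assuming a three-slot table of cache sizes. `clamped_slot` and `fill_sizes` are hypothetical helpers for illustration, not functions from gemm.

```rust
/// Maps a 1-based cache level to a slot index in a three-entry table,
/// folding L4 and above into the last slot. Returns `None` for level 0.
pub fn clamped_slot(level: usize) -> Option<usize> {
    let level = Ord::min(level, 3);
    level.checked_sub(1)
}

/// Fills the table from `(level, size_in_bytes)` pairs in the order given.
/// When several levels map to the same slot, the last one visited wins.
pub fn fill_sizes(caches: &[(usize, usize)]) -> [usize; 3] {
    let mut sizes = [0usize; 3];
    for &(level, bytes) in caches {
        if let Some(slot) = clamped_slot(level) {
            sizes[slot] = bytes;
        }
    }
    sizes
}
```

Note that in this sketch L3 and L4 compete for the same slot, so which one ends up there depends on the order in which the caches are visited.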

dfyz · 2024-05-29T00:31:04Z

> an alternative approach that would make use of the cache is doing something like let level = Ord::min(level, 3)

I just tried that, but it appears to be trickier than I thought at first:

fs::read_dir() doesn't guarantee any particular order, and on my system, .../level4 comes before .../level3. Since ties are currently resolved by the cache line size (and they are all the same on my machine), the L3 cache "wins" and overwrites the last slot in the cache hierarchy.
When I worked around that by temporarily falling back to the lscpu code path (which preserves ordering), I noticed no performance improvement on a large matmul (more specifically, 8K×8K×8K f32 NN GEMM). Either I did something wrong, or the large macropanel size somehow doesn't help (I double-checked that it increased from 2736 to 8196).
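One possible way to take the ordering problem out of the sysfs path, sketched under assumptions: collect the directory entries first and sort them before probing. `sorted_entries` is a hypothetical helper, not something that exists in gemm, and sorting by path only matches numeric order while the index suffixes are single digits.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the entries of `dir` sorted by path, because `fs::read_dir`
/// makes no promise about iteration order.
pub fn sorted_entries(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut paths = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<Vec<_>>>()?;
    // Path comparison is lexicographic per component, so "index3" sorts
    // before "index4" (but "index10" would sort before "index2").
    paths.sort();
    Ok(paths)
}
```

Visiting the caches in this order would make the higher-numbered entry the last writer to a shared slot, independent of what the filesystem returns. Whether that is the desired winner is a separate question, given the benchmark result above.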

Perhaps it makes sense to merge the fix for the crashes first, and then think of exploiting the L4 cache. By the way, I also added an additional commit that prevents the lscpu code path from crashing (my bad, I completely forgot about it).

Don't crash on Linux machines with L4 cache

e952d98

Don't crash in the lscpu code path either

687beea
