From 6eb7d4cf829bb224c12e62a5ea7747a1cf683821 Mon Sep 17 00:00:00 2001
From: Ken Ahrens
Date: Mon, 25 May 2026 20:40:22 -0400
Subject: [PATCH] speed-bench: add M5 Max 128GB q2-q4-imatrix curve

Bench data for the q2-q4-imatrix mixed Flash quant (last 6 expert layers
Q4K, rest IQ2XXS) on M5 Max 128GB, macOS 26.4.1. Fills the unanswered
request in #226 for q2-q4-imatrix benchmark numbers, and extends
published M5 Max coverage past the 65K point from #97 into the
100K-200K range.

Command:

  ds4-bench -m ds4flash.gguf --prompt-file speed-bench/promessi_sposi.txt --ctx-start 2048 --ctx-max 200000 --step-incr 16384 --gen-tokens 128

Build: ad0209f (Metal 4 tensor API + decode-indexer top-k path from #169
enabled).

Highlights vs M5 Max q2-imatrix from #97 (same hardware tier):

- 2K decode: 34.4 t/s (vs 31.5 t/s, +9%)
- 2K prefill: 413.9 t/s (vs 372.2 t/s, +11%)
- 32K decode: 27.8 t/s (vs 28.9 t/s, -4%)
- 65K decode: 25.8 t/s (vs 27.0 t/s, -4%)

q2-q4 is faster than q2 at low ctx (Q4 layers + Metal 4 win) and ~4%
slower at the 32K and 65K points (more bandwidth-bound).

Closes #226 with data.
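The percentages in the highlights can be rechecked from the rounded t/s
figures quoted above. A minimal sketch, assuming only those figures; the
names HIGHLIGHTS and pct_delta are illustrative and not part of ds4-bench:

```python
# Rounded highlight figures from this commit message:
# (this run, q2-imatrix baseline from #97), both in tokens per second.
HIGHLIGHTS = {
    "2K decode": (34.4, 31.5),
    "2K prefill": (413.9, 372.2),
    "32K decode": (27.8, 28.9),
    "65K decode": (25.8, 27.0),
}


def pct_delta(new: float, base: float) -> int:
    """Relative change of new vs base, rounded to the nearest whole percent."""
    return round((new - base) / base * 100)


# Hypothetical helper output: one signed percentage per highlight line.
deltas = {name: pct_delta(new, base) for name, (new, base) in HIGHLIGHTS.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:+d}%")
```

The check uses the rounded values from the message, so it only confirms
the arithmetic, not the underlying measurements.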
---
 speed-bench/m5_max_q2q4_imatrix.csv    | 15 ++++++++
 speed-bench/m5_max_q2q4_imatrix_ts.svg | 52 ++++++++++++++++++++++++++
 2 files changed, 67 insertions(+)
 create mode 100644 speed-bench/m5_max_q2q4_imatrix.csv
 create mode 100644 speed-bench/m5_max_q2q4_imatrix_ts.svg

diff --git a/speed-bench/m5_max_q2q4_imatrix.csv b/speed-bench/m5_max_q2q4_imatrix.csv
new file mode 100644
index 000000000..b84bf647a
--- /dev/null
+++ b/speed-bench/m5_max_q2q4_imatrix.csv
@@ -0,0 +1,15 @@
+ctx_tokens,prefill_tokens,prefill_tps,gen_tokens,gen_tps,kvcache_bytes
+2048,2048,413.85,128,34.42,52184460
+18432,16384,405.31,128,28.42,277693836
+34816,16384,374.49,128,27.75,503203212
+51200,16384,333.84,128,26.79,728712588
+67584,16384,298.66,128,25.75,954221964
+83968,16384,269.69,128,25.43,1179731340
+100352,16384,248.99,128,24.36,1405240716
+116736,16384,230.49,128,23.63,1630750092
+133120,16384,215.12,128,22.37,1856259468
+149504,16384,198.15,128,21.70,2081768844
+165888,16384,187.32,128,20.72,2307278220
+182272,16384,176.49,128,20.16,2532787596
+198656,16384,165.14,128,19.54,2758296972
+200000,1344,157.02,128,19.37,2776775308
diff --git a/speed-bench/m5_max_q2q4_imatrix_ts.svg b/speed-bench/m5_max_q2q4_imatrix_ts.svg
new file mode 100644
index 000000000..219dbafa0
--- /dev/null
+++ b/speed-bench/m5_max_q2q4_imatrix_ts.svg
@@ -0,0 +1,52 @@
[SVG markup not preserved. Recoverable text: title "M5 Max (128GB) q2-q4-imatrix t/s"; x axis "ctx size" (0 to 200k); axes "prefill t/s" (0 to 500) and "generation t/s" (0 to 40); legend entries "prefill" and "generation".]