=== INFERENCE PIPELINE DIAGNOSTICS ===
Loading model: models--prism-ml--Bonsai-8B-mlx-1bit/snapshots/d95a01f5e78184d278e21c4cfd57ff417a60ae22/
Model loaded.
--- TEST 1: Basic GPU ops ---
[DIAG] matmul(ones, 2*ones) expect=8 shape=(4,4) min=8.000000 max=8.000000 mean=8.000000 |mean|=8.000000
[VALS] matmul result: [8.0000, 8.0000, 8.0000, 8.0000, 8.0000, 8.0000, 8.0000, 8.0000]
[hipBLASLt] first call
[hipBLASLt] M=4 N=4 K=4 ta=0 tb=0 lda=4 ldb=4 ldc=4
[DIAG] bf16 matmul expect=8 shape=(4,4) min=8.000000 max=8.000000 mean=8.000000 |mean|=8.000000
--- TEST 2: quantized_matmul vs dequant ---
[DIAG] weight=(4096,128) uint32 scales=(4096,32) biases=(4096,32)
[DIAG] scales shape=(4096,32) min=0.005402 max=0.117676 mean=0.054007 |mean|=0.054007
[DIAG] biases shape=(4096,32) min=-0.058838 max=-0.002701 mean=-0.027003 |mean|=0.027003
[DIAG] bits=1 group_size=128
terminate called after throwing an instance of 'std::runtime_error'
what(): Unsupported bits for affine_dequantize
Aborted (core dumped) /home/lemonade/lemonade/bin/mlx/bin/diagnose models--prism-ml--Bonsai-8B-mlx-1bit/snapshots/d95a01f5e78184d278e21c4cfd57ff417a60ae22/
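
Note on the failure above: the run aborts because `affine_dequantize` rejects bits=1, so TEST 2 never gets a dequantized reference to compare against. The logged shapes are self-consistent for 1-bit packing: 128 uint32 words x 32 bits = 4096 columns, and 4096 / group_size 128 = 32 groups, matching scales=(4096,32). The min/max/mean summaries are also consistent with biases of about -scales/2, which would make each weight roughly +scale/2 or -scale/2. A minimal NumPy sketch of a host-side reference is given below. It assumes the usual affine form w = scale * q + bias and LSB-first bit order within each uint32. Neither assumption is confirmed by this log, and `dequantize_1bit` is a hypothetical helper, not part of MLX.

```python
import numpy as np

def dequantize_1bit(packed, scales, biases, group_size=128):
    """Hypothetical reference: packed is (rows, cols/32) uint32,
    scales and biases are (rows, cols/group_size)."""
    rows, words = packed.shape
    shifts = np.arange(32, dtype=np.uint32)
    # Unpack each word into 32 values in {0, 1}, LSB first (assumed bit order).
    bits = (packed[:, :, None] >> shifts) & np.uint32(1)
    q = bits.reshape(rows, words * 32).astype(np.float32)
    # Broadcast one scale/bias pair over each group of `group_size` columns.
    s = np.repeat(scales.astype(np.float32), group_size, axis=1)
    b = np.repeat(biases.astype(np.float32), group_size, axis=1)
    # Assumed affine form: w = scale * q + bias.
    return q * s + b

# Tiny example: 1 row, 2 words (64 columns), group_size=32 for brevity.
packed = np.array([[0b101, 0]], dtype=np.uint32)
scales = np.array([[2.0, 4.0]], dtype=np.float32)
biases = np.array([[-1.0, -2.0]], dtype=np.float32)
w = dequantize_1bit(packed, scales, biases, group_size=32)
```

With the logged tensors, the same call would take weight=(4096,128), scales=(4096,32), biases=(4096,32) and group_size=128, giving a (4096,4096) float matrix to compare against the quantized_matmul output once that path accepts bits=1.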