speed-bench: add NVIDIA RTX PRO 6000 Blackwell results by imbibekk · Pull Request #256 · antirez/ds4

imbibekk · 2026-05-26T05:39:06Z

Numbers collected on a single NVIDIA RTX PRO 6000 Blackwell Server Edition (97 GB VRAM, compute capability sm_120), CUDA 13.0, built with `make cuda-generic`. Only GPU 0 is used since `ds4_gpu_init` selects a single device.

Sweep (default flags):

    ./ds4-bench -m ds4flash.gguf \
      --prompt-file speed-bench/promessi_sposi.txt \
      --ctx-start 2048 --ctx-max 65536 --step-incr 2048 \
      --gen-tokens 128 \
      --csv speed-bench/rtx_pro_6000_blackwell.csv

The README row reports the 8192-token frontier so it sits next to the existing DGX Spark GB10 row (7047 tokens) for direct comparison. The full 2048..65536 sweep is in the CSV.
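For readers who want to see which frontiers the sweep covers, here is a minimal sketch. It assumes the benchmark steps linearly from `--ctx-start` to `--ctx-max` inclusive in `--step-incr` increments; that stepping rule is inferred from the flag names and the "2048..65536" range above, not confirmed against the `ds4-bench` source. The helper name `sweep_frontiers` is hypothetical.

```shell
# Hypothetical helper: list the context frontiers a linear sweep would visit.
# Assumption: frontiers run from ctx-start to ctx-max inclusive, spaced by step-incr.
sweep_frontiers() {
    # $1 = ctx-start, $2 = ctx-max, $3 = step-incr
    seq "$1" "$3" "$2"
}

# Count the frontiers for the flags used in this PR.
sweep_frontiers 2048 65536 2048 | awk 'END { print NR }'
```

Under that assumption the 8192-token frontier reported in the README row is one of the sweep points, so it can be read straight from the CSV.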

Caveat: at long context the engine logs "CUDA q8 fp16 cache budget exhausted; using q8 kernels" because the 80.76 GiB model plus context buffers leave only about 1 GiB free on a 97 GiB card, so all frontiers reflect the fallback q8 kernel path rather than the q8/fp16 cached fast path.
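The memory budget behind this caveat can be worked through as a rough sketch. It takes the figures above at face value (97 GiB total, 80.76 GiB model, about 1 GiB free) and treats everything else as context buffers and other allocations. The helper name `non_model_usage_gib` is hypothetical, and the result is only as precise as the "about 1 GiB" estimate.

```shell
# Hypothetical helper: GiB consumed by everything other than the model weights.
# Inputs are the approximate figures quoted in the caveat, not measured values.
non_model_usage_gib() {
    # $1 = total VRAM (GiB), $2 = model size (GiB), $3 = memory left free (GiB)
    awk -v total="$1" -v model="$2" -v spare="$3" \
        'BEGIN { printf "%.2f\n", total - model - spare }'
}

non_model_usage_gib 97 80.76 1
```

With these inputs roughly 15 GiB goes to context buffers and other allocations, which is consistent with the fp16 cache having no budget left.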

imbibekk wants to merge 1 commit into antirez:main from imbibekk:speedbench-rtx-pro-6000-blackwell
