C++17 CPU reference implementation of TurboQuant and QJL from:
Zandieh et al., "TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate", arXiv:2504.19874 (ICLR 2026).
Single file, zero dependencies, just g++ -O2 -std=c++17.
| Component | Description |
|---|---|
fwht |
Fast Walsh-Hadamard Transform (in-place, power-of-2, O(d log d)) |
Rotation |
Randomized Hadamard Π = H·D — the fast random rotation used throughout |
Codebook |
Scalar codebook trained by Lloyd–Max on N(0, 1/d) (the rotated component distribution) |
MSEState / quant_mse / dequant_mse |
Algorithm 1 — TurboQuant_mse: rotate → quantize each component independently |
ProdState / quant_prod / dequant_prod |
Algorithm 2 — TurboQuant_prod: MSE quantize the residual, then compress it with QJL for one extra bit |
QJLState / quant_qjl / estimate_qjl_ip |
Dense QJL — O(d²) baseline (iid Gaussian S matrix) |
QJLFastState / quant_qjl_fast / precompute_Sq_fast |
Fast QJL — O(d log d) using Hadamard rotation instead of dense S |
The test driver in main() validates all theorem bounds and includes a timing comparison between dense and Hadamard QJL at d=512.
Empirically resolves the variance constant of Fast QJL (Hadamard-based).
Key finding (n=100,000 pairs per d, d=32–1024):
- Dense QJL:
variance·d ≈ π/2 ≈ 1.571(matches theory) - Fast QJL:
variance·d ≈ 0.565–0.571across all d (NOT π/2)
The constant appears to converge to approximately π/2 − 1 ≈ 0.5708 (or possibly 1/√π ≈ 0.5642). Exact theoretical value is an open question. Mechanism: Hadamard rows are orthogonal, which forces negative cross-covariances between sign(Πk)_i·(Πq)_i terms (i≠j), reducing total variance by ~2.75× vs iid Gaussian.
Practical implication: Fast QJL at 1 bit/dim achieves distortion·d ≈ 0.57, matching TurboQuant_prod at 2 bits/dim (distortion·d ≤ 0.56) — at half the bit cost and O(d log d) vs O(d²) complexity.
# Build everything
make all
# Run the main reference (validates all paper theorem bounds + timing)
./turboquant
# Run the variance constant study (takes ~30s, n=100k per d)
./qjl_variance_studyOr build individually:
g++ -O2 -std=c++17 turboquant.cpp -o turboquant
g++ -O3 -std=c++17 qjl_variance_study.cpp -o qjl_variance_studyExpected output (abbreviated):
=== TurboQuant CPU reference (d = 128) ===
Algorithm 1 — TurboQuant_mse
b empir_MSE theory≤ rel
1 0.3614 0.3600 1.00
2 0.1175 0.1170 1.00
3 0.0305 0.0300 1.02
4 0.0093 0.0090 1.03
Algorithm 2 — TurboQuant_prod (unbiased <y, x̃> estimator)
b ⟨y,x⟩ (truth) ⟨y,x̃⟩ (est) bias D_prod·d
2 0.0007 0.0008 0.00002 0.53891
(paper upper bound at ‖y‖=1: D_prod · d ≤ 0.560)
...
Bit-budget comparison (d = 128, unit-norm k, q)
method bits/dim space/state distortion·d
QJL dense (standalone) 1 O(d²) 1.5708
QJL fast (Hadamard) 1 O(d) 0.5708 (empirical ~0.57)
TurboQuant_prod b=2 2 O(d) 0.5600
TurboQuant_prod b=3 3 O(d) 0.1800
TurboQuant_prod b=4 4 O(d) 0.0470
This code was written as a correctness reference for OpenVINO PR #35092 — an initial CPU implementation of TurboQuant as a custom SDPA kernel. The PR uses:
- TurboQuant for values (reconstruct the full vector for the weighted sum)
- QJL for keys (only need an inner-product estimate for the attention score ranking)
The asymmetry is intentional: QJL is optimal for keys because you only need to rank them, not reconstruct them. TurboQuant_prod's better reconstruction accuracy is needed for values.
The Rotation / QJLFastState structs here map directly to the approach taken in the OpenVINO kernel.
The official Google implementation has not been released as of May 2026. Notable community implementations:
0xSero/turboquant— Python/Triton/vLLM, all 9/9 theorem validations passTheTom/llama-cpp-turboquant— TurboQuant + llama.cppRyanCodrai/turbovec— Rust+Python vector index (different use-case: vector search)
Reproducibility note: see arXiv:2604.19528 (ETH Zurich et al., Apr 2026 — "Revisiting RaBitQ and TurboQuant") for a critical analysis of the paper's claims.
turboquant.cpp Main reference: TurboQuant_mse, TurboQuant_prod, dense QJL, fast QJL
qjl_variance_study.cpp Empirical variance constant study (resolves Fast QJL distortion)
Makefile Build both targets
Built by Claudius Jr during autonomous heartbeat sessions (April–May 2026), as a research companion to Dylan Neve's OpenVINO NPUW work on TurboQuant.