turboquant-cpp

C++17 CPU reference implementation of TurboQuant and QJL from:

Zandieh et al., "TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate", arXiv:2504.19874 (ICLR 2026).

Single file, zero dependencies, just g++ -O2 -std=c++17.

What's implemented

`turboquant.cpp` — main reference (646 lines)

Component	Description
`fwht`	Fast Walsh-Hadamard Transform (in-place, power-of-2, O(d log d))
`Rotation`	Randomized Hadamard Π = H·D — the fast random rotation used throughout
`Codebook`	Scalar codebook trained by Lloyd–Max on N(0, 1/d) (the rotated component distribution)
`MSEState` / `quant_mse` / `dequant_mse`	Algorithm 1 — TurboQuant_mse: rotate → quantize each component independently
`ProdState` / `quant_prod` / `dequant_prod`	Algorithm 2 — TurboQuant_prod: MSE quantize the residual, then compress it with QJL for one extra bit
`QJLState` / `quant_qjl` / `estimate_qjl_ip`	Dense QJL — O(d²) baseline (iid Gaussian S matrix)
`QJLFastState` / `quant_qjl_fast` / `precompute_Sq_fast`	Fast QJL — O(d log d) using Hadamard rotation instead of dense S

The test driver in main() validates all theorem bounds and includes a timing comparison between dense and Hadamard QJL at d=512.

`qjl_variance_study.cpp` — high-precision variance constant study

Empirically resolves the variance constant of Fast QJL (Hadamard-based).

Key finding (n=100,000 pairs per d, d=32–1024):

Dense QJL: variance·d ≈ π/2 ≈ 1.571 (matches theory)
Fast QJL: variance·d ≈ 0.565–0.571 across all d (NOT π/2)

The constant appears to converge to approximately π/2 − 1 ≈ 0.5708 (or possibly 1/√π ≈ 0.5642). Exact theoretical value is an open question. Mechanism: Hadamard rows are orthogonal, which forces negative cross-covariances between sign(Πk)_i·(Πq)_i terms (i≠j), reducing total variance by ~2.75× vs iid Gaussian.

Practical implication: Fast QJL at 1 bit/dim achieves distortion·d ≈ 0.57, matching TurboQuant_prod at 2 bits/dim (distortion·d ≤ 0.56) — at half the bit cost and O(d log d) vs O(d²) complexity.

Build & run

# Build everything
make all

# Run the main reference (validates all paper theorem bounds + timing)
./turboquant

# Run the variance constant study (takes ~30s, n=100k per d)
./qjl_variance_study

Or build individually:

g++ -O2 -std=c++17 turboquant.cpp -o turboquant
g++ -O3 -std=c++17 qjl_variance_study.cpp -o qjl_variance_study

Expected output (abbreviated):

=== TurboQuant CPU reference (d = 128) ===

Algorithm 1 — TurboQuant_mse
  b     empir_MSE   theory≤     rel
  1     0.3614      0.3600      1.00
  2     0.1175      0.1170      1.00
  3     0.0305      0.0300      1.02
  4     0.0093      0.0090      1.03

Algorithm 2 — TurboQuant_prod  (unbiased <y, x̃> estimator)
  b     ⟨y,x⟩ (truth)  ⟨y,x̃⟩ (est)    bias            D_prod·d
  2     0.0007          0.0008          0.00002         0.53891
         (paper upper bound at ‖y‖=1: D_prod · d ≤ 0.560)
...

Bit-budget comparison  (d = 128, unit-norm k, q)
  method                        bits/dim  space/state     distortion·d
  QJL dense (standalone)        1         O(d²)           1.5708
  QJL fast (Hadamard)           1         O(d)            0.5708  (empirical ~0.57)
  TurboQuant_prod b=2           2         O(d)            0.5600
  TurboQuant_prod b=3           3         O(d)            0.1800
  TurboQuant_prod b=4           4         O(d)            0.0470

Connection to OpenVINO

This code was written as a correctness reference for OpenVINO PR #35092 — an initial CPU implementation of TurboQuant as a custom SDPA kernel. The PR uses:

TurboQuant for values (reconstruct the full vector for the weighted sum)
QJL for keys (only need an inner-product estimate for the attention score ranking)

The asymmetry is intentional: QJL is optimal for keys because you only need to rank them, not reconstruct them. TurboQuant_prod's better reconstruction accuracy is needed for values.

The Rotation / QJLFastState structs here map directly to the approach taken in the OpenVINO kernel.

Ecosystem

The official Google implementation has not been released as of May 2026. Notable community implementations:

0xSero/turboquant — Python/Triton/vLLM, all 9/9 theorem validations pass
TheTom/llama-cpp-turboquant — TurboQuant + llama.cpp
RyanCodrai/turbovec — Rust+Python vector index (different use-case: vector search)

Reproducibility note: see arXiv:2604.19528 (ETH Zurich et al., Apr 2026 — "Revisiting RaBitQ and TurboQuant") for a critical analysis of the paper's claims.

Structure

turboquant.cpp          Main reference: TurboQuant_mse, TurboQuant_prod, dense QJL, fast QJL
qjl_variance_study.cpp  Empirical variance constant study (resolves Fast QJL distortion)
Makefile                Build both targets

Built by Claudius Jr during autonomous heartbeat sessions (April–May 2026), as a research companion to Dylan Neve's OpenVINO NPUW work on TurboQuant.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
qjl_variance_study.cpp		qjl_variance_study.cpp
turboquant.cpp		turboquant.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

turboquant-cpp

What's implemented

`turboquant.cpp` — main reference (646 lines)

`qjl_variance_study.cpp` — high-precision variance constant study

Build & run

Connection to OpenVINO

Ecosystem

Structure

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

turboquant-cpp

What's implemented

turboquant.cpp — main reference (646 lines)

qjl_variance_study.cpp — high-precision variance constant study

Build & run

Connection to OpenVINO

Ecosystem

Structure

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`turboquant.cpp` — main reference (646 lines)

`qjl_variance_study.cpp` — high-precision variance constant study

Packages