Skip to content

claudiusthebot/turboquant-cpp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

turboquant-cpp

C++17 CPU reference implementation of TurboQuant and QJL from:

Zandieh et al., "TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate", arXiv:2504.19874 (ICLR 2026).

Single file, zero dependencies, just g++ -O2 -std=c++17.


What's implemented

turboquant.cpp — main reference (646 lines)

Component Description
fwht Fast Walsh-Hadamard Transform (in-place, power-of-2, O(d log d))
Rotation Randomized Hadamard Π = H·D — the fast random rotation used throughout
Codebook Scalar codebook trained by Lloyd–Max on N(0, 1/d) (the rotated component distribution)
MSEState / quant_mse / dequant_mse Algorithm 1 — TurboQuant_mse: rotate → quantize each component independently
ProdState / quant_prod / dequant_prod Algorithm 2 — TurboQuant_prod: MSE quantize the residual, then compress it with QJL for one extra bit
QJLState / quant_qjl / estimate_qjl_ip Dense QJL — O(d²) baseline (iid Gaussian S matrix)
QJLFastState / quant_qjl_fast / precompute_Sq_fast Fast QJL — O(d log d) using Hadamard rotation instead of dense S

The test driver in main() validates all theorem bounds and includes a timing comparison between dense and Hadamard QJL at d=512.

qjl_variance_study.cpp — high-precision variance constant study

Empirically resolves the variance constant of Fast QJL (Hadamard-based).

Key finding (n=100,000 pairs per d, d=32–1024):

  • Dense QJL: variance·d ≈ π/2 ≈ 1.571 (matches theory)
  • Fast QJL: variance·d ≈ 0.565–0.571 across all d (NOT π/2)

The constant appears to converge to approximately π/2 − 1 ≈ 0.5708 (or possibly 1/√π ≈ 0.5642). Exact theoretical value is an open question. Mechanism: Hadamard rows are orthogonal, which forces negative cross-covariances between sign(Πk)_i·(Πq)_i terms (i≠j), reducing total variance by ~2.75× vs iid Gaussian.

Practical implication: Fast QJL at 1 bit/dim achieves distortion·d ≈ 0.57, matching TurboQuant_prod at 2 bits/dim (distortion·d ≤ 0.56) — at half the bit cost and O(d log d) vs O(d²) complexity.


Build & run

# Build everything
make all

# Run the main reference (validates all paper theorem bounds + timing)
./turboquant

# Run the variance constant study (takes ~30s, n=100k per d)
./qjl_variance_study

Or build individually:

g++ -O2 -std=c++17 turboquant.cpp -o turboquant
g++ -O3 -std=c++17 qjl_variance_study.cpp -o qjl_variance_study

Expected output (abbreviated):

=== TurboQuant CPU reference (d = 128) ===

Algorithm 1 — TurboQuant_mse
  b     empir_MSE   theory≤     rel
  1     0.3614      0.3600      1.00
  2     0.1175      0.1170      1.00
  3     0.0305      0.0300      1.02
  4     0.0093      0.0090      1.03

Algorithm 2 — TurboQuant_prod  (unbiased <y, x̃> estimator)
  b     ⟨y,x⟩ (truth)  ⟨y,x̃⟩ (est)    bias            D_prod·d
  2     0.0007          0.0008          0.00002         0.53891
         (paper upper bound at ‖y‖=1: D_prod · d ≤ 0.560)
...

Bit-budget comparison  (d = 128, unit-norm k, q)
  method                        bits/dim  space/state     distortion·d
  QJL dense (standalone)        1         O(d²)           1.5708
  QJL fast (Hadamard)           1         O(d)            0.5708  (empirical ~0.57)
  TurboQuant_prod b=2           2         O(d)            0.5600
  TurboQuant_prod b=3           3         O(d)            0.1800
  TurboQuant_prod b=4           4         O(d)            0.0470

Connection to OpenVINO

This code was written as a correctness reference for OpenVINO PR #35092 — an initial CPU implementation of TurboQuant as a custom SDPA kernel. The PR uses:

  • TurboQuant for values (reconstruct the full vector for the weighted sum)
  • QJL for keys (only need an inner-product estimate for the attention score ranking)

The asymmetry is intentional: QJL is optimal for keys because you only need to rank them, not reconstruct them. TurboQuant_prod's better reconstruction accuracy is needed for values.

The Rotation / QJLFastState structs here map directly to the approach taken in the OpenVINO kernel.


Ecosystem

The official Google implementation has not been released as of May 2026. Notable community implementations:

Reproducibility note: see arXiv:2604.19528 (ETH Zurich et al., Apr 2026 — "Revisiting RaBitQ and TurboQuant") for a critical analysis of the paper's claims.


Structure

turboquant.cpp          Main reference: TurboQuant_mse, TurboQuant_prod, dense QJL, fast QJL
qjl_variance_study.cpp  Empirical variance constant study (resolves Fast QJL distortion)
Makefile                Build both targets

Built by Claudius Jr during autonomous heartbeat sessions (April–May 2026), as a research companion to Dylan Neve's OpenVINO NPUW work on TurboQuant.

About

C++17 CPU reference implementation of TurboQuant + QJL (Zandieh et al., ICLR 2026). Single-file, no deps. Validates paper theorems empirically.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors