
Add dem_sampling CPU/GPU across C++ and Python#479

Open
kvmto wants to merge 17 commits into NVIDIA:main from kvmto:dem_sampling_pr

Conversation


kvmto commented Apr 1, 2026

Summary

  • Add dem_sampling in C++ with both CPU and cuStabilizer-backed GPU paths.
  • Expose the feature through pybind and Python (cudaq_qec.dem_sampling) with backend selection (auto / cpu / gpu), plus PyTorch tensor support, including a CUDA device-pointer path for GPU execution.
  • Add end-to-end coverage for C++ and Python paths (CPU/GPU, NumPy/PyTorch) and wire build/packaging so cuStabilizer is discovered and can be required for shipping builds.
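
The auto / cpu / gpu selection described above can be sketched as a small dispatch helper. This is an illustrative sketch only: `resolve_backend`, its signature, and the exact error behavior are assumptions, not the PR's actual API.

```python
# Hypothetical sketch of auto/cpu/gpu backend resolution; the function name
# and the error behavior are assumptions, not the PR's actual API.
def resolve_backend(requested: str, gpu_available: bool) -> str:
    """Map a user-requested backend string to the backend actually used."""
    if requested == "auto":
        # Prefer the cuStabilizer-backed GPU path when it is available.
        return "gpu" if gpu_available else "cpu"
    if requested not in ("cpu", "gpu"):
        raise ValueError(f"unknown backend: {requested!r}")
    if requested == "gpu" and not gpu_available:
        # An explicit GPU request should not silently fall back to CPU.
        raise RuntimeError("GPU backend requested but cuStabilizer is unavailable")
    return requested
```

One plausible design choice shown here: "auto" degrades gracefully, while an explicit "gpu" request fails loudly so shipping builds notice a missing cuStabilizer.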

Build/Packaging updates

  • Add a FindcuStabilizer CMake module and integrate it into the QEC CMake build.
  • Add CUDAQ_QEC_REQUIRE_CUSTABILIZER enforcement path for builds that must ship GPU support.
  • Update libs/qec/pyproject.toml.cu12 and libs/qec/pyproject.toml.cu13 for dem_sampling optional dependencies (including torch + cuquantum extras).

Test plan

  • C++: run dem_sampling unit tests (DemSamplingCPU, DemSamplingGPU, and related QEC integration tests) in CI.
  • Python: run libs/qec/python/tests/test_dem_sampling.py in CI with CUDA-enabled torch.

Introduce dem_sampling implementations for CPU and cuStabilizer-backed GPU paths in C++, and expose them through pybind/Python with torch tensor and device-pointer support. Add C++/Python coverage for backend paths and wire build/packaging checks so cuStabilizer requirements are enforced for shipping.

Signed-off-by: kvmto <kmato@nvidia.com>
kvmto requested review from bmhowe23, ivanbasov, and wsttiger on April 1, 2026 at 16:02
kvmto added 2 commits April 1, 2026 16:03
Review context (pyproject optional-dependencies excerpt):

"tensorrt-cu12"
]
dem_sampling = [
"torch",
A reviewer asked:

Is there a way to do DEM sampling without Torch?

kvmto (Author) replied:

Yes, with NumPy; that is the most likely usage of the CPU implementation.
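
A Torch-free CPU path can be sketched in pure NumPy. This is a hedged reference sketch of what DEM sampling computes, not the PR's implementation: `sample_dem`, `H`, and `probs` are illustrative names, and the DEM is assumed to be given as a binary detector-by-mechanism check matrix plus per-mechanism probabilities.

```python
# Pure-NumPy reference sketch of DEM sampling, independent of Torch.
# All names (sample_dem, H, probs) are illustrative, not the PR's actual API.
import numpy as np

def sample_dem(H, probs, num_shots, seed=None):
    """Sample detector outcomes from a detector error model.

    H      -- (num_detectors, num_mechanisms) check matrix (binarized mod 2)
    probs  -- (num_mechanisms,) per-mechanism trigger probabilities
    Returns a (num_shots, num_detectors) uint8 array of detector flips.
    """
    if num_shots == 0:  # mirror the PR's zero-shot handling: empty, well-shaped
        return np.empty((0, H.shape[0]), dtype=np.uint8)
    rng = np.random.default_rng(seed)
    # Bernoulli-sample which error mechanisms fire in each shot.
    errors = (rng.random((num_shots, probs.size)) < probs).astype(np.uint8)
    # Detector outcome = parity (mod 2) of the triggered mechanisms touching it.
    dot = errors.astype(np.int64) @ (H & 1).T.astype(np.int64)
    return (dot % 2).astype(np.uint8)

# Tiny example: two mechanisms, two detectors.
H = np.array([[1, 0], [1, 1]], dtype=np.uint8)
dets = sample_dem(H, np.array([0.1, 0.5]), num_shots=1000, seed=0)
```

With a fixed seed the sampler is reproducible, which is what makes CPU-vs-GPU parity tests like the ones in this PR practical to write.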

kvmto added 14 commits April 1, 2026 22:08
- Use cudaMallocAsync/cudaFreeAsync for all GPU temporaries to avoid
  implicit device synchronization that breaks multi-stream concurrency
  (critical for PyTorch CUDA stream integration)
- Replace synchronous cudaMemcpy with cudaMemcpyAsync on the caller's
  stream for the probability D->H copy
- Add grid dimension overflow guards before every CUDA kernel launch
- Handle numShots=0 gracefully in both C++ CPU path and Python binding
- Binarize check_matrix with & 1u in CPU path to match GPU kernel
  behavior and prevent uint8 dot-product overflow
- Clear sticky CUDA errors (cudaGetLastError) on all failure paths in
  the Python binding's GPU allocation/copy helpers
- Fix pre-existing test_non_default_cuda_stream assertion that compared
  torch.device("cuda") against torch.device("cuda", index=0)
- Add 12 new tests covering zero-shot edge case, non-binary H matrix
  CPU/GPU parity, and seedless code path (5 C++, 7 Python)
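
The mod-2 semantics behind the `& 1` binarization can be checked in a few lines of NumPy (the actual C++/CUDA code is not shown here, so this is a sketch of the invariant, not the implementation): reducing check-matrix entries mod 2 before the dot product leaves the parity result unchanged, which is what keeps the CPU path consistent with a bitwise GPU kernel even for non-binary H matrices.

```python
# Sketch of the invariant behind binarizing the check matrix with "& 1":
# (a mod 2) * e == a * e (mod 2) term by term, so reducing entries up front
# preserves the GF(2) dot product while keeping every term 0 or 1 (which also
# bounds accumulation in narrow integer types by the row weight).
import numpy as np

rng = np.random.default_rng(1)
# A deliberately non-binary check matrix, as the new parity tests exercise.
H = rng.integers(0, 7, size=(4, 10), dtype=np.uint8)
e = rng.integers(0, 2, size=(10, 256), dtype=np.uint8)  # 256 error vectors

# Binarize entries first, then take the dot product mod 2 ...
binarized = ((H & 1).astype(np.int64) @ e.astype(np.int64)) % 2
# ... which matches the full integer dot product reduced mod 2.
reference = (H.astype(np.int64) @ e.astype(np.int64)) % 2
assert np.array_equal(binarized, reference)
```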
