
Invalid results of type 1 transform into (64, 64, 64) grid on A100 GPU #575

Open
pavel-shmakov opened this issue Oct 15, 2024 · 6 comments

pavel-shmakov commented Oct 15, 2024

We've encountered an issue where cufinufft.nufft3d1 outputs wildly incorrect results for very specific inputs and only on certain GPUs. This can be reproduced by running the following code on an A100 GPU:

import torch
import cufinufft

# points.pt and values.pt are the tensors from the attached inputs.zip
points = torch.load("points.pt")
values = torch.load("values.pt")
spectrum = cufinufft.nufft3d1(
    *points,
    values,
    (64, 64, 64),
    isign=-1,
    eps=1e-06,
)
print(torch.linalg.norm(spectrum).item())

Here's an archive with points.pt and values.pt: inputs.zip

The printed norm is many orders of magnitude greater than it should be. It also grows quickly with decreasing eps.

Notes:

  • We reproduced this with both cufinufft 2.2.0 and 2.3.0.
  • Reproduced on A100, but not on A10G. We haven't tried other GPUs.
  • The "blow-up" happens for specific grid sizes: 61 through 64, while for 60, 65 and beyond the result goes back to normal. This is for float32 inputs; for float64, we saw a "blow-up" at grid size 32 (see the sketch after this list).
  • We compiled cufinufft from source to investigate further, but surprisingly couldn't reproduce the bug. We've tried compiling from master and v2.3.X as well as various compilation options. If you could point us to the compilation options with which the release version of libcufinufft.so is built, that would be helpful, and we can investigate further!
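
For reference, a minimal sketch of such a grid-size sweep, assuming the points.pt / values.pt from the attached archive and a GPU where the bug reproduces (the 58-67 range is just an illustrative choice):

import torch
import cufinufft

# Hypothetical sweep over grid sizes around the reported blow-up range (61-64 for float32).
points = torch.load("points.pt")
values = torch.load("values.pt")
for n in range(58, 68):
    spectrum = cufinufft.nufft3d1(*points, values, (n, n, n), isign=-1, eps=1e-06)
    print(f"grid {n}: norm = {torch.linalg.norm(spectrum).item():.6g}")
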
pavel-shmakov (Author) commented:

Smaller reproducer with just one point:

import torch
import cufinufft

batch_size = 32
# A single non-uniform point at the origin, with unit strength in each of the batched transforms.
v = torch.tensor([[1] for _ in range(batch_size)], dtype=torch.complex64, device="cuda")
p = torch.tensor([[0], [0], [0]], dtype=torch.float32, device="cuda")
spectrum = cufinufft.nufft3d1(*p, v, (64, 64, 64), eps=1e-6)

The spectrum should be 1 everywhere, which it is for batch_size < 16; for batch_size >= 16 it starts misbehaving.
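
Not from the thread, just a hedged sketch of that batch-size sweep (same single-point setup; the specific batch sizes are an illustrative choice):

import torch
import cufinufft

# Hypothetical sweep: the output should be 1 everywhere, so report the worst deviation.
p = torch.tensor([[0.0], [0.0], [0.0]], dtype=torch.float32, device="cuda")
for batch_size in (8, 15, 16, 32):
    v = torch.ones((batch_size, 1), dtype=torch.complex64, device="cuda")
    spectrum = cufinufft.nufft3d1(*p, v, (64, 64, 64), eps=1e-6)
    max_err = (spectrum - 1).abs().max().item()
    print(f"{batch_size=}: max |spectrum - 1| = {max_err:.3g}")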

DiamonDinoia (Collaborator) commented:

What happens if we use GM (global memory) instead of SM (shared memory)? See the GPU options at https://finufft.readthedocs.io/en/latest/c_gpu.html#options-for-gpu-code. The gpu_method option should be supported in Python too.

pavel-shmakov (Author) commented:

With gpu_method=1 we also get an incorrect, but very different, answer on A100:

import torch
import cufinufft

batch_size = 32
n_modes = 64
points = torch.tensor([[0], [0], [0]], dtype=torch.float32, device="cuda")
values = torch.tensor([[1] for _ in range(batch_size)], dtype=torch.complex64, device="cuda")
for gpu_method in [1, 2]:
    spectrum = cufinufft.nufft3d1(*points, values, (n_modes, n_modes, n_modes), eps=1e-6, gpu_method=gpu_method)
    print(f"{gpu_method=}: {spectrum[0, 0, 0, 0].item()}")

A100:

gpu_method=1: (-4.974409603164531e-05-0.00036744182580150664j)
gpu_method=2: (2097152.5-0.16087542474269867j)

On T4 all good:

gpu_method=1: (1.000000238418579+0j)
gpu_method=2: (1.000000238418579+0j)

DiamonDinoia (Collaborator) commented:

@janden could you provide the command to do a debug build with pip? I have seen this type of error when using debug symbols. In my tests, if I compile with -G, nvcc generates an incorrect binary that overflows the stack while spreading; it does not crash, it just produces output that is wrong at some points.

@pavel-shmakov could you try a bigger eps? 1e-2 or 1e-3?

pavel-shmakov (Author) commented:

@pavel-shmakov could you try a bigger eps? 1e-2 or 1e-3?

eps=0.1: (0.5681691765785217-1.674233078956604j)
eps=0.01: 0j
eps=0.001: (0.1657368689775467-0.00020104330906178802j)
eps=0.0001: (1.0010954141616821+0j)
eps=1e-05: (2097339.75+10.764257431030273j)
eps=1e-06: (2097152.5-0.16087542474269867j)
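
For reference, a sketch of a loop that could produce a sweep like the one above, assuming the single-point reproducer with batch_size=32 and gpu_method=2 (the eps=1e-6 value matches the SM result reported earlier):

import torch
import cufinufft

# Hypothetical eps sweep over the single-point reproducer (SM path assumed).
points = torch.tensor([[0.0], [0.0], [0.0]], dtype=torch.float32, device="cuda")
values = torch.ones((32, 1), dtype=torch.complex64, device="cuda")
for eps in (1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6):
    spectrum = cufinufft.nufft3d1(*points, values, (64, 64, 64), eps=eps, gpu_method=2)
    print(f"eps={eps}: {spectrum[0, 0, 0, 0].item()}")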

DiamonDinoia (Collaborator) commented Jan 21, 2025

@pavel-shmakov, for the local compilation, which version of CUDA are you using?
We create the release binary using this script: https://github.com/flatironinstitute/finufft/blob/master/tools/cufinufft/distribution_helper.sh

If we move to email, we could share binary wheels built with different flags to narrow down the issue.
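
Not part of the exchange, but a hedged sketch of how the requested details could be collected on the affected machine (assumes nvcc is on PATH and that torch and cufinufft are installed):

import subprocess
import torch
from importlib.metadata import version

# CUDA toolkit that would be used for a local compilation.
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
# GPU model and the CUDA version the installed torch wheel was built against.
print(torch.cuda.get_device_name(0), "| torch CUDA:", torch.version.cuda)
# Installed cufinufft release.
print("cufinufft:", version("cufinufft"))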
