[GPU] paddle.sort and paddle.argsort crash on zero-batch tensors #79367

Description

@tingPetty

Describe the Bug

paddle.sort and paddle.argsort fail on GPU when the input tensor has a zero-sized dimension, even when the sorted axis itself has a positive size. On CPU, paddle.sort handles the same empty tensor shape and returns an empty output tensor.

Minimal reproducing example:

import traceback
import paddle

print("paddle:", paddle.__version__)
paddle.device.set_device("gpu:0")

cases = [
    (
        "sort",
        lambda: paddle.sort(
            paddle.empty([0, 3], dtype="int64"),
            axis=1,
            descending=True,
        ),
    ),
    (
        "argsort",
        lambda: paddle.argsort(
            paddle.empty([0, 3, 4], dtype="int32"),
            axis=-2,
            descending=True,
        ),
    ),
]

for name, fn in cases:
    print("running paddle." + name)
    try:
        out = fn()
        print(out.shape, out.dtype, out.place)
    except Exception:
        traceback.print_exc()

Expected result:

running paddle.sort
[0, 3] paddle.int64 Place(gpu:0)
running paddle.argsort
[0, 3, 4] paddle.int64 Place(gpu:0)

Actual result:

running paddle.sort
Traceback (most recent call last):
  File "<string>", line 25, in <module>
  File ".../site-packages/paddle/tensor/search.py", line 560, in sort
    outs, _ = _C_ops.argsort(x, axis, descending)
OSError: (External) CUDA error(9), invalid configuration argument.
  [Hint: 'cudaErrorInvalidConfiguration'. This indicates that a kernel launch is requesting resources that can never be satisfied by the current device.]
  (at ../paddle/phi/kernels/gpu/argsort_kernel.cu:225)

running paddle.argsort
Traceback (most recent call last):
  File "<string>", line 25, in <module>
  File ".../site-packages/paddle/tensor/search.py", line 103, in argsort
    _, ids = _C_ops.argsort(x, axis, descending)
OSError: (External) CUDA error(9), invalid configuration argument.
  [Hint: 'cudaErrorInvalidConfiguration'. This indicates that a kernel launch is requesting resources that can never be satisfied by the current device.]
  (at ../paddle/phi/kernels/gpu/argsort_kernel.cu:225)

For reference, paddle.sort succeeds on CPU for the same shape:

import paddle

paddle.device.set_device("cpu")
x = paddle.empty([0, 3], dtype="int64")
out = paddle.sort(x, axis=1, descending=True)
print(out.shape, out.dtype, out.place)

Output:

[0, 3] paddle.int64 Place(cpu)

TensorFlow also handles the corresponding empty argsort case on GPU:

import tensorflow as tf

with tf.device("/GPU:0"):
    x = tf.zeros([0, 3, 4], dtype=tf.int32)
    out = tf.argsort(x, axis=-2, direction="DESCENDING")
print(out.shape, out.dtype, out.device)

Output:

(0, 3, 4) <dtype: 'int32'> /job:localhost/replica:0/task:0/device:GPU:0
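
Until the GPU kernel handles this case, a user-side guard may avoid the crash. The sketch below is a possible workaround, not a fix in Paddle itself: `has_zero_dim`, `safe_sort`, and `safe_argsort` are hypothetical helper names introduced here for illustration. It assumes that returning an empty tensor with the input's shape (and `int64` for indices, matching the expected result above) is acceptable, and that `paddle.empty` places the result on the currently selected device.

```python
def has_zero_dim(shape):
    """Return True if any dimension in `shape` is zero (the tensor has no elements)."""
    return any(int(d) == 0 for d in shape)


def safe_sort(x, axis=-1, descending=False):
    """Hypothetical wrapper: skip the sort kernel when there is nothing to sort."""
    import paddle  # imported lazily so has_zero_dim can be used without Paddle

    if has_zero_dim(x.shape):
        # Assumption: an empty tensor with the input's shape and dtype is the desired result.
        return paddle.empty(x.shape, dtype=x.dtype)
    return paddle.sort(x, axis=axis, descending=descending)


def safe_argsort(x, axis=-1, descending=False):
    """Hypothetical wrapper: skip the argsort kernel when there is nothing to sort."""
    import paddle

    if has_zero_dim(x.shape):
        # Indices are int64 in the expected result above, so the empty result uses int64.
        return paddle.empty(x.shape, dtype="int64")
    return paddle.argsort(x, axis=axis, descending=descending)
```

With the shapes from the reproducer, `safe_sort(paddle.empty([0, 3], dtype="int64"), axis=1, descending=True)` would take the early-return branch and never reach the argsort kernel. The guard does not validate `axis`, so an out-of-range axis on an empty tensor would pass silently.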

Additional Supplementary Information

Reproduced in fresh Python processes with CUDA_LAUNCH_BLOCKING=1.
Environment:

  • Python: 3.10.20
  • PaddlePaddle: 2.6.1
  • GPU: NVIDIA GeForce RTX 3090
  • NVIDIA driver: 595.58.03
  • Paddle CUDA runtime: 11.7
