Describe the Bug
paddle.sort and paddle.argsort fail on GPU when the input tensor has a zero-sized dimension, even when the sorted axis itself has a positive size. On CPU, paddle.sort handles the same empty tensor shape and returns an empty output tensor.
Minimal reproducing example:
import traceback

import paddle

print("paddle:", paddle.__version__)
paddle.device.set_device("gpu:0")

cases = [
    (
        "sort",
        lambda: paddle.sort(
            paddle.empty([0, 3], dtype="int64"),
            axis=1,
            descending=True,
        ),
    ),
    (
        "argsort",
        lambda: paddle.argsort(
            paddle.empty([0, 3, 4], dtype="int32"),
            axis=-2,
            descending=True,
        ),
    ),
]

for name, fn in cases:
    print("running paddle." + name)
    try:
        out = fn()
        print(out.shape, out.dtype, out.place)
    except Exception:
        traceback.print_exc()
Expected result:
running paddle.sort
[0, 3] paddle.int64 Place(gpu:0)
running paddle.argsort
[0, 3, 4] paddle.int64 Place(gpu:0)
Actual result:
running paddle.sort
Traceback (most recent call last):
  File "<string>", line 25, in <module>
  File ".../site-packages/paddle/tensor/search.py", line 560, in sort
    outs, _ = _C_ops.argsort(x, axis, descending)
OSError: (External) CUDA error(9), invalid configuration argument.
  [Hint: 'cudaErrorInvalidConfiguration'. This indicates that a kernel launch is requesting resources that can never be satisfied by the current device.]
  (at ../paddle/phi/kernels/gpu/argsort_kernel.cu:225)
running paddle.argsort
Traceback (most recent call last):
  File "<string>", line 25, in <module>
  File ".../site-packages/paddle/tensor/search.py", line 103, in argsort
    _, ids = _C_ops.argsort(x, axis, descending)
OSError: (External) CUDA error(9), invalid configuration argument.
  [Hint: 'cudaErrorInvalidConfiguration'. This indicates that a kernel launch is requesting resources that can never be satisfied by the current device.]
  (at ../paddle/phi/kernels/gpu/argsort_kernel.cu:225)
For reference, paddle.sort succeeds on CPU for the same shape:
import paddle
paddle.device.set_device("cpu")
x = paddle.empty([0, 3], dtype="int64")
out = paddle.sort(x, axis=1, descending=True)
print(out.shape, out.dtype, out.place)
[0, 3] paddle.int64 Place(cpu)
TensorFlow also handles the corresponding empty argsort case on GPU:
import tensorflow as tf

with tf.device("/GPU:0"):
    x = tf.zeros([0, 3, 4], dtype=tf.int32)
    out = tf.argsort(x, axis=-2, direction="DESCENDING")

print(out.shape, out.dtype, out.device)
(0, 3, 4) <dtype: 'int32'> /job:localhost/replica:0/task:0/device:GPU:0
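Until the GPU kernel handles zero-sized inputs, a user-side guard may be enough to avoid the crash. The sketch below is not a fix for the kernel itself. The helper names (`is_empty_shape`, `sort_with_empty_guard`, `argsort_with_empty_guard`) are hypothetical, and it assumes that `paddle.empty` succeeds for zero-sized shapes on the current device, as it does in the reproduction above.

```python
from math import prod


def is_empty_shape(shape):
    """Return True when a tensor of this shape holds no elements."""
    return prod(shape) == 0


def sort_with_empty_guard(x, axis=-1, descending=False):
    """Hypothetical workaround: skip the sort kernel for zero-element inputs."""
    import paddle  # imported lazily so is_empty_shape works without Paddle

    if is_empty_shape(x.shape):
        # Sorting no elements is a no-op, so return an empty tensor
        # with the same shape and dtype as the input.
        return paddle.empty(x.shape, dtype=x.dtype)
    return paddle.sort(x, axis=axis, descending=descending)


def argsort_with_empty_guard(x, axis=-1, descending=False):
    """Hypothetical workaround: argsort indices are int64, as in the expected result."""
    import paddle

    if is_empty_shape(x.shape):
        return paddle.empty(x.shape, dtype="int64")
    return paddle.argsort(x, axis=axis, descending=descending)
```

With this guard, the two failing cases above would return empty tensors of shape [0, 3] and [0, 3, 4] without launching the sort kernel. Non-empty inputs still go through `paddle.sort` and `paddle.argsort` unchanged.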
Additional Supplementary Information
Reproduced in fresh Python processes with CUDA_LAUNCH_BLOCKING=1.
Environment:
- Python: 3.10.20
- PaddlePaddle: 2.6.1
- GPU: NVIDIA GeForce RTX 3090
- NVIDIA driver: 595.58.03
- Paddle CUDA runtime: 11.7