ACL deconvolution crashes with large weights

**PyTorch reproducer:**
```
python -c "import torch;torch.nn.ConvTranspose1d(2016, 1026, 1024, stride=256)(torch.rand(1, 2016, 224))"
```

**Standalone oneDNN (v3.7.1) with ACL (v25.02) reproducer (with `benchdnn`):**
```
ONEDNN_VERBOSE=profile,profile_externals ./tests/benchdnn/benchdnn --deconv mb1_ic2016oc1026_ih1oh1kh1sh1dh0ph0_iw224ow58112kw1024sw256dw0pw0
```

**Root cause:** 

The crash is caused by a write to an invalid address in ACL, on [this line](https://github.com/ARM-software/ComputeLibrary/blob/main/src/core/NEON/kernels/NEReverseKernel.cpp#L177) of `NEReverseKernel.cpp`.

The destination address is computed by [`ptr_to_element`](https://github.com/ARM-software/ComputeLibrary/blob/main/arm_compute/core/ITensor.h#L69), which gets its byte offset from [`offset_element_in_bytes`](https://github.com/ARM-software/ComputeLibrary/blob/main/src/core/TensorInfo.cpp#L432). The offset in `offset_element_in_bytes` is an `int32_t`, which overflows in this case because the weights tensor is so large: `2016 * 1026 * 1024` = 2,118,057,984 elements. The function ends up returning a negative offset of `-2139758464`.

A smaller workload like `torch.nn.ConvTranspose1d(500, 1026, 1024, stride=256)(torch.rand(1, 500, 224))` doesn't crash.
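The arithmetic behind the two cases can be checked with a short sketch. This is not ACL code: the helper names are hypothetical, and it assumes a dense weights tensor with 4-byte `float32` elements (the PyTorch default dtype) and two's complement wraparound for `int32_t`.

```python
INT32_MAX = 2**31 - 1

def weight_bytes(ic, oc, kw, elem_size=4):
    """Total size in bytes of a dense ic x oc x kw weights tensor.

    elem_size=4 is an assumption (float32)."""
    return ic * oc * kw * elem_size

def wrap_int32(value):
    """Reinterpret an integer as a two's complement 32-bit signed value."""
    return (value + 2**31) % 2**32 - 2**31

large = weight_bytes(2016, 1026, 1024)  # crashing workload
small = weight_bytes(500, 1026, 1024)   # workload that does not crash

print(large, large > INT32_MAX)  # byte offsets can exceed the int32 range
print(small, small > INT32_MAX)  # every byte offset still fits in int32

# An unwrapped byte offset of 2155208832 wraps to the value reported above.
print(wrap_int32(2155208832))
```

Under these assumptions the large weights tensor spans about 8.47 GB, well beyond what a signed 32-bit offset can address, while the smaller one stays just under the 2 GiB limit, which is consistent with only the large workload crashing.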

**Suggested fixes:**
- Use 64 bits for the offset (and the return type) in `offset_element_in_bytes`.
- Or make ACL fail the validation stage if the problem is too big.
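
As a rough illustration of the second option, a validation-time guard could reject any tensor whose largest byte offset does not fit in `int32_t`. The sketch below is plain Python with hypothetical function names, not ACL's validation API, and it assumes a dense layout with no padding and 4-byte elements.

```python
INT32_MAX = 2**31 - 1

def max_byte_offset(shape, elem_size=4):
    """Byte offset of the last element of a dense tensor (no padding assumed)."""
    stride = elem_size
    offset = 0
    for dim in reversed(shape):  # walk from the innermost dimension outwards
        offset += (dim - 1) * stride
        stride *= dim
    return offset

def fits_int32_offsets(shape, elem_size=4):
    """True if every element offset is representable as a signed 32-bit int."""
    return max_byte_offset(shape, elem_size) <= INT32_MAX

print(fits_int32_offsets((2016, 1026, 1024)))  # crashing workload
print(fits_int32_offsets((500, 1026, 1024)))   # workload that does not crash
```

A real check in ACL would need to use the tensor's actual strides and padding rather than a dense-layout assumption.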


**Full Stack Trace**: See https://github.com/pytorch/pytorch/issues/165654

