Is your feature request related to a problem? Please describe.
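I spent several days figuring out why my spatial transformer network, which uses a warp layer, did not train properly. The cause turned out to be the nearest neighbor mode of the warp layer: its output is piecewise constant in the sampling grid, so the result is non-differentiable with respect to the grid and no useful gradient reaches the parameters that produce it.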
PyTorch does not give any warning about this, and it is not mentioned in the documentation of `grid_sample` either.
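
A minimal sketch of how the problem can be reproduced, assuming the warp layer is built on `torch.nn.functional.affine_grid` and `torch.nn.functional.grid_sample`. The image size, the `theta` values, the squared-sum loss, and the helper name `theta_grad_magnitude` are made up for illustration and are not taken from my actual network:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical input: one random single-channel 8x8 image.
image = torch.rand(1, 1, 8, 8)

def theta_grad_magnitude(mode):
    # theta stands in for the affine parameters a spatial transformer would learn.
    theta = torch.tensor([[[1.0, 0.0, 0.1],
                           [0.0, 1.0, -0.05]]], requires_grad=True)
    grid = F.affine_grid(theta, size=(1, 1, 8, 8), align_corners=False)
    out = F.grid_sample(image, grid, mode=mode,
                        padding_mode="zeros", align_corners=False)
    # Arbitrary scalar loss, only used to trigger backpropagation.
    out.pow(2).sum().backward()
    # Treat a missing gradient the same as a zero gradient.
    if theta.grad is None:
        return 0.0
    return theta.grad.abs().sum().item()

bilinear_grad = theta_grad_magnitude("bilinear")
nearest_grad = theta_grad_magnitude("nearest")

print("bilinear:", bilinear_grad)
print("nearest: ", nearest_grad)
```

In this sketch the gradient reaching `theta` should be non-zero for `mode="bilinear"` and zero for `mode="nearest"`, and no warning is raised in either case, which is why the failure is so hard to spot.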
Describe the solution you'd like
An addition to the documentation of the warp layer (`grid_sample`) that warns users that the nearest neighbor mode is non-differentiable with respect to the sampling grid.
Describe alternatives you've considered
I do not really see any other alternatives, as this modification is so small.