Spconv 2.x Performance Guide

Short Guide

If you train without Tensor Cores (i.e. FP32 training), manually set the algo of every convolution and max-pool layer to ConvAlgo.Native. The default algorithm, ConvAlgo.MaskImplicitGemm, is SLOWER than ConvAlgo.Native in float32. This will be fixed in spconv 2.2.
If your GPU supports Tensor Cores, use FP16 (mixed-precision training) if possible.
If you train with mixed precision (i.e. on Tensor Cores), you don't need to set the algorithm manually.
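Currently, the fast algorithms only support a kernel volume (the product of the kernel size) of at most 32, so avoid large kernel sizes: a 3x3x3 kernel (volume 27) qualifies, while a 5x5x5 kernel (volume 125) does not.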
Make sure your channel sizes are multiples of 8 when using FP16; multiples of 32 are better.
spconv 2.x on Windows 10 is 1.5x to 2x slower than on Linux; use Linux if possible.
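
The algorithm, kernel-volume, and channel rules above can be checked before a network is built. The sketch below is a hypothetical pre-flight helper, not part of spconv: the names kernel_volume, suggested_algo, and check_layer are invented for illustration, and suggested_algo only returns the name of the ConvAlgo member this guide recommends as a string rather than importing spconv. Kernels are assumed to be 3D unless ndim says otherwise.

```python
from math import prod

# Hypothetical helper, NOT part of spconv: it only encodes the rules of
# thumb from this guide so a layer config can be checked up front.
FAST_ALGO_MAX_KERNEL_VOLUME = 32  # limit stated in this guide


def kernel_volume(kernel_size, ndim=3):
    """Product of the kernel size; an int is expanded to ndim dimensions."""
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size,) * ndim
    return prod(kernel_size)


def suggested_algo(use_fp16):
    """Name of the ConvAlgo member this guide recommends for a precision."""
    # FP32: the default MaskImplicitGemm is slower, so set Native manually.
    # FP16 (Tensor Cores): keep the default, no manual setting needed.
    return "MaskImplicitGemm" if use_fp16 else "Native"


def check_layer(in_channels, out_channels, kernel_size, use_fp16, ndim=3):
    """Return a list of warnings for one convolution/max-pool layer config."""
    warnings = []
    vol = kernel_volume(kernel_size, ndim)
    if vol > FAST_ALGO_MAX_KERNEL_VOLUME:
        warnings.append(
            f"kernel volume {vol} > {FAST_ALGO_MAX_KERNEL_VOLUME}: "
            "fast algorithms unsupported")
    if use_fp16:
        # Channel alignment only matters for the FP16 / Tensor Core path.
        for name, c in (("in_channels", in_channels),
                        ("out_channels", out_channels)):
            if c % 8 != 0:
                warnings.append(f"{name}={c} is not a multiple of 8")
            elif c % 32 != 0:
                warnings.append(f"{name}={c}: a multiple of 32 is better")
    return warnings


if __name__ == "__main__":
    print(kernel_volume(3))                         # 27
    print(kernel_volume((5, 5, 5)))                 # 125
    print(suggested_algo(use_fp16=False))           # Native
    print(check_layer(16, 64, 3, use_fp16=True))
    # ['in_channels=16: a multiple of 32 is better']
```

In real training code, the returned name would correspond to the ConvAlgo value you assign to each layer's algo setting, as described in the list above; how that value is passed is left out of this sketch.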

See the benchmark for more performance details on the different algorithms.