- If you train without Tensor Cores (i.e. FP32 training), manually set `algo` in every convolution/maxpool layer to `ConvAlgo.Native`. The default algorithm is `ConvAlgo.MaskImplicitGemm`, which is SLOWER than `ConvAlgo.Native` when using float32. This will be fixed in spconv 2.2.
- If your GPU supports Tensor Cores, use FP16 (mixed precision training) if possible.
- If you train with mixed precision (i.e. using Tensor Cores), you don't need to set the algorithm manually.
- Currently the fast algorithms only support a kernel volume (the product of the kernel size over all dimensions) <= 32, so don't use large kernel sizes.
- Make sure your channel size is a multiple of 8 when using FP16. A multiple of 32 is better.
- spconv 2.x on Windows 10 is 1.5x~2x slower than on Linux. Use Linux if possible.
See benchmark for more performance details of different algorithms.
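
The kernel volume and channel size rules above can be checked before building a network. The following is a minimal sketch in plain Python; the helper names (`kernel_volume`, `fits_fast_algorithm`, `round_up_channels`) are hypothetical and are not part of spconv. It assumes that an integer kernel size means the same size in every dimension.

```python
import math
from typing import Sequence, Union

# Kernel volume limit for the fast algorithms, as stated in the list above.
FAST_ALGO_MAX_KERNEL_VOLUME = 32


def kernel_volume(kernel_size: Union[int, Sequence[int]], ndim: int = 3) -> int:
    """Return the product of the kernel size over all dimensions."""
    if isinstance(kernel_size, int):
        # Assumption: a single int means the same size in every dimension.
        return kernel_size ** ndim
    return math.prod(kernel_size)


def fits_fast_algorithm(kernel_size: Union[int, Sequence[int]], ndim: int = 3) -> bool:
    """Check whether the kernel volume stays within the limit of 32."""
    return kernel_volume(kernel_size, ndim) <= FAST_ALGO_MAX_KERNEL_VOLUME


def round_up_channels(channels: int, multiple: int = 8) -> int:
    """Round a channel count up to the next multiple (8 for FP16, 32 preferred)."""
    return ((channels + multiple - 1) // multiple) * multiple


if __name__ == "__main__":
    print(kernel_volume(3))                # 27
    print(fits_fast_algorithm(3))          # True
    print(fits_fast_algorithm((5, 5, 5)))  # False (volume is 125)
    print(round_up_channels(60))           # 64
    print(round_up_channels(70, 32))       # 96
```

A 3x3x3 kernel has volume 27 and stays within the limit, while a 5x5x5 kernel (volume 125) does not. Rounding channel counts up rather than down keeps model capacity at the cost of a few extra parameters; whether that trade-off is right depends on the model.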