In FBNet V3, the padding of the downsampling layers is set to [1, 1] for kernel_size 5 and [0, 0] for kernel_size 3. This results in smaller feature-map resolutions in the later stages (i.e. 13x13 and 6x6 in the last two stages). This is not a common setting in other networks, including MBNet and previous versions of FBNet.
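To check the resolutions quoted above, here is a minimal Python sketch of the standard convolution output-size formula with dilation 1: out = floor((in + 2 * padding - kernel_size) / stride) + 1. It assumes the downsampling layers use stride 2 and that a hypothetical 28x28 feature map enters the last two downsampling stages. The helper names are illustrative only and do not come from the FBNet code.

```python
def conv_out_size(size: int, kernel_size: int, stride: int, padding: int) -> int:
    """Output size along one spatial dimension of a conv with dilation 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


def fbnetv3_padding(kernel_size: int) -> int:
    """Padding as described in the question: 1 for k=5, 0 for k=3."""
    return {5: 1, 3: 0}[kernel_size]


def same_style_padding(kernel_size: int) -> int:
    """Conventional k // 2 padding (1 for k=3, 2 for k=5)."""
    return kernel_size // 2


def trace_sizes(size, kernel_sizes, padding_fn, stride=2):
    """Apply successive stride-2 downsampling convs and record each output size."""
    sizes = []
    for k in kernel_sizes:
        size = conv_out_size(size, k, stride, padding_fn(k))
        sizes.append(size)
    return sizes


start = 28  # hypothetical input size for the last two downsampling stages
for ks in ([3, 3], [5, 5], [3, 5]):
    # -> [13, 6] with the FBNet V3 padding, [14, 7] with k // 2 padding
    print(ks, trace_sizes(start, ks, fbnetv3_padding),
          trace_sizes(start, ks, same_style_padding))
```

Because this padding is effectively `k // 2 - 1`, kernel sizes 3 and 5 still produce the same output size as each other, but each stride-2 layer yields an output exactly one pixel smaller per spatial dimension than with `k // 2` padding.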
Why did you make this adjustment in FBNet V3, and how does it affect accuracy? Thanks in advance.