https://arxiv.org/abs/2105.12723
Aggregating Nested Transformers (Zizhao Zhang, Han Zhang, Long Zhao, Ting Chen, Tomas Pfister)
Builds a vision transformer from non-overlapping local attention + pooling, with no special overlapping mechanism. I do wonder whether it ends up too similar to a Swin Transformer ablation with the shifting removed... A rough sketch of the block structure is below.
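A minimal sketch of the idea as I read it (my own simplification, not the authors' code): self-attention runs independently inside non-overlapping blocks, and a pooling-based aggregation step merges neighboring blocks between stages instead of shifted windows. The conv3x3 + max-pool aggregation and all module names here are assumptions for illustration.

```python
import torch
import torch.nn as nn


def blockify(x, block_size):
    """Split a (B, H, W, C) feature map into non-overlapping blocks: (B, num_blocks, tokens, C)."""
    B, H, W, C = x.shape
    nH, nW = H // block_size, W // block_size
    x = x.reshape(B, nH, block_size, nW, block_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, nH * nW, block_size * block_size, C)


def unblockify(x, H, W, block_size):
    """Inverse of blockify: (B, num_blocks, tokens, C) -> (B, H, W, C)."""
    B, _, _, C = x.shape
    nH, nW = H // block_size, W // block_size
    x = x.reshape(B, nH, nW, block_size, block_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)


class LocalTransformerLayer(nn.Module):
    """Standard pre-norm transformer layer, applied independently inside each block."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):  # x: (B * num_blocks, tokens_per_block, C)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x


class AggregateBlocks(nn.Module):
    """Block aggregation (assumed conv + max-pool): halves spatial size, so 4 blocks merge into 1."""

    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.conv = nn.Conv2d(dim_in, dim_out, kernel_size=3, padding=1)
        self.norm = nn.LayerNorm(dim_out)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):  # x: (B, H, W, C)
        x = self.conv(x.permute(0, 3, 1, 2))            # (B, C_out, H, W)
        x = self.norm(x.permute(0, 2, 3, 1))            # back to (B, H, W, C_out)
        return self.pool(x.permute(0, 3, 1, 2)).permute(0, 2, 3, 1)


if __name__ == "__main__":
    B, H, W, C, block = 2, 16, 16, 64, 4
    x = torch.randn(B, H, W, C)
    blocks = blockify(x, block)                          # (2, 16 blocks, 16 tokens, 64)
    Bn, N, T, _ = blocks.shape
    layer = LocalTransformerLayer(C, num_heads=4)
    blocks = layer(blocks.reshape(Bn * N, T, C)).reshape(Bn, N, T, C)
    x = unblockify(blocks, H, W, block)
    x = AggregateBlocks(C, 2 * C)(x)                     # (2, 8, 8, 128): neighbors merged
    print(x.shape)
```

Cross-block communication happens only through this aggregation between stages, which is what makes it read like a window-attention model with shifting replaced by pooling.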
[[210323 Scaling Local Self-Attention for Parameter Efficient Visual Backbones]] [[210325 Swin Transformer]]
#vit #local_attention