[CVPR2025] Breaking the Low-Rank Dilemma of Linear Attention

Implementation of "Breaking the Low-Rank Dilemma of Linear Attention"

The Softmax attention mechanism in Transformer models is notoriously computationally expensive, particularly due to its quadratic complexity, which poses significant challenges in vision applications. In contrast, linear attention provides a far more efficient solution by reducing the complexity to linear. However, compared to Softmax attention, linear attention often suffers significant performance degradation. Our experiments indicate that this drop stems from the low-rank nature of linear attention's feature map, which hinders its ability to adequately model complex spatial information. In this paper, to break the low-rank dilemma of linear attention, we conduct rank analysis from two perspectives: the KV buffer and the output features. Based on this analysis, we introduce Rank-Augmented Linear Attention (RALA), which rivals the performance of Softmax attention while maintaining linear complexity and high efficiency. On top of RALA, we construct the Rank-Augmented Vision Linear Transformer (RAVLT). Extensive experiments demonstrate that RAVLT achieves excellent performance across various vision tasks. Notably, without using any additional labels, data, or supervision during training, RAVLT reaches 84.4% Top-1 accuracy on ImageNet-1k with only 26M parameters and 4.6G FLOPs, significantly surpassing previous linear attention mechanisms and fully illustrating the potential of RALA.
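For context, the sketch below shows generic kernelized linear attention in PyTorch. This is the baseline being improved, not RALA itself (the rank-augmentation lives in this repository's model code); the `elu(x) + 1` feature map, the function name, and the `(batch, heads, tokens, dim)` layout are assumptions chosen for illustration. It makes explicit the d×d KV buffer whose rank, at most d, is one of the two objects the abstract's rank analysis targets.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v):
    """Generic kernelized linear attention (illustrative, not RALA).

    q, k, v: (batch, heads, tokens, dim)
    """
    # Positive feature map phi(x) = elu(x) + 1 (a common choice in the
    # linear-attention literature; assumed here, not taken from the paper).
    phi_q = F.elu(q) + 1
    phi_k = F.elu(k) + 1

    # KV buffer: sum_j phi(k_j) v_j^T, shape (batch, heads, dim, dim).
    # Its rank is at most dim -- the "low-rank dilemma" the paper analyzes.
    kv = torch.einsum('bhnd,bhne->bhde', phi_k, v)

    # Normalizer: phi(q_i)^T * sum_j phi(k_j), shape (batch, heads, tokens).
    z = torch.einsum('bhnd,bhd->bhn', phi_q, phi_k.sum(dim=2))

    # Output: phi(q_i)^T KV / z_i, shape (batch, heads, tokens, dim).
    return torch.einsum('bhnd,bhde->bhne', phi_q, kv) / z.unsqueeze(-1)
```

Because the buffer has only dim×dim entries regardless of the token count N, the cost drops from the O(N²·d) of Softmax attention to O(N·d²); that efficiency is exactly what linear attention trades rank away for.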

Citation

@inproceedings{fan2024breaking,
  title={Breaking the Low-Rank Dilemma of Linear Attention},
  author={Qihang Fan and Huaibo Huang and Ran He},
  booktitle={CVPR},
  year={2025}
}

Image Classification

| Model | Params (M) | FLOPs (G) | ckpt |
| --- | --- | --- | --- |
| RAVLT-T | 15 | 2.4 | RAVLT-T |
| RAVLT-S | 26 | 4.6 | RAVLT-S |
| RAVLT-B | 48 | 9.9 | RAVLT-B |
| RAVLT-L | 95 | 16.0 | RAVLT-L |
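A minimal sketch for sanity-checking a downloaded checkpoint, assuming the released files are plain PyTorch state dicts (the filename below is a placeholder for the RAVLT-S checkpoint linked above, and the actual model class comes from this repository's code):

```python
import torch

# Placeholder path; substitute the checkpoint downloaded from the table above.
state = torch.load('RAVLT-S.pth', map_location='cpu')

# Some releases wrap the weights (e.g. under a 'model' key); unwrap if so.
if isinstance(state, dict) and 'model' in state:
    state = state['model']

# Rough parameter count (state dicts also hold buffers such as BN statistics,
# so this may slightly exceed the 26M reported for RAVLT-S).
n = sum(t.numel() for t in state.values() if torch.is_tensor(t))
print(f'~{n / 1e6:.1f}M tensor entries')
```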

Object Detection and Instance Segmentation

RetinaNet 1x

| Backbone | Params (M) | FLOPs (G) | AP^b | AP^b_50 | AP^b_75 | AP^b_S | AP^b_M | AP^b_L | ckpt |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RAVLT-T | 24 | 201 | 45.9 | 67.4 | 49.4 | 28.5 | 50.1 | 60.8 | RAVLT-T |
| RAVLT-S | 34 | 244 | 48.3 | 69.8 | 52.1 | 32.7 | 52.8 | 63.6 | RAVLT-S |
| RAVLT-B | 57 | 353 | 49.8 | 71.2 | 54.0 | 34.0 | 54.3 | 64.9 | RAVLT-B |
| RAVLT-L | 104 | 482 | 50.9 | 72.2 | 55.0 | 34.7 | 55.7 | 65.4 | RAVLT-L |

Mask R-CNN 1x

| Backbone | Params (M) | FLOPs (G) | AP^b | AP^b_50 | AP^b_75 | AP^m | AP^m_50 | AP^m_75 | ckpt |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RAVLT-T | 33 | 219 | 47.3 | 69.1 | 51.9 | 42.7 | 66.2 | 46.0 | RAVLT-T |
| RAVLT-S | 44 | 262 | 49.8 | 71.3 | 54.5 | 44.6 | 68.5 | 48.2 | RAVLT-S |
| RAVLT-B | 67 | 372 | 51.2 | 72.7 | 56.4 | 45.7 | 69.9 | 49.5 | RAVLT-B |
| RAVLT-L | 114 | 501 | 52.3 | 73.8 | 57.3 | 46.4 | 71.1 | 50.4 | RAVLT-L |

Mask R-CNN 3x

| Backbone | Params (M) | FLOPs (G) | AP^b | AP^b_50 | AP^b_75 | AP^m | AP^m_50 | AP^m_75 | ckpt |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RAVLT-S | 44 | 262 | 51.4 | 72.3 | 56.5 | 45.5 | 69.7 | 48.8 | RAVLT-S |
| RAVLT-B | 67 | 372 | 52.7 | 73.5 | 57.7 | 46.4 | 70.6 | 50.2 | RAVLT-B |
| RAVLT-L | 114 | 501 | 53.6 | 74.4 | 58.9 | 47.3 | 71.6 | 51.2 | RAVLT-L |

Cascade Mask R-CNN 3x

| Backbone | Params (M) | FLOPs (G) | AP^b | AP^b_50 | AP^b_75 | AP^m | AP^m_50 | AP^m_75 | ckpt |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RAVLT-S | 82 | 741 | 54.2 | 72.9 | 58.7 | 46.8 | 70.5 | 50.9 | RAVLT-S |
| RAVLT-B | 105 | 851 | 55.3 | 73.8 | 60.1 | 47.7 | 71.4 | 52.1 | RAVLT-B |
| RAVLT-L | 152 | 979 | 55.6 | 74.1 | 60.5 | 48.0 | 71.8 | 52.3 | RAVLT-L |

Semantic Segmentation

Semantic FPN 1x

| Backbone | Params (M) | FLOPs (G) | mIoU (%) | ckpt |
| --- | --- | --- | --- | --- |
| RAVLT-T | 18 | 136 | 47.9 | RAVLT-T |
| RAVLT-S | 28 | 180 | 49.5 | RAVLT-S |
| RAVLT-B | 51 | 292 | 51.9 | RAVLT-B |
| RAVLT-L | 98 | 424 | 52.6 | RAVLT-L |

UperNet 2x

| Backbone | Params (M) | FLOPs (G) | mIoU (%) | ckpt |
| --- | --- | --- | --- | --- |
| RAVLT-S | 55 | 937 | 50.7 | RAVLT-S |
| RAVLT-B | 77 | 1050 | 52.5 | RAVLT-B |
| RAVLT-L | 125 | 1182 | 53.2 | RAVLT-L |
