The official implementation of Task Residual for Tuning Vision-Language Models (accepted to CVPR 2023).
The proposed Task Residual Tuning (TaskRes) is a new paradigm for tuning vision-language models (VLMs). It directly tunes the text-based classifier weights, without needing a heavy text encoder for prompt updates or carefully designed adapters.
- Prompt tuning tunes the input of the models;
- Adapters transform the pre-trained features by an MLP $\phi_{\omega}$: $\mathbf{f}'=\mathbf{f} + \alpha \phi_{\omega}(\mathbf{f})$ or $\mathbf{t}'=\mathbf{t} + \alpha \phi_{\omega}(\mathbf{t})$;
- TaskRes (Ours) directly tunes the text-based classifier weights in an additive way: $\mathbf{t}'=\mathbf{t}+\alpha\mathbf{x}$, where $\mathbf{x}$ is a set of learnable parameters.
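
The additive update $\mathbf{t}'=\mathbf{t}+\alpha\mathbf{x}$ can be sketched in a few lines of PyTorch. This is a minimal illustration, not the code shipped in this repository: the class name `TaskResidualClassifier`, the zero initialization of $\mathbf{x}$, the value `alpha=0.5`, and the CLIP-style logit scale of 100 are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskResidualClassifier(nn.Module):
    """Hypothetical sketch of a text-based classifier with t' = t + alpha * x."""

    def __init__(self, base_text_features: torch.Tensor, alpha: float = 0.5):
        super().__init__()
        # Frozen classifier weights t, shape (num_classes, dim). A buffer is
        # stored with the module but is not a trainable parameter.
        self.register_buffer("base", base_text_features.clone())
        # Learnable residual x with the same shape as t. Zero initialization
        # (an assumption) makes the classifier start exactly at t.
        self.residual = nn.Parameter(torch.zeros_like(base_text_features))
        self.alpha = alpha

    def forward(self, image_features: torch.Tensor, logit_scale: float = 100.0):
        # t' = t + alpha * x
        weights = self.base + self.alpha * self.residual
        # Cosine-similarity logits between image features and classifier weights.
        weights = F.normalize(weights, dim=-1)
        image_features = F.normalize(image_features, dim=-1)
        return logit_scale * image_features @ weights.t()


# Usage with random stand-ins for the pre-computed features.
torch.manual_seed(0)
base = torch.randn(10, 512)           # 10 classes, 512-dim text features
model = TaskResidualClassifier(base)
images = torch.randn(4, 512)          # a batch of 4 image features
logits = model(images)                # shape (4, 10)
loss = F.cross_entropy(logits, torch.tensor([0, 1, 2, 3]))
loss.backward()                       # gradients reach only the residual x
```

In this sketch the only trainable tensor is the residual, so no text encoder has to run during tuning; the base weights are computed once and kept fixed.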
This repository requires the environment and datasets to be installed first:
- follow the Dassl.pytorch installation instructions to install Dassl.pytorch and PyTorch.
- run `pip install -r requirements.txt` under `TaskRes/` to install a few more packages required by CLIP (this should be done when `dassl` is activated).
- follow DATASETS.md to install the datasets.
PS: You can also follow CoOp to perform the installation.
We present the basic usage here.
(a) Train regular TaskRes:
- see train_regular.sh to run regular TaskRes (i.e., using regular base).
(b) Train enhanced TaskRes:
- download the enhanced bases and move the folder `strong_base` to `TaskRes/`.
- see train_enhance.sh to run enhanced TaskRes (i.e., using enhanced base).
(c) Test domain generalization:
- see test_dg.sh to test domain generalization with enhanced TaskRes (i.e., using enhanced base).
PS: Refer to CoOp for more usage.
This repository is mainly based on Kaiyang Zhou's CoOp code base. We sincerely thank Kaiyang for his awesome code base.
If you find this work useful for your research, please cite us:
@inproceedings{yu2023task,
  title={Task Residual for Tuning Vision-Language Models},
  author={Yu, Tao and Lu, Zhihe and Jin, Xin and Chen, Zhibo and Wang, Xinchao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={10899--10909},
  year={2023}
}
