
BDC-CLIP: Brownian Distance Covariance for Adapting CLIP to Action Recognition


This repository provides the official implementation of the paper:

BDC-CLIP: Brownian Distance Covariance for Adapting CLIP to Action Recognition
Fei Long*, Xiaoou Li*, Jiaming Lv*, Haoyuan Yang, Xianjun Cheng, Peihua Li
ICML 2025 (Poster)

📌 Introduction

In this paper, we propose BDC-CLIP, a novel framework for action recognition that leverages Brownian Distance Covariance (BDC) to enhance video-language alignment. Unlike prior methods that rely on cosine similarity between global tokens, which limits their ability to capture complex dependencies, BDC-CLIP models both linear and nonlinear relationships across all visual and textual tokens. This enables fine-grained alignment in space, time, and language, which is crucial for understanding dynamic human actions. BDC-CLIP achieves state-of-the-art performance in zero-shot, few-shot, base-to-novel, and fully supervised settings, demonstrating its effectiveness and generality.
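For intuition, below is a minimal, self-contained PyTorch sketch of the generic BDC computation: each modality's token set is turned into a double-centered pairwise-distance (BDC) matrix over the shared embedding channels, and two such matrices are compared by an inner product rather than a cosine similarity between global tokens. This follows the standard Brownian Distance Covariance formulation only; the actual modules in trainers/ may differ in normalization, dimensionality reduction, and where in the network the matrices are computed.

```python
import torch

def bdc_matrix(tokens: torch.Tensor) -> torch.Tensor:
    """Generic BDC matrix of a token set (illustrative sketch, not the exact repo module).

    tokens: (n_tokens, d) features from one modality.
    Returns a (d, d) double-centered Euclidean distance matrix, treating each
    embedding channel as an observation whose coordinates are its responses
    over the tokens.
    """
    x = tokens.t()                                               # (d, n_tokens): one row per channel
    sq = (x ** 2).sum(dim=1, keepdim=True)                       # (d, 1) squared norms
    dist = (sq + sq.t() - 2.0 * x @ x.t()).clamp(min=0).sqrt()   # (d, d) pairwise channel distances
    row_mean = dist.mean(dim=1, keepdim=True)
    col_mean = dist.mean(dim=0, keepdim=True)
    return dist - row_mean - col_mean + dist.mean()              # double centering

def bdc_similarity(visual_tokens: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
    """BDC-style video-text similarity: inner product of the two BDC matrices,
    used here in place of cosine similarity between global tokens."""
    return (bdc_matrix(visual_tokens) * bdc_matrix(text_tokens)).mean()
```

Because both BDC matrices live in the shared d x d channel space, the visual and textual token sets may have different lengths, which is what allows all patch and prompt tokens to participate in the alignment.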

✨ Highlights

  • New metric for multimodal alignment: introduces BDC as an alternative to cosine similarity, capturing complex statistical dependencies for more fine-grained alignment (a toy usage of the sketch above follows this list).
  • Temporal BDC Attention: proposes a temporal BDC attention module that captures patch-wise importance and temporal dynamics.
  • Strong generalization capability: achieves state-of-the-art results in zero-shot, few-shot, base-to-novel, and fully-supervised video recognition.
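As a quick sanity check of the sketch above, the similarity can be evaluated on random tensors; the shapes below are made up for illustration and are not real CLIP features:

```python
# Toy shapes only: e.g. 8 frames x 49 patch tokens and 16 prompt tokens, d = 512.
visual = torch.randn(8 * 49, 512)
text = torch.randn(16, 512)
print(bdc_similarity(visual, text).item())
```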

πŸ—οΈ Project Structure

BDC_CLIP/
├── assets/             # SVG figures for README.md
├── clip/               # CLIP model implementation
├── configs/            # Configuration files for different experimental settings
├── datasets/           # Dataset loaders and preprocessing pipeline
├── datasets_splits/    # Dataset splits for various evaluation protocols
├── labels/             # Label lists for each dataset under different settings
├── prompts/            # Prompts generated by LLMs for each dataset
├── scripts/            # Training and evaluation scripts
├── trainers/           # Core training modules implementing our methods
├── utils/              # Utility functions and helper tools
├── main.py             # Entry point for training and evaluation
├── README.md           # Project documentation
└── requirements.txt    # Python dependencies

🚀 Getting Started

🪜 Environment Setup

conda create -n bdc_clip python=3.10
conda activate bdc_clip
pip install -r requirements.txt

💻 Our experiments were conducted on two NVIDIA RTX 4090 GPUs. ⚠️ We strongly recommend following our environment setup: differences in hardware (e.g., GPU type) or software (e.g., PyTorch version) may lead to inconsistent results.
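An optional way to confirm that the installed PyTorch build matches your CUDA setup before launching training (this check is not required by the repository):

```python
import torch

# Report the PyTorch version, the CUDA toolkit it was built against,
# and whether (and how many) GPUs are visible to this environment.
print(torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count())
```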

🪜 Dataset Preparation

Supported datasets:

* Kinetics-400
* Kinetics-600
* UCF-101
* HMDB-51
* Something-Something V2

Please follow the instructions provided by TC_CLIP for data preparation.

🪜 Training & Evaluation

# Modify the dataset paths in the config files under configs/

# Pretraining on Kinetics-400 (zero-shot)
cd scripts/pre_training
bash zero_shot_pretrain.sh

# Zero-shot evaluation
cd scripts/zero_shot
bash zero_shot_hmdb51.sh

📊 Results

💥 Zero-shot Classification

| Setting | HMDB-51 | UCF-101 | K600 | Model |
|---------|---------|---------|------|-------|
| w/o WSE | 59.4 | 85.9 | 76.5 | OneDrive / Baiduyun |
| w/ WSE  | 60.7 | 86.9 | 78.9 | |

💥 Few-shot Classification (Pretrained on Kinetics-400)

| Dataset | 2-shot | 4-shot | 8-shot | 16-shot | Pretrained Model | Finetuned Model |
|---------|--------|--------|--------|---------|------------------|-----------------|
| HMDB-51 | 68.0 | 71.5 | 75.9 | 77.3 | OneDrive / Baiduyun | |
| UCF-101 | 94.9 | 96.7 | 97.5 | 98.5 | OneDrive / Baiduyun | |
| SSv2    | 9.9  | 11.6 | 14.4 | 19.5 | OneDrive / Baiduyun | |

💥 Base-to-novel Classification (Pretrained on Kinetics-400)

| Dataset | Base | Novel | HM | Pretrained Model | Finetuned Model |
|---------|------|-------|----|------------------|-----------------|
| HMDB-51 | 81.0 | 61.3 | 69.8 | OneDrive / Baiduyun | |
| UCF-101 | 97.5 | 88.0 | 92.5 | OneDrive / Baiduyun | |
| SSv2    | 20.9 | 18.2 | 19.5 | OneDrive / Baiduyun | |

📋 Citation

If you find this work useful, please consider citing:

@inproceedings{long2025bdcclip,
  title     = {BDC-CLIP: Brownian Distance Covariance for Adapting CLIP to Action Recognition},
  author    = {Fei Long and Xiaoou Li and Jiaming Lv and Haoyuan Yang and Xianjun Cheng and Peihua Li},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2025}
}

🌟 Acknowledgments

Our project is partially based on the open-source projects ViFiCLIP, TC_CLIP, and FROSTER. We sincerely acknowledge their contributions.

📬 Contact

If you have any questions or suggestions, feel free to contact:
Fei Long: [email protected]
