BDC-CLIP: Brownian Distance Covariance for Adapting CLIP to Action Recognition

Fei Long*, Xiaoou Li*, Jiaming Lv*, Haoyuan Yang, Xianjun Cheng, Peihua Li

ICML 2025 (Poster)
In this paper, we propose BDC-CLIP for action recognition, a novel framework that leverages Brownian Distance Covariance (BDC) to enhance video-language alignment. Unlike prior methods that rely on cosine similarity between global tokens, which limits their ability to capture complex dependencies, BDC-CLIP models both linear and nonlinear relationships across all visual and textual tokens. This enables fine-grained alignment in space, time, and language, which is crucial for understanding dynamic human actions. BDC-CLIP achieves state-of-the-art performance in zero-shot, few-shot, base-to-novel, and fully supervised settings, demonstrating its effectiveness and generality.
- New metric for multimodal alignment: Introduces BDC as an alternative to cosine similarity, capturing complex statistical dependencies for more fine-grained alignment.
- Temporal BDC Attention: Proposes a temporal BDC attention mechanism that captures patch-wise importance and temporal dynamics.
- Strong generalization capability: Achieves state-of-the-art results in zero-shot, few-shot, base-to-novel, and fully-supervised video recognition.
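
For intuition, the sketch below shows one way a BDC-style similarity between a set of visual tokens and a set of textual tokens can be computed in PyTorch: build a double-centered pairwise-distance (BDC) matrix for each token set and compare the two matrices with a normalized inner product. It follows the classical Brownian distance covariance recipe and, to allow different token counts, treats embedding channels as the observations (an assumption of this sketch); the exact formulation, normalization, and temporal attention used in BDC-CLIP are implemented in `trainers/` and may differ.

```python
import torch

def bdc_matrix(tokens: torch.Tensor) -> torch.Tensor:
    """Double-centered pairwise-distance (BDC) matrix of a token set.

    tokens: (num_tokens, dim). Channels are treated as observations here
    (a simplifying assumption), so the output is (dim, dim) and does not
    depend on the number of tokens.
    """
    x = tokens.t()                               # (dim, num_tokens)
    d = torch.cdist(x, x, p=2)                   # pairwise Euclidean distances
    row_mean = d.mean(dim=1, keepdim=True)
    col_mean = d.mean(dim=0, keepdim=True)
    return d - row_mean - col_mean + d.mean()    # double centering

def bdc_similarity(visual_tokens: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
    """Dependence score between visual and textual token sets: a normalized
    inner product of their BDC matrices (distance-correlation style)."""
    a = bdc_matrix(visual_tokens)
    b = bdc_matrix(text_tokens)
    return (a * b).sum() / (a.norm() * b.norm() + 1e-8)

# Toy usage: 196 visual patch tokens vs. 16 text tokens, both 512-d.
score = bdc_similarity(torch.randn(196, 512), torch.randn(16, 512))
```

Unlike a cosine similarity between two pooled global embeddings, this kind of score depends on the joint structure of all token pairs, which is what allows it to pick up nonlinear dependencies.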
```
BDC_CLIP/
├── assets/            # SVG figures for README.md
├── clip/              # CLIP model implementation
├── configs/           # Configuration files for different experimental settings
├── datasets/          # Dataset loaders and preprocessing pipeline
├── datasets_splits/   # Dataset splits for various evaluation protocols
├── labels/            # Label lists for each dataset under different settings
├── prompts/           # Prompts generated by LLMs for each dataset
├── scripts/           # Training and evaluation scripts
├── trainers/          # Core training modules implementing our methods
├── utils/             # Utility functions and helper tools
├── main.py            # Entry point for training and evaluation
├── README.md          # Project documentation
└── requirements.txt   # Python dependencies
```

```bash
conda create -n bdc_clip python=3.10
conda activate bdc_clip
pip install -r requirements.txt
```

Our experiments were conducted on two NVIDIA RTX 4090 GPUs.
Supported datasets:
* Kinetics-400
* Kinetics-600
* UCF-101
* HMDB-51
* Something-Something V2

Please follow the instructions provided by TC_CLIP for data preparation.
```bash
# Modify the dataset paths in configs

# Pretraining on Kinetics-400 (zero-shot)
cd scripts/pre_training
bash zero_shot_pretrain.sh

# Zero-shot evaluation
cd scripts/zero_shot
bash zero_shot_hmdb51.sh
```

Zero-shot evaluation results:

| Setting | HMDB-51 | UCF-101 | K600 | Model |
|---|---|---|---|---|
| w/o WSE | 59.4 | 85.9 | 76.5 | OneDrive Baiduyun |
| w/ WSE | 60.7 | 86.9 | 78.9 | |
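
WSE in the table above presumably refers to weight-space ensembling, i.e., interpolating the fine-tuned weights with the original zero-shot CLIP weights (in the spirit of WiSE-FT and FROSTER); that reading is an assumption. Under it, a minimal sketch looks like this, where `weight_space_ensemble` and the coefficient `alpha` are hypothetical and may differ from what this repository does:

```python
import copy
import torch

def weight_space_ensemble(finetuned: torch.nn.Module,
                          zeroshot: torch.nn.Module,
                          alpha: float = 0.5) -> torch.nn.Module:
    """Interpolate parameters: theta = alpha * theta_ft + (1 - alpha) * theta_zs.

    Hypothetical helper for illustration; which weights are ensembled and the
    interpolation coefficient used for the reported numbers may differ.
    """
    merged_model = copy.deepcopy(finetuned)
    ft, zs = finetuned.state_dict(), zeroshot.state_dict()
    merged = {}
    for name, param in ft.items():
        if torch.is_floating_point(param):
            merged[name] = alpha * param + (1.0 - alpha) * zs[name]
        else:
            merged[name] = param              # leave integer buffers untouched
    merged_model.load_state_dict(merged)
    return merged_model
```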
Few-shot results:

| Dataset | 2-shot | 4-shot | 8-shot | 16-shot | Pretrained Model | Finetuned Model |
|---|---|---|---|---|---|---|
| HMDB-51 | 68.0 | 71.5 | 75.9 | 77.3 | OneDrive Baiduyun | |
| UCF-101 | 94.9 | 96.7 | 97.5 | 98.5 | OneDrive Baiduyun | |
| SSv2 | 9.9 | 11.6 | 14.4 | 19.5 | OneDrive Baiduyun | |
Base-to-novel results:

| Dataset | Base | Novel | HM | Pretrained Model | Finetuned Model |
|---|---|---|---|---|---|
| HMDB-51 | 81.0 | 61.3 | 69.8 | OneDrive Baiduyun | |
| UCF-101 | 97.5 | 88.0 | 92.5 | OneDrive Baiduyun | |
| SSv2 | 20.9 | 18.2 | 19.5 | OneDrive Baiduyun | |
If you find this work useful, please consider citing:
```bibtex
@inproceedings{long2025bdcclip,
  title     = {BDC-CLIP: Brownian Distance Covariance for Adapting CLIP to Action Recognition},
  author    = {Fei Long and Xiaoou Li and Jiaming Lv and Haoyuan Yang and Xianjun Cheng and Peihua Li},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2025}
}
```

Our project is partially based on the open-source projects ViFiCLIP, TC_CLIP, and FROSTER. We sincerely acknowledge their contributions.
If you have any questions or suggestions, feel free to contact:
Fei Long: [email protected]
