
thu-vis/SE-based_sample_selection


# Structure-Entropy-Based Sample Selection for Efficient and Effective Learning

## Setup

Install [CLIP](https://github.com/openai/CLIP) (e.g. `pip install git+https://github.com/openai/CLIP.git`).

## Image Classification

### Train the model on the entire dataset

```shell
python train.py --dataset cifar10 --gpuid 0 --epochs 200 --lr 0.1 --network resnet18 --batch-size 256 --task-name all-data --base-dir ./data-model/cifar10
```

### Calculate importance score

```shell
python generate_importance_score.py --gpuid 0 --base-dir ./data-model/cifar10 --task-name all-data
```
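The exact metric computed by `generate_importance_score.py` is repo-specific, but a common training-dynamics importance score in this line of work is an AUM-style margin: the gap between the target-class logit and the largest other logit, averaged over training epochs. A minimal sketch (the function name and the choice of AUM are illustrative assumptions, not the script's confirmed behavior):

```python
import numpy as np

def aum_score(logits_over_epochs, target):
    """AUM-style importance score for one sample.

    logits_over_epochs: (epochs, num_classes) array of pre-softmax logits
    recorded during training; target: ground-truth class index.
    Returns the margin (target logit minus largest competing logit),
    averaged over epochs. Low or negative values suggest hard or
    mislabeled samples.
    """
    logits = np.asarray(logits_over_epochs, dtype=float)
    target_logit = logits[:, target]
    others = np.delete(logits, target, axis=1)   # all non-target logits
    margin = target_logit - others.max(axis=1)
    return float(margin.mean())
```

A consistently negative score means the model kept preferring another class for that sample throughout training.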

### Structure Entropy calculation

1. Extract features of the dataset using the CLIP model. Pre-extracted features for CIFAR-10/100 are provided in `/extracted_feature`.

   ```shell
   cd Structure_Entropy
   python extract_feature.py
   ```

2. Build a kNN graph from the features. We recommend $k=\log n$ as an initial setting before searching for the best $k$; for CIFAR-10/100, $\log n$ is approximately $15$.

   ```shell
   python build_graph.py --knn 15
   ```

3. Compute the structure entropy and merge it into the previous score file.

   ```shell
   python generate_SE_score.py --knn 15
   ```
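Steps 2 and 3 can be sketched in a few lines. This is a minimal illustration, not the repo's implementation: it assumes a Euclidean, union-symmetrized kNN graph and the standard one-dimensional structural entropy $H^1(G) = -\sum_i \frac{d_i}{2m}\log_2\frac{d_i}{2m}$ over node degrees $d_i$ (the repo's per-sample SE score may be defined differently); function names are hypothetical. Note that with CIFAR-10's 50,000 training images, $\log_2 n \approx 15.6$, matching the recommended `--knn 15`.

```python
import numpy as np

def knn_adjacency(features, k):
    """Brute-force symmetric kNN adjacency matrix (no self-loops)."""
    X = np.asarray(features, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # exclude self from neighbors
    nn = np.argsort(dist, axis=1)[:, :k]      # k nearest neighbors per node
    n = len(X)
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    return np.maximum(A, A.T)                 # union-symmetrize

def one_dim_structural_entropy(A):
    """H1(G) = -sum_i (d_i / 2m) * log2(d_i / 2m) over node degrees."""
    deg = A.sum(axis=1)
    p = deg / deg.sum()                       # deg.sum() equals 2m
    p = p[p > 0]                              # skip isolated nodes
    return float(-(p * np.log2(p)).sum())
```

For a complete graph on $n$ nodes (all degrees equal), this reduces to $\log_2 n$, the maximum for a fixed node count.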

### Train the model with Structure-Entropy-Based Sample Selection

```shell
python train.py --base-dir ./data-model/cifar10 --dataset cifar10 --gpuid 0 --epochs 200 --coreset --coreset-mode SE_bns --coreset-ratio 0.1 --mis-ratio 0.35 --knn 15 --gamma 1.1 --data-score-path ./data-model/cifar10/all-data/cifar10-data-score-all-15NN-data.pickle
```
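The `SE_bns` selection mode is repo-specific, but the `--coreset-ratio` and `--mis-ratio` flags suggest a CCS-style pipeline: first cut the lowest-scoring `mis-ratio` fraction (treated as likely mislabeled), then spend the `coreset-ratio` budget roughly evenly across score strata for coverage. A rough sketch under those assumptions (all names hypothetical; the repo's actual stratification and use of the SE score may differ):

```python
import numpy as np

def select_coreset(scores, coreset_ratio, mis_ratio, num_bins=10, seed=0):
    """CCS-style stratified coreset selection over per-sample scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    order = np.argsort(scores)
    keep = order[int(mis_ratio * n):]        # drop lowest-scoring fraction
    budget = int(coreset_ratio * n)
    strata = np.array_split(keep, num_bins)  # equal-size score strata
    per_bin = -(-budget // num_bins)         # ceil(budget / num_bins)
    picked = []
    for s in strata:
        take = min(len(s), per_bin)
        picked.extend(rng.choice(s, size=take, replace=False))
    return np.array(picked[:budget])
```

Sampling across strata rather than taking only the highest-scoring samples is the key design choice: it preserves coverage of easy and hard regions of the data at high pruning rates.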

## Acknowledgements

Thanks to the authors of *Coverage-centric Coreset Selection for High Pruning Rates* and *D2 Pruning: Message Passing for Balancing Diversity & Difficulty in Data Pruning* for releasing their code for evaluating CCS/D2 and for training ResNet models on CIFAR-10, CIFAR-100, and ImageNet-1K. Much of this codebase is adapted from their code.
