install clip
python train.py --dataset cifar10 --gpuid 0 --epochs 200 --lr 0.1 --network resnet18 --batch-size 256 --task-name all-data --base-dir ./data-model/cifar10

python generate_importance_score.py --gpuid 0 --base-dir ./data-model/cifar10 --task-name all-data

- Extract features of the dataset using the CLIP model. We provide pre-extracted features of CIFAR10/100 in /extracted_feature.

cd Structure_Entropy
python extract_feature.py

- Build a kNN graph based on the features. We recommend $k = \log n$ as an initial setting before searching for the best $k$. For CIFAR10/100, $\log n$ is approximately $15$.

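
To make the choice of $k$ and the graph construction concrete, here is a minimal sketch, not the repository's `build_graph.py`. It assumes the logarithm is base 2, which is consistent with the quoted value of 15 for the 50,000 CIFAR training images (about 15.6, truncated here; the natural log would give about 10.8). The helper names `choose_k`, `cosine`, and `build_knn_graph` are hypothetical, cosine similarity is an assumed metric, and the brute-force search is only meant for small toy inputs with non-zero feature vectors.

```python
import math


def choose_k(n: int) -> int:
    """Initial k as the truncated base-2 log of the dataset size (base is an assumption)."""
    return max(1, int(math.log2(n)))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_knn_graph(features, k):
    """Map each index i to its k most similar other points as (index, similarity) pairs."""
    graph = {}
    for i, u in enumerate(features):
        sims = [(j, cosine(u, v)) for j, v in enumerate(features) if j != i]
        sims.sort(key=lambda pair: pair[1], reverse=True)
        graph[i] = sims[:k]
    return graph


# Toy features standing in for CLIP embeddings (hypothetical values).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = build_knn_graph(features, k=1)
neighbors = {i: [j for j, _ in nbrs] for i, nbrs in graph.items()}

k_cifar = choose_k(50_000)
print(k_cifar)    # 15
print(neighbors)  # {0: [1], 1: [0], 2: [3], 3: [2]}
```

For real datasets an approximate or batched nearest-neighbor search would replace the quadratic loop above.
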
python build_graph.py --knn 15

- Calculate the structure entropy and merge it into the previous score file.

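
As a rough illustration of what an entropy measure on a graph looks like, the sketch below computes the one-dimensional structural entropy of an undirected, unweighted graph, $H^1(G) = -\sum_i \frac{d_i}{2m} \log_2 \frac{d_i}{2m}$, where $d_i$ is the degree of node $i$ and $m$ is the number of edges. This is an assumption made for illustration: `generate_SE_score.py` may use a different formulation (for example, one based on an encoding tree or on edge weights), and the function name `one_dim_structural_entropy` is hypothetical.

```python
import math


def one_dim_structural_entropy(edges, num_nodes):
    """One-dimensional structural entropy of an undirected, unweighted graph."""
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    volume = sum(degree)  # equals 2m, twice the number of edges
    entropy = 0.0
    for d in degree:
        if d > 0:  # isolated nodes contribute nothing
            p = d / volume
            entropy -= p * math.log2(p)
    return entropy


# A 4-cycle: every node has degree 2, so the degree distribution is uniform.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
h_cycle = one_dim_structural_entropy(cycle, 4)

# A star: one hub of degree 3 and three leaves, so the distribution is skewed.
star = [(0, 1), (0, 2), (0, 3)]
h_star = one_dim_structural_entropy(star, 4)

print(h_cycle)  # 2.0
print(h_star < h_cycle)  # True
```

The uniform degree distribution of the cycle gives the maximum value for four nodes, while the hub-dominated star scores lower.
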
python generate_SE_score.py --knn 15

python train.py --base-dir ./data-model/cifar10 --dataset cifar10 --gpuid 0 --epochs 200 --coreset --coreset-mode SE_bns --coreset-ratio 0.1 --mis-ratio 0.35 --knn 15 --gamma 1.1 --data-score-path ./data-model/cifar10/all-data/cifar10-data-score-all-15NN-data.pickle

Thanks to the authors of Coverage-centric Coreset Selection for High Pruning Rates and D2 Pruning: Message Passing for Balancing Diversity & Difficulty in Data Pruning for releasing their code for evaluating CCS/D2 and training ResNet models on CIFAR10, CIFAR100, and ImageNet-1K. Much of this codebase has been adapted from their code.