
Strategy B3 #12

Open
june6423 opened this issue Aug 6, 2024 · 5 comments

Comments

@june6423

june6423 commented Aug 6, 2024

Greetings!

I read your paper with great interest and am trying to reproduce some of your experiments.

I want to reproduce your vanilla KD setting using strategies B1, B2, and B3 from your DIST_KD paper.

I found the B1 and B2 strategies in your strategies folder, but I couldn't find the B3 setting.

configs/strategies/deit/deit_tiny.yaml appears to be B3, but I'm not sure, which leaves me with a question.

Could you share the B3 setting for vanilla KD with temperature 4?

@hunto
Owner

hunto commented Aug 6, 2024

Hi @june6423 ,

Our B3 experiment on Swin Transformer was implemented on the original Swin-Transformer training code, so there's no B3 config in this repo.

Alternatively, if you want to implement B3 in this repo, the strategy is similar to deit_tiny; you can use the following config for KD (T=4):

```yaml
aa: rand-m9-mstd0.5
batch_size: 128 # x 8 gpus = 1024bs
color_jitter: 0.4
decay_by_epoch: false
decay_epochs: 3
decay_rate: 0.967
# dropout
drop: 0.0
drop_path_rate: 0.2

epochs: 300
log_interval: 50
lr: 1.e-3
min_lr: 5.0e-06
model_ema: False
model_ema_decay: 0.999
momentum: 0.9
opt: adamw
opt_betas: null
opt_eps: 1.0e-08
clip_grad_norm: true
clip_grad_max_norm: 5.0

interpolation: 'bicubic'

# random erase
remode: pixel
reprob: 0.25

# mixup
mixup: 0.8
cutmix: 1.0
mixup_prob: 1.0
mixup_switch_prob: 0.5
mixup_mode: 'batch'

sched: cosine
seed: 42
warmup_epochs: 20
warmup_lr: 5.e-7
weight_decay: 0.04
workers: 16

# kd
kd: 'kd'
ori_loss_weight: 1.
kd_loss_weight: 1.
teacher_model: 'timm_swin_large_patch4_window7_224'
teacher_pretrained: True
```
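For reference, vanilla KD with temperature 4 (the `kd: 'kd'` setting above) conventionally refers to the Hinton et al. softened-softmax KL loss. A minimal PyTorch sketch of that loss (the function name is illustrative, not the repo's actual API):

```python
import torch
import torch.nn.functional as F

def vanilla_kd_loss(student_logits, teacher_logits, T=4.0):
    """Hinton-style KD loss: KL divergence between temperature-softened
    distributions, scaled by T^2 to keep gradient magnitudes comparable."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```

With `kd_loss_weight: 1.` and `ori_loss_weight: 1.`, the total objective would be this KD term plus the ordinary cross-entropy on the labels, each weighted by 1.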

@june6423
Author

june6423 commented Aug 7, 2024

Thanks a lot!

Now I want to reproduce the results of other KD methods, including RKD and CRD (I am working on Table 5 of the DIST_KD paper, CIFAR-100).

However, I couldn't find the training configs and code for training from scratch or for the other KD methods.

I am working with image_classification_sota at commit d9662f7.

I am wondering whether code for these settings has already been published, or whether I should implement them myself.

Thanks for your effort.

@june6423 june6423 closed this as completed Aug 7, 2024
@june6423 june6423 reopened this Aug 7, 2024
@Malaika68

Hi @june6423 , how did you manage to get the data for the meta folder for ImageNet?

@june6423
Author

june6423 commented Feb 3, 2025

> Hi @june6423 , how did you manage to get the data from meta folder for ImageNet?

I made the metadata files myself.

Create train.txt and val.txt in the data/imagenet/meta folder.

Here's an example of train.txt (file path and class index):

```
image/n01440764/n01440764_10026.JPEG 0
image/n01440764/n01440764_10027.JPEG 0
```

@Malaika68

Malaika68 commented Feb 4, 2025

Hi @june6423, thanks for the help. I did that, but my validation accuracy is 0. Did you also face this issue? I am trying to distill knowledge from ResNet-34 to ResNet-18.
