(Feature Request) Model EMA #282

Open
rsomani95 opened this issue Dec 11, 2022 · 4 comments

@rsomani95
Contributor

In my fine-tuning experiments, I've run into catastrophic forgetting and was wondering if using EMA would help mitigate this.
I'm not sure whether it makes sense to apply it to the image encoder alone, or to both the text and image encoders.

If it makes sense, I'd love to try implementing this with some guidance.
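Concretely, I was picturing something along the lines of this minimal sketch (the decay value is just a placeholder, and it only tracks parameters, not buffers):

```python
import copy
import torch

class ModelEma:
    """Keep a shadow copy of the model whose weights are an exponential
    moving average (EMA) of the training weights."""

    def __init__(self, model, decay=0.9998):
        self.decay = decay
        self.ema = copy.deepcopy(model).eval()
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # ema_w <- decay * ema_w + (1 - decay) * current_w, called after each optimizer step
        for ema_p, p in zip(self.ema.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
```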

@mitchellnw added the enhancement (New feature or request) label on Dec 11, 2022
@mitchellnw
Contributor

Are you fine-tuning for classification tasks or continuing to train on image-text pairs? In any case, one other thing to try is linearly interpolating the weights from before and after fine-tuning -- you may find this reduces catastrophic forgetting. I.e., if you have state dicts `sd1` and `sd2`, try loading `{k: (1 - alpha) * sd1[k] + alpha * sd2[k] for k in sd1.keys()}`, where `alpha` is some number between 0 and 1.
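As a concrete sketch of that (the checkpoint paths and the alpha value below are placeholders, and it assumes both checkpoints are plain state dicts with matching keys):

```python
import torch

alpha = 0.5  # placeholder; tune between 0 (pre-trained) and 1 (fine-tuned)

# Load the two checkpoints as plain state dicts on CPU.
sd1 = torch.load("pretrained.pt", map_location="cpu")   # weights before fine-tuning
sd2 = torch.load("finetuned.pt", map_location="cpu")    # weights after fine-tuning

# Element-wise linear interpolation of every parameter tensor.
interpolated = {k: (1 - alpha) * sd1[k] + alpha * sd2[k] for k in sd1.keys()}

# Save and load into the model exactly as you would any other checkpoint.
torch.save(interpolated, "interpolated.pt")
```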

@mitchellnw added the new feature label and removed the enhancement (New feature or request) label on Dec 11, 2022
@rsomani95
Contributor Author

@mitchellnw thanks for your response. I'm training on image-text pairs. The interpolation idea is a good one; I'll definitely give it a shot and report back my findings.

@rsomani95
Contributor Author

The weight interpolation suggestion was super helpful.

In the graph below, alpha=0.0 is the pre-trained model and alpha=1.0 is the fully fine-tuned model. Turns out an alpha of 0.4 goes a long way. What's shown here are validation scores across 19 downstream datasets.

[Screenshot: validation scores across 19 downstream datasets for varying alpha]

I've yet to test on ImageNet; I'll be doing that next.

@mitchellnw
Contributor

Great to hear! In case you're interested, there's some more background on that trick here: https://arxiv.org/abs/2109.01903
