(Feature Request) Model EMA #282
Comments
Are you fine-tuning for classification tasks or continuing to train on image-text pairs? In any case, one other thing to try is linearly interpolating the weights before and after fine-tuning -- you may find this reduces catastrophic forgetting. I.e., if you have state dicts
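For readers following along, the interpolation suggested above could be sketched roughly as follows. This is a minimal illustration, not code from this thread or from the repository: the function name `interpolate_state_dicts`, the argument names `theta_0` / `theta_1`, and the mixing coefficient `alpha` are all hypothetical, and the toy dicts of floats stand in for real state dicts. It assumes both dicts have identical keys and that every value supports scalar multiplication and addition (as floating-point PyTorch tensors do).

```python
def interpolate_state_dicts(theta_0, theta_1, alpha):
    """Return (1 - alpha) * theta_0 + alpha * theta_1, key by key.

    theta_0: weights before fine-tuning (hypothetical name)
    theta_1: weights after fine-tuning (hypothetical name)
    alpha:   0.0 keeps the original weights, 1.0 keeps the fine-tuned ones
    """
    if theta_0.keys() != theta_1.keys():
        raise ValueError("state dicts must have identical keys")
    return {
        key: (1 - alpha) * theta_0[key] + alpha * theta_1[key]
        for key in theta_0
    }


# Toy stand-ins for real state dicts (assumption: plain floats for brevity).
pretrained = {"w": 1.0, "b": 0.0}
finetuned = {"w": 3.0, "b": 2.0}

merged = interpolate_state_dicts(pretrained, finetuned, alpha=0.5)
print(merged)
```

With a PyTorch model, the inputs would presumably come from `model.state_dict()` before and after fine-tuning, and the result could be loaded back with `model.load_state_dict(merged)`. Sweeping `alpha` over a few values between 0 and 1 and evaluating each merged model is one way to look for a good trade-off.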
@mitchellnw thanks for your response. I'm training on image-text pairs. Thanks for the idea re. interpolation, I'll definitely give that a shot and report back my findings.
The weight interpolation suggestion was super helpful. In the graph below, I have yet to test on ImageNet; I'll be doing that next.
Great to hear! In case you're interested, there's some more background on that trick here: https://arxiv.org/abs/2109.01903
In my fine-tuning experiments, I've run into catastrophic forgetting and was wondering if using EMA would help mitigate this.
I'm not sure if it makes sense to do this to the image encoder alone, or both the text + image encoder.
If it makes sense, I'd love to try implementing this with some guidance.
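As a starting point for discussion, here is a minimal sketch of what a weight EMA could look like. It is not an existing feature of this repository: the class name `WeightEMA`, its `shadow` attribute, and the `decay` parameter are hypothetical, and the toy dict of floats stands in for a real set of parameters. The sketch is agnostic about the image-encoder-only question, since it simply averages whatever parameters are passed in.

```python
import copy


class WeightEMA:
    """Keeps an exponential moving average of a dict of weights (sketch)."""

    def __init__(self, params, decay=0.999):
        self.decay = decay
        # Deep copy so later in-place updates to the live weights
        # do not silently change the averaged copy.
        self.shadow = copy.deepcopy(dict(params))

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * current
        for key, value in params.items():
            self.shadow[key] = (
                self.decay * self.shadow[key] + (1 - self.decay) * value
            )


# Toy usage (assumption: plain floats instead of tensors, and a small
# decay so the effect is visible after two steps).
ema = WeightEMA({"w": 0.0}, decay=0.5)
ema.update({"w": 1.0})
ema.update({"w": 1.0})
print(ema.shadow)
```

In a training loop, `update` would presumably be called after each optimizer step with the current floating-point weights, and evaluation would use `ema.shadow` instead of the live weights. To average only the image encoder, one could pass just that submodule's parameters.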