Text Tower Refactor #185
Conversation
Would be nice to finish this sometime @iejMac @rwightman so we can have some multilingual CLIP in the future.
Code looks good to me. What do we want to validate here? Maybe something like:
Anything else?
Ideally we could validate that with unit tests, but we could also do one-shot testing.
@rom1504 Yeah, I think those tests are reasonable; I already checked some pretrained zero-shot runs. Maybe an epoch of training on a smaller model we have some recent results for...?
Loading using the factory function, LAION weights still load fine, and inference on text and image is the same as on the main branch. (EDIT) Tested with ViT-L-14 laion2b_s32b_b82k, random noise as image input, and random strings of 1–5 words, each 1–10 symbols.
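For reference, a minimal sketch of the kind of one-shot check described above, assuming the main-branch `open_clip.create_model_and_transforms` / `open_clip.tokenize` API and the `laion2b_s32b_b82k` pretrained tag (names may differ slightly on this branch):

```python
# One-shot sanity check: load LAION weights through the factory and run
# inference on random-noise images plus a few short random-ish strings.
# Sketch only -- 768 is the expected ViT-L-14 joint embedding size.
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-L-14', pretrained='laion2b_s32b_b82k')
model.eval()

images = torch.rand(4, 3, 224, 224)  # random noise instead of real images
texts = open_clip.tokenize(['a b', 'foo bar baz', 'xyz', 'hello world again'])

with torch.no_grad():
    image_features = model.encode_image(images)
    text_features = model.encode_text(texts)

assert image_features.shape == (4, 768)
assert text_features.shape == (4, 768)
assert torch.isfinite(image_features).all()
assert torch.isfinite(text_features).all()
```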
@lopho if you're interested in writing more automated tests, it would definitely be appreciated.
I can write tests, but the soonest I could get to it is the end of the week / weekend.
…tom text model w/ existing models (for testing).
I fixed some issues and pulled in some other changes so that the testing is done once.
Do you think it's ready now, Ross?
Code looks fine to me. Ideally we would have unit tests (for inference and training), or at least check things once. But I also think it's really safe at this point; we could probably decide to just merge to avoid blocking things further.
Yeah, sorry, I haven't found the time yet to write tests.
@rom1504 I've tested quite a number of zero-shot eval models w/ different precision, torchscript, and custom tower enabled/disabled, and did a pure bf16 train locally on 1 GPU for 1 cc12m epoch. Need to test the ResNet and timm models to see if they still work (eval for ResNet, train an epoch for timm). Should also test training on a cluster w/ a proper dataset, maybe a few epochs of a 'B' model; not sure which cluster to use / if we have anything in a good state :/ @JeniaJitsev ?
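As a rough illustration of that eval matrix, a sketch sweeping precision and torchscript (it assumes the factory's `precision`/`jit` arguments and the ViT-B-32 `laion2b_s34b_b79k` tag; the custom-text-tower toggle is left out since its flag name depends on this PR, and this is best run on a GPU):

```python
# Smoke-test sweep over precision x torchscript: load the model, run one
# forward pass on noise, and check the outputs are finite.
import itertools
import torch
import open_clip

device = 'cuda' if torch.cuda.is_available() else 'cpu'
texts = open_clip.tokenize(['a photo', 'a dog']).to(device)

for precision, jit in itertools.product(['fp32', 'fp16', 'bf16'], [False, True]):
    model, _, _ = open_clip.create_model_and_transforms(
        'ViT-B-32', pretrained='laion2b_s34b_b79k',
        precision=precision, jit=jit, device=device)
    model.eval()
    dtype = next(model.parameters()).dtype  # match fp16/bf16 weights
    images = torch.rand(2, 3, 224, 224, device=device, dtype=dtype)
    with torch.no_grad():
        image_features, text_features, _ = model(images, texts)
    assert torch.isfinite(image_features).all(), (precision, jit)
    assert torch.isfinite(text_features).all(), (precision, jit)
```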
@rwightman @rom1504 JUWELS Booster can handle runs with 32 nodes / 128 GPUs very robustly; no problems observed so far at all. So for testing models of 'B' scale, we are surely fine to go.
…ale. No bias on timm model projection by default. Fix null pretrained arg handling.
(Branch updated from 9093d5e to 90a890f.)
I think it's not a good idea to keep pushing more unrelated changes here. Let's not add anything more, and first validate and merge?
Created #198 to track adding automated regression tests.
I understand the sentiment, but with limited manpower and test coverage, and the demands of verifying training at scale, it's easy for a feature PR to end up as a release branch, as I want to ensure everything is tested that's relevant to near-term planned runs. Will keep new commits to bug fixes.
OK, let's try to validate it ASAP and merge it so that things get easier.
…aming/tense to match other args.
…onfigs for updated table.
I think this is ready to merge. @mitchellnw any concern re your JUWELS run? convnext is running but hasn't been stable; seems like it might do better with grad clipping enabled (which is a default for any supervised training of these models).
@mitchellnw without a special exception, 24h is the per-run time limit anyway...
OK, I'm going to merge this. There's definitely a possibility of some regressions, but it's best to get this structure into main and improve/fix from there. I will hold off on pushing a PyPI release until this has been on main for a bit with no showstopping WTFs popping up.
All good on the B/32 run, reached 63.67%.
A refactor of #178 that will keep backwards compat for existing weights/models but provide a different base model for new models using custom text towers...
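To illustrate the intent (a sketch only: the config keys and routing described here, e.g. `hf_model_name` under `text_cfg`, are assumptions about this branch rather than confirmed API), existing model configs keep loading through the original CLIP model class exactly as before, while a config that declares an external text model would be routed to the new custom-text base model:

```python
# Hypothetical model config (Python dict mirroring the JSON model configs).
# The "text_cfg" keys for an external text tower are illustrative assumptions.
multilingual_b32_cfg = {
    "embed_dim": 512,
    "vision_cfg": {
        "image_size": 224,
        "layers": 12,
        "width": 768,
        "patch_size": 32,
    },
    "text_cfg": {
        # an external (e.g. HF) text model here is what would select the
        # custom-text base model instead of the original CLIP class
        "hf_model_name": "xlm-roberta-base",
        "hf_tokenizer_name": "xlm-roberta-base",
        "proj": "mlp",
        "pooler_type": "mean_pooler",
    },
}

# Backwards compat: existing names/weights load exactly as before, e.g.
#   model, _, _ = open_clip.create_model_and_transforms(
#       'ViT-L-14', pretrained='laion2b_s32b_b82k')
```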