Add text-text (audio) CLIP #504
Oyo, so where does this sit relative to #323? Is the intent to have that one cleaned up and merged first, with this PR then picking up those changes before it is merged?
Hi @rwightman! I added a description. Basically, we pretend that the encoded audio is just text tokens and treat it accordingly. I trained it briefly on a small subset of LAION630k (basically VGGSound), and the results were not bad. I'm just wondering whether it is even worth having such a model, and what other people think of it :)
Text-Text (audio) CLIP, based on #323, except that one of the two text token streams is actually audio encoded with Encodec (other encoding methods can be added). One of the models is plain BERT (for the real text) and the other is LSG BERT (for the audio tokens).
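To make the "audio as text tokens" idea concrete, here is a minimal sketch of one way codec indices could be turned into a single token sequence for a text transformer. It is not taken from this PR: the function names, the special-token ids, the frame-major interleaving, and the default codebook size of 1024 are all assumptions made for illustration, and the codec itself is not called (the input stands in for the per-codebook index grid that a neural codec such as Encodec produces).

```python
from typing import List, Sequence

# Hypothetical special-token ids, chosen for this sketch only (not from the PR).
PAD_ID = 0
CLS_ID = 1
NUM_SPECIAL = 2


def flatten_codes(codes: Sequence[Sequence[int]], codebook_size: int = 1024) -> List[int]:
    """Interleave a [n_codebooks][n_frames] grid of codec indices into one sequence.

    Each codebook gets its own id range so that index 7 in codebook 0 and
    index 7 in codebook 1 map to different "words" in the shared vocabulary.
    """
    if not codes:
        return [CLS_ID]
    n_frames = len(codes[0])
    if any(len(row) != n_frames for row in codes):
        raise ValueError("every codebook must have the same number of frames")

    tokens = [CLS_ID]
    for t in range(n_frames):              # frame-major order: all codebooks of frame t first
        for q, row in enumerate(codes):
            idx = row[t]
            if not 0 <= idx < codebook_size:
                raise ValueError(f"index {idx} is outside codebook of size {codebook_size}")
            tokens.append(NUM_SPECIAL + q * codebook_size + idx)
    return tokens


def vocab_size(n_codebooks: int, codebook_size: int = 1024) -> int:
    """Vocabulary size the audio-side model needs under this flattening scheme."""
    return NUM_SPECIAL + n_codebooks * codebook_size


if __name__ == "__main__":
    # Two codebooks, two frames of made-up indices.
    print(flatten_codes([[0, 5], [3, 1023]]))
    print(vocab_size(2))
```

Under this scheme the sequence length grows as frames times codebooks, so even short clips produce long inputs, which would be one reason to pair the audio side with a long-sequence model rather than a standard BERT.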