Add text-text (audio) CLIP #504

Closed
wants to merge 40 commits into from

Conversation


@marianna13 marianna13 commented Apr 22, 2023

Text-text (audio) CLIP, based on #323, except that one of the two text token streams is actually audio encoded with EnCodec (other encoding methods can be added). One of the models is a plain BERT (for the real text) and the other one is LSG BERT.

@rwightman
Collaborator

Oyo, so, where does this sit relative to #323? Is the intent to have that one cleaned up and merged and then this one will merge those changes before being merged?

@marianna13 marianna13 changed the title Text text audio Add text-text (audio) CLIP Apr 22, 2023
@marianna13
Author

Hi @rwightman! I added a description. Basically, we pretend that the encoded audio is just text tokens and treat it accordingly. I trained it briefly on a small subset of LAION630k (basically VGGSound) and the results were not bad. I'm just wondering whether it's even worth having such a model, and what other people think of it :)
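
To make the "audio as text tokens" idea concrete, here is a minimal sketch of how codec codes could be flattened into a single token id sequence for a BERT-style tower. It is not the code from this PR: the function names, the special-token ids, the frame-interleaved ordering, and the `codebook_size=1024` default are all assumptions for illustration, and the input is assumed to be integer codes already produced by an audio codec such as EnCodec.

```python
from typing import List, Sequence

# Hypothetical special-token ids; a real tokenizer would define its own.
CLS_ID = 0
SEP_ID = 1
NUM_SPECIAL = 2


def codes_to_token_ids(
    codes: Sequence[Sequence[int]],
    codebook_size: int = 1024,  # assumed codebook size
    max_len: int = 512,         # assumed maximum sequence length
) -> List[int]:
    """Flatten codes shaped [n_codebooks][n_frames] into one token sequence.

    Each codebook gets its own id range, so code c from codebook q maps to
    NUM_SPECIAL + q * codebook_size + c. Frames are interleaved in time order.
    """
    n_frames = len(codes[0]) if codes else 0
    if any(len(row) != n_frames for row in codes):
        raise ValueError("all codebooks must have the same number of frames")

    tokens = [CLS_ID]
    for t in range(n_frames):
        for q, row in enumerate(codes):
            code = row[t]
            if not 0 <= code < codebook_size:
                raise ValueError(f"code {code} is outside [0, {codebook_size})")
            tokens.append(NUM_SPECIAL + q * codebook_size + code)

    # Truncate so that the closing SEP token still fits within max_len.
    tokens = tokens[: max_len - 1]
    tokens.append(SEP_ID)
    return tokens


def vocab_size(n_codebooks: int, codebook_size: int = 1024) -> int:
    """Size of the embedding table needed for the flattened vocabulary."""
    return NUM_SPECIAL + n_codebooks * codebook_size


if __name__ == "__main__":
    # Two codebooks, two frames of made-up codes.
    print(codes_to_token_ids([[5, 7], [0, 1023]]))
    print(vocab_size(2))
```

The resulting ids can be fed to the audio-side text tower like any other tokenized input. Interleaving by frame multiplies the sequence length by the number of codebooks, which is presumably one reason a long-sequence model such as LSG BERT is a natural fit on that side.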

@marianna13 marianna13 closed this by deleting the head repository Feb 9, 2024

3 participants