Add text-text (audio) CLIP #504

marianna13 · 2023-04-22T01:01:09Z

Text-text (audio) CLIP, based on #323, except that one of the two text inputs is actually audio encoded with Encodec (other encoding methods could be added). One of the towers is plain BERT (for real text) and the other is LSG BERT.
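Both towers are trained with a CLIP-style objective, so the loss is the same as for any two-tower model: pair *i* from the BERT tower should match pair *i* from the LSG BERT tower. The sketch below is a stdlib-only illustration of the standard symmetric InfoNCE loss, assuming each tower already yields one pooled embedding per example; `clip_loss` and the fixed `logit_scale` (1/0.07, the usual CLIP initialization, learned in practice) are illustrative choices, not this PR's code.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Unit-normalize so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def log_softmax(row: Vector) -> Vector:
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_loss(text_emb: List[Vector], audio_emb: List[Vector],
              logit_scale: float = 1 / 0.07) -> float:
    """Symmetric InfoNCE (illustrative sketch, not the PR's implementation).

    text_emb[i] and audio_emb[i] are assumed to be a matching pair; every
    other pairing in the batch is treated as a negative.
    """
    t = [l2_normalize(v) for v in text_emb]
    a = [l2_normalize(v) for v in audio_emb]
    n = len(t)
    # logits[i][j] = scaled cosine similarity of text i and audio j
    logits = [[logit_scale * sum(x * y for x, y in zip(t[i], a[j]))
               for j in range(n)] for i in range(n)]
    # Transpose so each row is "audio j against all texts"
    logits_t = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    loss_audio = -sum(log_softmax(logits_t[j])[j] for j in range(n)) / n
    return (loss_text + loss_audio) / 2
```

Averaging the text-to-audio and audio-to-text cross-entropies keeps the objective symmetric, so neither tower is privileged even though one of them is "text" made of audio codes.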

rwightman · 2023-04-22T04:08:43Z

Oyo, so, where does this sit relative to #323? Is the intent to have that one cleaned up and merged first, and then have this one pull in those changes before it is merged?

marianna13 · 2023-04-22T14:12:19Z

Hi @rwightman! I added a description. Basically, we pretend that the encoded audio is just text tokens and treat it accordingly. I trained it briefly on a small subset of LAION630k (basically VGGSound), and the results were not bad. I'm just wondering whether such a model is even worth having, and what other people think of it :)
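"Pretending encoded audio is text" requires turning Encodec's output, one index per codebook per frame (shape `[n_q][T]`), into a single flat token sequence with its own vocabulary. The PR description does not say how this is done, so the following is a hypothetical scheme: frames are interleaved frame-major, codebook `q`'s indices are offset by `q * codebook_size` so every (codebook, index) pair gets a unique id, and the first `num_special` ids are reserved for special tokens. `codes_to_tokens`, `audio_vocab_size`, and `num_special` are illustrative names. For scale: Encodec's 24 kHz model at 6 kbps emits 8 codebooks at 75 frames per second, i.e. 600 tokens per second of audio, which is well beyond BERT's usual 512-token limit and may be why a long-sequence encoder such as LSG BERT is used on the audio side.

```python
from typing import List


def codes_to_tokens(codes: List[List[int]], codebook_size: int = 1024,
                    num_special: int = 2) -> List[int]:
    """Flatten Encodec-style codes [n_q][T] into one token-id sequence.

    Hypothetical layout: for each frame t, emit codebook 0..n_q-1 in order.
    Codebook q's index i maps to num_special + q * codebook_size + i, so ids
    below num_special stay free for special tokens (e.g. CLS/SEP).
    """
    n_q = len(codes)
    n_frames = len(codes[0]) if n_q else 0
    if any(len(row) != n_frames for row in codes):
        raise ValueError("all codebooks must have the same number of frames")
    tokens = []
    for t in range(n_frames):
        for q in range(n_q):
            idx = codes[q][t]
            if not 0 <= idx < codebook_size:
                raise ValueError(f"code {idx} out of range for codebook {q}")
            tokens.append(num_special + q * codebook_size + idx)
    return tokens


def audio_vocab_size(n_q: int, codebook_size: int = 1024,
                     num_special: int = 2) -> int:
    # Embedding-table size the audio tower would need under this scheme.
    return num_special + n_q * codebook_size


# Two codebooks, two frames: frame 0 -> (0, 1), frame 1 -> (5, 7)
print(codes_to_tokens([[0, 5], [1, 7]]))  # [2, 1027, 7, 1033]
```

Offsetting per codebook keeps the coarse first codebook and the finer residual codebooks from sharing embeddings; a simpler alternative would be a single shared 1024-entry vocabulary, at the cost of that distinction.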

lukeum and others added 30 commits December 20, 2022 21:59

add texttext-clip

3367f27

fix loss

5089d57

change main.py

d414da5

add arguments

949834e

add factory.py test

479aa07

test main.py

8fcc3aa

Merge branch 'main' into main

db81bbb

fix main.py

e68f4f8

Merge branch 'main' of https://github.com/lingjzhu/open_clip

fef6721

rename variables

f6eba4f

rename variables

5d331d9

Merge branch 'mlfoundations:main' into main

9d620f5

add hf datasets

4a00ea0

fix Siamese network

588e8ba

fix some typos

1e0d6aa

fix typos

516176c

Merge branch 'main' into main

2241ce9

resolve conflicts

71d46ea

resolve conflicts

73ab4d7

resolve conflicts

d210534

resolve conflicts

26d677b

resolve conflicts

a3029f4

resolve conflicts in loss.py

633f53f

resolve conflicts in loss.py

634709e

add output_dict

a9710b1

add webdataset loader

4084147

Merge branch 'main' into main

d430974

Update loss.py

9167976

add sts evaluation code

579a591

fix dependencies

646d517

lingjzhu and others added 9 commits April 2, 2023 14:46

add weighted mean pooling for decoder models

9e5ed73

fix tokenizers

2ede8fc

enable freezing all weights but biases

5415590

fixed a typo

4f89a44

add tta config

4c3b5ea

add tta model

e344d7e

add tta training

d045f71

add tta example

adeb38b

fix hf_config

5ed0ad1

marianna13 changed the title ~~Text text audio~~ Add text-text (audio) CLIP Apr 22, 2023

data loaders can load tensors directly

082844c

marianna13 closed this by deleting the head repository Feb 9, 2024
