Add TextTextCLIP #323
base: main
Conversation
src/training/params.py
Outdated
@@ -326,13 +326,54 @@ def parse_args(args):
    action='store_true',
    help="Freeze BatchNorm running stats in image tower for any locked layers.",
)
parser.add_argument(
    "--lock-doc",
what about a "lock-tower-1" and "lock-tower-2" param instead? query/doc is not the only use case
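A minimal sketch of what the suggested generic flags could look like, assuming an argparse setup similar to `params.py`; the flag names, defaults, and help strings here are illustrative, not the PR's actual interface:

```python
# Hypothetical generic tower-locking flags, modality-agnostic: "tower 1"
# and "tower 2" instead of image/text or query/doc specific names.
import argparse

parser = argparse.ArgumentParser()
for tower in ("1", "2"):
    parser.add_argument(
        f"--lock-tower-{tower}",
        action="store_true",
        help=f"Freeze tower {tower} (e.g. the query or doc encoder) during training.",
    )
    parser.add_argument(
        f"--lock-tower-{tower}-unlocked-layers",
        type=int,
        default=0,
        help=f"Leave the last n layers of tower {tower} unfrozen.",
    )

args = parser.parse_args(["--lock-tower-1"])
```

Because the flags are numbered rather than named after a modality, the same pair works for image/text, text/text, or any other two-tower setup.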
Sure, I will rename these variables immediately. Do you think we should also rename variables like `image_features` in `train.py` and `main.py`? Will renaming variables cause confusion for the original text-image model? Currently `train.py` is fully compatible with TextTextCLIP; no modification other than renaming variables is needed.
I have tested TextTextCLIP locally and it seems to work smoothly. But the script fails some tests here. I will make adjustments.
@@ -326,13 +338,54 @@ def parse_args(args):
    action='store_true',
    help="Freeze BatchNorm running stats in image tower for any locked layers.",
)
parser.add_argument(
wonder if this can be reconciled with the normal locking params
src/training/data.py
Outdated
class TextPairDataset(Dataset):
    def __init__(self, input_filename, text_a_key, text_b_key, tokenizer=None):
        logging.debug(f'Loading parquet data from {input_filename}.')
        df = pd.read_parquet(input_filename)
this is unlikely to scale
Right, I will update this. Do you have any suggestions? Do you think we should use the `pyarrow` package to load `.parquet` files?
src/open_clip/model.py
Outdated
@@ -248,6 +248,47 @@ def forward(self, image, text):
        return image_features, text_features, self.logit_scale.exp()


class TextTextCLIP(nn.Module):
this is completely duplicated from above class, I wonder if we could reconcile it
Maybe we could use this one as the general model? It's not specific about modality. We could refer to `image_features` as `features_a` and `text_features` as `features_b`.
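To make the "general model" idea concrete, a modality-agnostic dual-tower module might look like the following. This is a minimal sketch under the naming proposed above (`features_a`/`features_b`); the class name, towers, and logit-scale init are hypothetical and not open_clip's actual implementation:

```python
# A dual-tower contrastive model whose towers can be any encoders
# (image/text, text/text, ...), avoiding duplicated CLIP classes.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualTowerCLIP(nn.Module):
    def __init__(self, tower_a: nn.Module, tower_b: nn.Module,
                 init_logit_scale: float = math.log(1 / 0.07)):
        super().__init__()
        self.tower_a = tower_a  # e.g. a visual transformer or a text encoder
        self.tower_b = tower_b  # e.g. a text encoder
        self.logit_scale = nn.Parameter(torch.tensor(init_logit_scale))

    def forward(self, input_a, input_b):
        # L2-normalize so the downstream loss works with cosine similarities.
        features_a = F.normalize(self.tower_a(input_a), dim=-1)
        features_b = F.normalize(self.tower_b(input_b), dim=-1)
        return features_a, features_b, self.logit_scale.exp()
```

With this shape, `TextTextCLIP` and the image-text model differ only in which towers are passed to the constructor.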
src/open_clip/factory.py
Outdated
    mean=image_mean,
    std=image_std
)
if not text_to_text:
what about checking if model.visual exists instead ?
If we merge `TextTextCLIP` into `CustomCLIP` to get a more general model, then `model.visual` might not exist at all?
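Both concerns can be covered by duck typing: rather than branching on a `text_to_text` flag, check for the attribute defensively, which works whether or not the merged model defines `visual`. A small sketch (the helper name is hypothetical):

```python
# Branch on the model's actual shape instead of a task flag: if the model
# exposes a visual tower, do image-specific setup; otherwise skip it.
def has_visual_tower(model) -> bool:
    """True if the model exposes a non-None `visual` attribute."""
    return getattr(model, "visual", None) is not None
```

In `factory.py` the `if not text_to_text:` branches could then become `if has_visual_tower(model):`, and the check degrades gracefully when `visual` is absent.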
src/open_clip/factory.py
Outdated
@@ -179,9 +193,10 @@ def create_model(
if precision in ("fp16", "bf16"):
    convert_weights_to_lp(model, dtype=torch.bfloat16 if precision == 'bf16' else torch.float16)

if not text_to_text:
what about checking if model.visual exists instead ?
Can you rebase on master?
Done merging, but still working on some minor inconsistencies that need to be fixed.
@rom1504 Here is the latest code for TextTextCLIP. It is tested on the stability cluster, and the evaluation code is also included. I have added an example script at
This would really benefit from reviews @rwightman @mitchellnw @iejMac
@rom1504 k, will try and look at it soon
        return logits_per_feature_a, logits_per_feature_b

    def forward(self, image_features=None, text_features=None, logit_scale=None, text_a_features=None, text_b_features=None, output_dict=False):
This section feels a little awkward, maybe it should just always take features_a and features_b, not optional, like the prior code?
Same with logit_scale, why is that optional now?
hmm, yeah, wondering for clarity if it'd make more sense to have a specific text-text (CLLP?) loss with appropriate naming, possibly sharing the gather. Multiple sets of args with different naming schemes, erasing the task-specific names (which aid comprehension a bit), seems less desirable than a bit of duplication...
Thank you for your comments! This is an artifact of trying to reconcile the original CLIP and the text model, but I agree with you that this is a bit awkward. I could add a loss wrapper for the text model. What do you think?
By the way, do you have any ideas about the naming of the text model? TextTextCLIP sounds a bit redundant to me. How about CTTP? CLLP? LATTE (contrastive LAnguage-To-TExt pretraining)? CTP (contrastive text pretraining)? Do you have a better suggestion?
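One possible shape for the loss-wrapper idea: keep a generic contrastive loss with neutral names and add a thin task-specific subclass that carries the text-text naming. A hedged single-GPU sketch; the real open_clip `ClipLoss` (with its distributed gather) has a different signature:

```python
# Generic symmetric InfoNCE loss plus a thin text-text wrapper, so the core
# implementation is shared but each task keeps readable argument names.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveLoss(nn.Module):
    """Symmetric InfoNCE over two normalized feature sets (single GPU)."""
    def forward(self, features_a, features_b, logit_scale):
        logits = logit_scale * features_a @ features_b.T
        labels = torch.arange(logits.shape[0], device=logits.device)
        # Average the a->b and b->a cross-entropy terms.
        return (F.cross_entropy(logits, labels) +
                F.cross_entropy(logits.T, labels)) / 2

class TextTextLoss(ContrastiveLoss):
    """Task-specific naming, shared implementation."""
    def forward(self, text_a_features, text_b_features, logit_scale):
        return super().forward(text_a_features, text_b_features, logit_scale)
```

This keeps `image_features`/`text_features` in the image-text path untouched (the comprehension concern above) while avoiding a second copy of the loss math.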
Overall things look pretty good. I'm trying to get over a mental block re the loss naming; I realize why the feature_a/b changes were made to the loss, but I feel it harms ease of understanding re the most common use case, especially for newcomers to the core loss fn... hmm
This pull request adds TextTextCLIP (CLIP-like text-to-text contrastive retrieval model) to the main branch.
It is still a work in progress.
Tasks
- Add TextTextCLIP in `model.py`
- Modify `factory.py` to load model
- Modify `data.py` to load text data