Commit
* Add coca trained (#307)
  * initial setup
  * add coca loss (see the loss sketch after this list)
  * remove loss from the model
  * fix loss
  * add underscores
  * name changes
  * add cross attention to Residual and CustomResidual
  * fix if
  * add transformer 'decoder'
  * minor fix
  * looks better
  * initialize coca model structure
  * clean
  * typo and format
  * checkpoint signature
  * adjust multimodal decoder and add CoCaTransformer
  * keep older logic
  * remove chunk
  * typo
  * fix
  * make chunk dim explicit
  * adjust cfg names
  * add attentional pooling
  * add attentional pooling to coca (see the pooler sketch after this list)
  * small change
  * add cocatransformer variants and AttentionPooling
  * remove older attention pooler
  * adapt embed text to coca text transformer
  * rm coca layers
  * rename and remove useless CoCa models
  * make attentionpooler pooler only
  * refactor for one transformer only
  * coca forward works
  * separate context and n_queries
  * add initial coca_base config
  * remove config
  * small loss change
  * init training file
  * make variable order right
  * remove print
  * uniform names
  * renaming
  * add coca funcs to init
  * add coca config and exclude from testing
  * add and comment simple test (no trained model)
  * add L2 norm
  * make L2 same as in clip
  * remove unused temperature
  * typo
  * clean
  * fix config
  * make rename and move cfg
  * rename
  * tentatively add coca to factory
  * fix config
  * update config
  * embed contrastive cls token in model
  * remove unused arg
  * import create_loss
  * make factory accept coca
  * make caption loss distributed
  * make loss customizable
  * pass loss through training_epoch
  * add coca specific params to params
  * remove unused decoder parameters
  * remove unused attributes
  * adjust coca_config
  * fix config and remove unused parameters
  * remove comment
  * remove more comments
  * rename attention pooler
  * rename TransformerDecoder
  * make AttentionalPooler clearer
  * add local loss logic to cocaloss
  * only create loss if train in data
  * remove wrong file
  * fix attentional pooler call
  * not ready for testing
  * really not ready for testing
  * eof line
  * uniform names
  * add possible generative loss to evaluate
  * change _build function names
  * remove wrong import
  * remove local_loss from captioning loss
  * fix indexing error
  * finish renaming
  * adjust configs
  * add training test for coca
  * simplify captioning loss
  * remove hf
  * fix evaluate and loss
  * remove print
  * move projection
  * add coca vit 32 config
  * test on new config
  * adjust coca_base config
  * remove coca from test_inference
  * maybe fix regression test
  * make logits and labels contiguous
  * simpler logic
  * make contiguous after transpose
  * last test
  * try fix loss
  * CoCa PR: loss fix + rename file
  * wait for feedback on this
  * cleanup
  * CoCa PR: add set_grad_checkpointing + fix checkpoint API (see the checkpointing sketch after this list)
  * CoCa PR: fix eval (which uses encode_x instead of forward)
  * move making space for CLS token into encode_text
  * revert zs changes + fix

  Co-authored-by: gpucce <[email protected]>
  Co-authored-by: gpucce <[email protected]>
  Co-authored-by: iejmac <[email protected]>
* Add coca to CI
* Add coca to CI pr
* simplify encode_image (#313)

  Co-authored-by: Romain Beaumont <[email protected]>
* Add cls mask (#312) (see the mask sketch after this list)
  * build_cls_mask
  * add cls_mask to encode_text
  * add model properties

  Co-authored-by: Romain Beaumont <[email protected]>
  Co-authored-by: gpucce <[email protected]>
* Ignore pad tokens in captioning loss (#316)
  * add ignore_index
  * just need to pick right index

  Co-authored-by: gpucce <[email protected]>
* add `generate` to coca model (#314) (see the usage example after this list)
  * add initial generative support
  * make generation context_length independent
  * remove kwargs
  * last positional embeddings for CLS
  * typo
  * fix mask len
  * add comment
  * remove unused args
  * simpler logic for input shorter than context length

  Co-authored-by: gpucce <[email protected]>
* use `TextEncoder` in coca `encode_image` (#321)
  * use self.text in encode image
  * unused var
  * revert Attention and CustomResidualAttentionBlock
  * remove whiteline
  * add dict output
  * integrate self.text attributes
  * HF compatibility
  * better config and minor fixes
  * clean
  * remove embed_cls option from HF
  * use cls_token_position
  * fix cls masking
  * resize labels
  * text -> self.text
  * split loss logging
  * add total loss
  * minor logs formatting
  * fix generate
  * simpler logic
  * disentangle proj for HF too
  * adjust config
  * only norm cls
  * move attn_pool to VisionTransformer
  * adjust coca_base config
  * fix grad checkpointing in MultimodalTransformer

  Co-authored-by: gpucce <[email protected]>
  Co-authored-by: iejMac <[email protected]>
* Get some basic PEP changes out of the way
* Add tests bis (#355)
  * make jit compilable
  * redundant annotation
  * fewer tests
  * fewer annotations
  * even fewer annotations
  * fix name check in ci
  * some annotations back
  * make it simpler
  * make hf simpler too
  * better jit support with tests
  * remove extra line
  * add customtextclip
  * more jit tests
  * missing assert
  * add eval
  * typo
  * revert forward changes
  * clean coca model
  * more cleaning
  * last cleaning
* train.py: fix is_clip when doing distributed (#364)
* add README (#365)
  * add README
  * multimodal_cfg info
  * multimodal
* remove output_dict argument (#368)
  * remove output_dict argument
  * cleaner
* do same thing for _encode_image (#366)
  * do same thing for _encode_image
  * encoder
  * try this
  * adjust inference tests
  * fix syntax
  * True not None
  * dumb
* CoCa/forward: remove unused output_dict param
* Revert "do same thing for _encode_image (#366)"

  This reverts commit de343fb.
* refactor
* white space
* remove extra layer norm
* move to_logits into decoder
* leave for later
* better torchscript
* annotate hf too
* Add CoCa-ViT-L/14 config (#379)
* Remove dead LN code, refactor attn_pool conditional for more clarity, minor formatting tweaks
* latent_dim to embed_dim
* remove extra cfg
* A bit more cleanup: keep context_length as context len, 'num_pos' to incl extra tokens; None type check for embed_cls instead of getattr
* CoCa: add B/32 pretrained (#389)
  * add B/32 pretrained
  * fix
  * no capital
  * slash
* remove coca from ci.yml

---------

Co-authored-by: gpucce <[email protected]>
Co-authored-by: gpucce <[email protected]>
Co-authored-by: iejmac <[email protected]>
Co-authored-by: iejMac <[email protected]>
Co-authored-by: Ross Wightman <[email protected]>
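Several items above ("add coca loss", "make caption loss distributed", "Ignore pad tokens in captioning loss (#316)") revolve around CoCa's two-part objective: a CLIP-style contrastive loss on the pooled image/text embeddings plus a captioning cross-entropy on the decoder logits, where `ignore_index` skips pad positions. A minimal single-GPU sketch of that idea — the function name, argument layout, and default weights here are illustrative assumptions, not the exact `CoCaLoss` API:

```python
import torch
import torch.nn.functional as F

def coca_loss(image_feats, text_feats, logit_scale, caption_logits, labels,
              clip_loss_weight=1.0, caption_loss_weight=2.0, pad_id=0):
    """image_feats/text_feats: (batch, dim), L2-normalized.
    caption_logits: (batch, seq, vocab); labels: (batch, seq) token ids."""
    # CLIP-style contrastive loss: the matching pair is the target class
    logits_per_image = logit_scale * image_feats @ text_feats.T
    targets = torch.arange(image_feats.shape[0], device=image_feats.device)
    contrastive = (F.cross_entropy(logits_per_image, targets) +
                   F.cross_entropy(logits_per_image.T, targets)) / 2
    # captioning loss; ignore_index drops pad tokens (the point of #316)
    caption = F.cross_entropy(caption_logits.permute(0, 2, 1), labels,
                              ignore_index=pad_id)
    return clip_loss_weight * contrastive + caption_loss_weight * caption
```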
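The pooler items ("add attentional pooling", "make AttentionalPooler clearer", "move attn_pool to VisionTransformer") refer to the attentional pooler from the CoCa paper: a small set of learned query tokens cross-attends to the vision tokens, yielding a fixed number of outputs regardless of input length. A sketch of the mechanism — head count, query count, and layer-norm placement are assumptions; see `transformer.py` in the repo for the real module:

```python
import torch
from torch import nn

class AttentionalPooler(nn.Module):
    """Learned queries cross-attend to a variable-length token sequence."""
    def __init__(self, d_model: int, n_head: int = 8, n_queries: int = 256):
        super().__init__()
        self.query = nn.Parameter(torch.randn(n_queries, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln_q = nn.LayerNorm(d_model)
        self.ln_k = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) image tokens
        k = self.ln_k(x)
        q = self.ln_q(self.query).unsqueeze(0).expand(x.shape[0], -1, -1)
        out, _ = self.attn(q, k, k, need_weights=False)
        return out  # (batch, n_queries, d_model), independent of seq
```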
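"Add cls mask (#312)" concerns the CLS token that CoCa's text tower appends to the token sequence: attention has to be masked so that no position, CLS included, attends to padding. A sketch of building such an additive mask, mirroring the idea behind `build_cls_mask` rather than its exact code:

```python
import torch
import torch.nn.functional as F

def build_cls_mask(text: torch.Tensor, pad_id: int = 0) -> torch.Tensor:
    # text: (batch, seq) token ids. Returns an additive attention mask of
    # shape (batch, seq + 1, seq + 1): 0 where attention is allowed, -inf
    # on pad-token keys, sized for the sequence with CLS appended.
    cls_mask = (text != pad_id).unsqueeze(1)  # (batch, 1, seq)
    # grow to (batch, seq + 1, seq + 1), treating the new CLS slot as valid
    cls_mask = F.pad(cls_mask, (1, 0, cls_mask.shape[2], 0), value=True)
    additive = torch.zeros(cls_mask.shape, dtype=torch.float32,
                           device=text.device)
    additive.masked_fill_(~cls_mask, float("-inf"))
    return additive
```

In practice this mask is added to the causal mask and, for `nn.MultiheadAttention`, repeated across heads to the expected `(batch * heads, L, L)` shape.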
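The end result of "add `generate` to coca model (#314)", the README from #365, and the pretrained weights from #379/#389 is a captioning API on the model object. Usage, roughly as documented in the README added in #365 — the pretrained tag below is one published for the L/14 model at the time; check `open_clip.list_pretrained()` for current options:

```python
import torch
from PIL import Image
import open_clip

model, _, transform = open_clip.create_model_and_transforms(
    model_name="coca_ViT-L-14",
    pretrained="mscoco_finetuned_laion2B-s13B-b90k",
)
model.eval()

im = transform(Image.open("cat.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    generated = model.generate(im)  # autoregressive caption token ids

# decode and strip the special tokens around the caption
caption = open_clip.decode(generated[0])
print(caption.split("<end_of_text>")[0].replace("<start_of_text>", ""))
```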
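Finally, "CoCa PR: add set_grad_checkpointing + fix checkpoint API" and "fix grad checkpointing in MultimodalTransformer" follow the pattern used elsewhere in open_clip: wrap each residual block in `torch.utils.checkpoint` so activations are recomputed during backward instead of stored. A sketch of that forward loop — the module layout and block type here are assumptions for illustration:

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class CheckpointedTower(nn.Module):
    def __init__(self, width: int, layers: int):
        super().__init__()
        self.resblocks = nn.ModuleList(
            nn.TransformerEncoderLayer(width, nhead=8, batch_first=True)
            for _ in range(layers)
        )
        self.grad_checkpointing = False

    @torch.jit.ignore
    def set_grad_checkpointing(self, enable: bool = True):
        self.grad_checkpointing = enable

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.resblocks:
            if self.grad_checkpointing and not torch.jit.is_scripting():
                # trade compute for memory: recompute this block on backward
                x = checkpoint(block, x)
            else:
                x = block(x)
        return x
```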