CLIP

CLIP (Contrastive Language-Image Pre-Training) [1] is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3. We found CLIP matches the performance of the original ResNet50 on ImageNet “zero-shot” without using any of the original 1.28M labeled examples, overcoming several major challenges in computer vision.

Use OpenAI CLIP model

Interact with CLIP

References

[1] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. Learning Transferable Visual Models From Natural Language Supervision

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

CLIP

Use OpenAI CLIP model

References

Files

README.md

Latest commit

History

README.md

File metadata and controls

CLIP

Use OpenAI CLIP model

References