
Contrastive Learning

Pretext Learning to Contrastive Learning

J. B. Grill et al. [1] noted that some self-supervised learning methods, such as solving jigsaw puzzles [2], relative patch prediction [3], denoising autoencoders [4], image colorization [5], context encoders (inpainting) [6], split-brain autoencoders [7], and learning to count [8], are not contrastive but instead rely on auxiliary handcrafted prediction tasks to learn their representations. These models are called pretext models. However, the accuracy of these pretext models is not as good as that of models trained with full supervision. To overcome these weaknesses, researchers introduced a new concept called "Contrastive Learning".

Principle of Contrastive Learning

[Figure: Principle of contrastive learning]

Contrastive learning is an approach that formulates the task of finding similar and dissimilar examples for an ML model. Using this approach, one can train a machine learning model to distinguish between similar and dissimilar images.

The inner workings of contrastive learning can be formulated through a score function, a metric that measures the similarity between two features.

The contrastive learning score function encourages:

$$\text{score}\big(f(x), f(x^{+})\big) \gg \text{score}\big(f(x), f(x^{-})\big)$$

In the equation above, $x^{+}$ is a data point similar to $x$, referred to as a positive sample, and $x^{-}$ is a data point dissimilar to $x$, referred to as a negative sample. On top of these scores, a softmax classifier can be built that classifies positive and negative samples correctly.
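As an illustration, below is a minimal sketch of this softmax formulation (commonly known as the InfoNCE loss). The choice of cosine similarity as the score function, the temperature value, and the tensor shapes are illustrative assumptions, not details given above.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Softmax classifier over one positive and N negative samples.

    anchor:    (d,)   feature f(x)
    positive:  (d,)   feature f(x+)
    negatives: (N, d) features f(x-)
    """
    # Score function: cosine similarity (an assumption here), scaled by a temperature.
    anchor = F.normalize(anchor, dim=0)
    positive = F.normalize(positive, dim=0)
    negatives = F.normalize(negatives, dim=1)

    pos_score = anchor @ positive / temperature       # scalar score for the positive
    neg_scores = negatives @ anchor / temperature     # (N,) scores for the negatives

    # Cross-entropy with the positive sample as the correct "class" (index 0).
    logits = torch.cat([pos_score.unsqueeze(0), neg_scores])   # (N + 1,)
    target = torch.zeros(1, dtype=torch.long)
    return F.cross_entropy(logits.unsqueeze(0), target)

# Example usage with random features.
x, x_pos, x_neg = torch.randn(128), torch.randn(128), torch.randn(16, 128)
loss = info_nce_loss(x, x_pos, x_neg)
```

Minimising this loss pushes the positive score well above the negative scores, which is exactly the inequality expressed by the score function above.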

SimCLR

Google introduced a framework called SimCLR [9] that uses contrastive learning. The framework first learns generic representations of images on an unlabeled dataset and is then fine-tuned with a small dataset of labeled images for a given classification task.

[Figure: The SimCLR framework]

The generic representations are learned by simultaneously maximising agreement between different augmented views of the same image and minimising agreement between views of different images, using a contrastive loss.

Updating the parameters of a neural network with this contrastive objective causes representations of corresponding views to “attract” each other, while representations of non-corresponding views “repel” each other.
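The sketch below illustrates this idea with a SimCLR-style batch loss (NT-Xent): each image contributes two views, the matching view is the positive, and all other views in the batch serve as negatives. The tiny encoder, projection head, and noise-based "augmentations" are stand-ins for illustration only; SimCLR's published setup uses a ResNet backbone, an MLP projection head, and a pipeline of random crops, colour distortion, and blur.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over two batches of projected views.

    z1, z2: (B, d) projections of two augmented views of the same B images.
    For each view, the matching view of the same image is the positive;
    the other 2B - 2 views in the batch act as negatives.
    """
    B = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, d)
    sim = z @ z.t() / temperature                        # (2B, 2B) cosine similarities
    mask = torch.eye(2 * B, dtype=torch.bool)
    sim = sim.masked_fill(mask, float("-inf"))           # exclude self-similarity

    # View i (0..B-1) is paired with view i + B, and vice versa.
    targets = torch.cat([torch.arange(B) + B, torch.arange(B)])
    return F.cross_entropy(sim, targets)

# Hypothetical encoder and projection head, standing in for SimCLR's
# ResNet backbone and MLP projection head.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
projector = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 64))

images = torch.randn(8, 3, 32, 32)                   # a batch of images
view1 = images + 0.1 * torch.randn_like(images)      # stand-in for random augmentation 1
view2 = images + 0.1 * torch.randn_like(images)      # stand-in for random augmentation 2

z1 = projector(encoder(view1))
z2 = projector(encoder(view2))
loss = nt_xent_loss(z1, z2)
loss.backward()   # gradients pull matching views together and push the rest apart
```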

References

[1] Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning

[2] Mehdi Noroozi, Paolo Favaro. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles

[3] Carl Doersch, Abhinav Gupta, Alexei A. Efros. Unsupervised Visual Representation Learning by Context Prediction

[4] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion

[5] Richard Zhang, Phillip Isola, Alexei A. Efros. Colorful Image Colorization

[6] Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, Alexei A. Efros. Context Encoders: Feature Learning by Inpainting

[7] Richard Zhang, Phillip Isola, Alexei A. Efros. Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction

[8] Mehdi Noroozi, Hamed Pirsiavash, Paolo Favaro. Representation Learning by Learning to Count

[9] Ting Chen, Geoffrey Hinton. Advancing Self-Supervised and Semi-Supervised Learning with SimCLR