# Supervised Contrastive Learning

@article{Khosla2020SupervisedCL,
  title   = {Supervised Contrastive Learning},
  author  = {Prannay Khosla and Piotr Teterwak and Chen Wang and Aaron Sarna and Yonglong Tian and Phillip Isola and Aaron Maschinot and Ce Liu and Dilip Krishnan},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2004.11362}
}

Cross entropy is the most widely used loss function for supervised training of image classification models. In this paper, we propose a novel training methodology that consistently outperforms cross entropy on supervised learning tasks across different architectures and data augmentations. We modify the batch contrastive loss, which has recently been shown to be very effective at learning powerful representations in the self-supervised setting. We are thus able to leverage label information…
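The modified batch contrastive loss the abstract refers to can be sketched as follows. This is a minimal pure-Python illustration of a supervised contrastive (SupCon) objective; the function name, default temperature, and the choice to average over anchors that have positives are assumptions made here, not details taken from the paper's reference implementation.

```python
import math

def supcon_loss(features, labels, temperature=1.0):
    """Supervised contrastive loss sketch.

    features: list of L2-normalized embedding vectors (one per sample).
    labels:   list of integer class labels.
    For each anchor, every *other* sample with the same label is a positive;
    all non-anchor samples appear in the denominator.
    """
    n = len(features)

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Temperature-scaled pairwise similarities.
    logits = [[dot(features[i], features[j]) / temperature for j in range(n)]
              for i in range(n)]
    total, anchors = 0.0, 0
    for i in range(n):
        # Denominator excludes the anchor's similarity to itself.
        denom = sum(math.exp(logits[i][j]) for j in range(n) if j != i)
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positives contributes nothing
        total += -sum(logits[i][j] - math.log(denom)
                      for j in positives) / len(positives)
        anchors += 1
    return total / anchors
```

With two well-separated classes the loss is small and symmetric across anchors; labels are what turn the self-supervised batch contrastive loss into this supervised variant.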

#### 410 Citations

Class Interference Regularization

- Computer Science
- BMVC
- 2020

Class Interference Regularization (CIR) is the first regularization technique to act on the output features of a contrastive loss, and performs on par with the popular label smoothing, as demonstrated for CIFAR-10 and -100.

Does Data Augmentation Benefit from Split BatchNorms

- Computer Science
- ArXiv
- 2020

A recently proposed training paradigm is explored using an auxiliary BatchNorm for the potentially out-of-distribution, strongly augmented images, and this method significantly improves performance on common image classification benchmarks such as CIFAR-10, CIFAR-100, and ImageNet.

Contrastive Learning with Adversarial Examples

- Computer Science
- NeurIPS
- 2020

A new family of adversarial examples for contrastive learning is introduced and used to define a new adversarial training algorithm for SSL, denoted as CLAE, which improves the performance of several existing CL baselines on multiple datasets.

G-SimCLR: Self-Supervised Contrastive Learning with Guided Projection via Pseudo Labelling

- Computer Science, Mathematics
- 2020 International Conference on Data Mining Workshops (ICDMW)
- 2020

This work proposes that, with the normalized temperature-scaled cross-entropy loss function (as used in SimCLR), it is beneficial to not have images of the same category in the same batch, and uses the latent space representation of a denoising autoencoder trained on the unlabeled dataset to obtain pseudo labels.

i-Mix: A Strategy for Regularizing Contrastive Representation Learning

- Computer Science, Mathematics
- ArXiv
- 2020

It is demonstrated that i-Mix consistently improves the quality of self-supervised representations across domains, resulting in significant performance gains on downstream tasks, and its regularization effect is confirmed via extensive ablation studies across model and dataset sizes.

Contrastive Generative Adversarial Networks

- Computer Science
- ArXiv
- 2020

A novel conditional contrastive loss that maximizes a lower bound on the mutual information between samples from the same class; it improves conditional image synthesis and is robust to the choice of network architecture.

Adversarial Self-Supervised Contrastive Learning

- Computer Science, Mathematics
- NeurIPS
- 2020

This paper proposes a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples, and presents a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.

Self-supervised Co-training for Video Representation Learning

- Computer Science
- NeurIPS
- 2020

This paper investigates the benefit of adding semantic-class positives to instance-based Info Noise Contrastive Estimation (InfoNCE) training, and proposes a novel self-supervised co-training scheme to improve the popular InfoNCE loss.

Hybrid Discriminative-Generative Training via Contrastive Learning

- Computer Science, Mathematics
- ArXiv
- 2020

This paper shows that, through the perspective of hybrid discriminative-generative training of energy-based models, a direct connection can be made between contrastive learning and supervised learning, and that a specific choice of approximation of the energy-based loss outperforms existing practice in terms of classification accuracy.

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

- Computer Science
- ICLR
- 2021

A novel data augmentation framework dubbed CoDA is proposed, which synthesizes diverse and informative augmented examples by organically integrating multiple transformations, and introduces a contrastive regularization objective to capture the global relationship among all the data samples.

#### References

Showing 1-10 of 76 references

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss

- Computer Science, Mathematics
- NeurIPS
- 2019

A theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound is proposed that replaces the standard cross-entropy objective during training and can be applied with prior strategies for training with class imbalance such as re-weighting or re-sampling.
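The LDAM idea can be sketched in a few lines: each class receives a margin inversely proportional to the fourth root of its training frequency, so rarer classes are pushed further from the decision boundary. This is a minimal illustration assuming a single example and logit list; the function name and the scaling constant `C` are hypothetical choices for this sketch, not values from the paper.

```python
import math

def ldam_loss(logits, label, class_counts, C=0.5):
    """LDAM-style loss sketch for one example.

    logits:       raw class scores for one sample.
    label:        index of the true class.
    class_counts: number of training examples per class.
    The margin for class j is C / n_j**0.25, so rare classes get a
    larger margin subtracted from their logit before cross-entropy.
    """
    margins = [C / n ** 0.25 for n in class_counts]
    z = list(logits)
    z[label] -= margins[label]          # enforce the class-dependent margin
    m = max(z)                          # numerically stable log-sum-exp
    return -(z[label] - m - math.log(sum(math.exp(v - m) for v in z)))
```

With equal logits, the loss is larger when the true class is the rare one, which is exactly the extra margin the rare class must overcome during training.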

Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels

- Computer Science, Mathematics
- NeurIPS
- 2018

A theoretically grounded set of noise-robust loss functions that can be seen as a generalization of MAE and CCE are presented and can be readily applied with any existing DNN architecture and algorithm, while yielding good performance in a wide range of noisy label scenarios.

Large-Margin Softmax Loss for Convolutional Neural Networks

- Computer Science, Mathematics
- ICML
- 2016

A generalized large-margin softmax (L-Softmax) loss which explicitly encourages intra-class compactness and inter-class separability between learned features and which not only can adjust the desired margin but also can avoid overfitting is proposed.

RandAugment: Practical data augmentation with no separate search

- Computer Science
- ArXiv
- 2019

RandAugment can be used uniformly across different tasks and datasets and works out of the box, matching or surpassing all previous learned augmentation approaches on CIFAR-10, CIFAR-100, SVHN, and ImageNet.

Large Margin Deep Networks for Classification

- Computer Science, Mathematics
- NeurIPS
- 2018

This work proposes a novel loss function to impose a margin on any chosen set of layers of a deep network (including input and hidden layers), and demonstrates that the decision boundary obtained by the loss has nice properties compared to standard classification loss functions.

Cross-Entropy Loss and Low-Rank Features Have Responsibility for Adversarial Examples

- Computer Science, Mathematics
- ArXiv
- 2019

State-of-the-art neural networks are vulnerable to adversarial examples; they can easily misclassify inputs that are imperceptibly different from their training and test data. In this work, we…

CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features

- Computer Science
- 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019

Patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches, and CutMix consistently outperforms state-of-the-art augmentation strategies on CIFAR and ImageNet classification tasks, as well as on the ImageNet weakly-supervised localization task.
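The cut-and-paste-with-proportional-labels mechanic can be sketched as follows. This is a minimal pure-Python illustration on 2-D "images" (lists of rows); the function name, the uniform sampling of the mixing ratio, and the seeded RNG are assumptions for this sketch rather than details of the paper's implementation.

```python
import random

def cutmix(img_a, img_b, rng=None):
    """CutMix sketch: paste a random rectangle from img_b into img_a.

    Returns (mixed image, lam), where lam is the exact fraction of img_a
    remaining. The labels are then mixed as y = lam*y_a + (1-lam)*y_b.
    """
    rng = rng or random.Random(0)
    h, w = len(img_a), len(img_a[0])
    lam = rng.random()                         # mixing ratio, Beta(1,1) == uniform
    cut_h = int(h * (1 - lam) ** 0.5)          # box area targets (1-lam)*h*w
    cut_w = int(w * (1 - lam) ** 0.5)
    cy, cx = rng.randrange(h), rng.randrange(w)
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = [row[:] for row in img_a]
    for y in range(y1, y2):
        mixed[y][x1:x2] = img_b[y][x1:x2]
    # Recompute lam from the actual pasted area (box may be clipped at edges).
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    return mixed, lam
```

Recomputing `lam` from the clipped box is what keeps the label mixture proportional to the area that was actually pasted.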

Unsupervised Feature Learning via Non-parametric Instance Discrimination

- Computer Science
- 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018

This work forms this intuition as a non-parametric classification problem at the instance level, and uses noise-contrastive estimation to tackle the computational challenges imposed by the large number of instance classes.

Improved Deep Metric Learning with Multi-class N-pair Loss Objective

- Mathematics, Computer Science
- NIPS
- 2016

This paper proposes a new metric learning objective called multi-class N-pair loss, which generalizes triplet loss by allowing joint comparison among more than one negative example, and reduces the computational burden of evaluating deep embedding vectors via an efficient batch construction strategy using only N pairs of examples.
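The batch construction the blurb describes can be sketched directly: each of the N anchors is paired with its own positive, and the other N-1 positives in the batch serve as its negatives, so N pairs yield N(N-1) negative comparisons. This pure-Python sketch assumes unnormalized dot-product similarities; the function name is a choice made here for illustration.

```python
import math

def n_pair_loss(anchors, positives):
    """Multi-class N-pair loss sketch.

    anchors[i] and positives[i] form a pair; for anchor i, every other
    positive j != i acts as a negative, giving the loss
    log(1 + sum_j exp(f_i . f_j+ - f_i . f_i+)) averaged over anchors.
    """
    n = len(anchors)

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    total = 0.0
    for i in range(n):
        pos_sim = dot(anchors[i], positives[i])
        total += math.log1p(sum(math.exp(dot(anchors[i], positives[j]) - pos_sim)
                                for j in range(n) if j != i))
    return total / n
```

When embeddings are well separated the loss approaches zero; when all embeddings collapse to one point, each term is log(1 + (N-1)), the chance-level value.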

A Simple Framework for Contrastive Learning of Visual Representations

- Computer Science, Mathematics
- ICML
- 2020

It is shown that the composition of data augmentations plays a critical role in defining effective predictive tasks, that introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and that contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
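SimCLR's contrastive objective, the normalized temperature-scaled cross-entropy (NT-Xent) loss also referenced by G-SimCLR above, can be sketched as follows. This is a minimal pure-Python version; the function name and default temperature are choices made here for illustration, not taken from the reference implementation.

```python
import math

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss sketch.

    z1[i] and z2[i] are embeddings of two augmented views of image i.
    The positive for each sample is its other view; the remaining 2N-2
    samples in the batch act as negatives.
    """
    # Stack both views and L2-normalize so dot products are cosines.
    z = [list(v) for v in z1] + [list(v) for v in z2]
    z = [[x / math.sqrt(sum(c * c for c in v)) for x in v] for v in z]
    n = len(z1)

    def sim(i, j):
        return sum(a * b for a, b in zip(z[i], z[j])) / temperature

    total = 0.0
    for i in range(2 * n):
        pos = i + n if i < n else i - n     # index of the other view
        denom = sum(math.exp(sim(i, j)) for j in range(2 * n) if j != i)
        total += -(sim(i, pos) - math.log(denom))
    return total / (2 * n)
```

The temperature rescales the cosine similarities before the softmax; lowering it sharpens the distribution and penalizes hard negatives more heavily, which is one of the knobs the paper ablates.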