Whitening for Self-Supervised Representation Learning

International Conference on Machine Learning (ICML), 2021
Abstract

Most self-supervised learning methods are based on a contrastive loss, in which image instances that share the same semantic content ("positives") are contrasted with instances extracted from other images ("negatives"). For the learning to be effective, many negatives must be compared with each positive pair, which is computationally demanding. In this paper, we propose a different direction and a new loss function for self-supervised learning based on whitening the latent-space features. The whitening operation has a "scattering" effect on the batch samples, which compensates for the absence of negatives and avoids degenerate solutions in which all sample representations collapse to a single point. Our Whitening MSE (W-MSE) loss requires no special heuristics (e.g., additional networks) and is conceptually simple. Since negatives are not needed, we suggest extracting multiple positive pairs from each image in the batch. We show empirically that W-MSE is competitive with popular, more complex self-supervised methods. The source code of the method and all the experiments is available at https://github.com/htdt/self-supervised.
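The core idea can be sketched in a few lines of NumPy: whiten the joint batch of embeddings so that it has zero mean and identity covariance, then minimize the MSE between the whitened representations of each positive pair. This is a minimal illustration, not the authors' implementation; it uses eigendecomposition-based (ZCA) whitening and omits details of the full method such as the projection to the unit sphere, the differentiable whitening used for backpropagation, and sub-batch slicing.

```python
import numpy as np

def whiten(z, eps=1e-6):
    """Whiten a batch of features: zero mean, (approximately) identity covariance.
    Uses ZCA whitening via eigendecomposition of the batch covariance."""
    z = z - z.mean(axis=0)
    cov = (z.T @ z) / (len(z) - 1)
    vals, vecs = np.linalg.eigh(cov)
    # Inverse square root of the covariance; eps guards near-zero eigenvalues.
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return z @ W

def w_mse(z1, z2):
    """Sketch of the W-MSE idea: whiten the concatenated batch of the two
    views, then take the MSE between the whitened positive pairs."""
    z = whiten(np.concatenate([z1, z2], axis=0))
    w1, w2 = np.split(z, 2)
    return np.mean(np.sum((w1 - w2) ** 2, axis=1))
```

Because the whitened batch is constrained to identity covariance, the samples cannot all collapse to one point, so the MSE term alone suffices as a training signal.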
