
Temperature Schedules for Self-Supervised Contrastive Methods on Long-Tail Data

Anna Kukleva
Moritz Böhle
Bernt Schiele
Hilde Kuehne
Christian Rupprecht
Abstract

Most approaches for self-supervised learning (SSL) are optimised on curated balanced datasets, e.g. ImageNet, despite the fact that natural data usually exhibits long-tail distributions. In this paper, we analyse the behaviour of one of the most popular variants of SSL, i.e. contrastive methods, on long-tail data. In particular, we investigate the role of the temperature parameter τ in the contrastive loss, by analysing the loss through the lens of average distance maximisation, and find that a large τ emphasises group-wise discrimination, whereas a small τ leads to a higher degree of instance discrimination. While τ has thus far been treated exclusively as a constant hyperparameter, in this work, we propose to employ a dynamic τ and show that a simple cosine schedule can yield significant improvements in the learnt representations. Such a schedule results in a constant 'task switching' between an emphasis on instance discrimination and group-wise discrimination, thereby ensuring that the model learns both group-wise features and instance-specific details. Since frequent classes benefit from the former, while infrequent classes require the latter, we find this method to consistently improve separation between the classes in long-tail data without any additional computational cost.
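To make the idea concrete, below is a minimal sketch in PyTorch of a cosine temperature schedule plugged into a standard SimCLR-style NT-Xent contrastive loss. The bounds `tau_min` and `tau_max` and the `period` are illustrative placeholders, not the paper's reported values, and the loss shown is the generic NT-Xent formulation rather than the authors' exact implementation.

```python
import math
import torch
import torch.nn.functional as F

def cosine_tau(step: int, tau_min: float = 0.1, tau_max: float = 1.0,
               period: int = 1000) -> float:
    """Oscillate the temperature between tau_min and tau_max with a cosine wave.

    tau_min, tau_max and period are hypothetical values for illustration only.
    """
    return tau_min + 0.5 * (tau_max - tau_min) * (1 + math.cos(2 * math.pi * step / period))

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float) -> torch.Tensor:
    """Standard NT-Xent (SimCLR-style) contrastive loss with temperature tau.

    z1, z2: two augmented views of the same batch, each of shape (N, D).
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit norm
    sim = z @ z.t() / tau                                # scaled cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float('-inf'))                # exclude self-similarity
    # Positive pair for sample i is its other view at index i +/- N.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Usage: sample tau per step, so training alternates between the
# instance-discrimination (small tau) and group-discrimination (large tau) regimes.
# tau = cosine_tau(global_step)
# loss = nt_xent_loss(z1, z2, tau)
```

Because the schedule only changes a scalar divisor in the loss, it adds no measurable computational overhead, which matches the abstract's claim of improvement "without any additional computational cost".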
