v2 (latest)

AdaDim: Dimensionality Adaptation for SSL Representational Dynamics

Main: 10 pages, 20 figures, 10 tables; Bibliography: 3 pages; Appendix: 14 pages
Abstract

A key factor in effective self-supervised learning (SSL) is preventing dimensional collapse, where a high-dimensional representation space R spans only a lower-dimensional subspace. SSL optimization strategies therefore guide the model toward a higher-dimensionality R (higher H(R)) through objectives that encourage feature decorrelation or sample uniformity in R. A higher H(R) indicates that R has greater feature diversity, which aids generalization to downstream tasks. Alongside dimensionality optimization, SSL algorithms also employ a projection head that maps R into an embedding space Z. Recent work has characterized the projection head as a filter that removes noisy or irrelevant features from the SSL objective by reducing the mutual information I(R;Z). The prevailing view in the literature is thus that a good SSL representation space should have a high H(R) and a low I(R;Z). However, this view lacks an account of the underlying training dynamics that shape the relationship between the two terms. Our analysis shows that the best-performing SSL models have neither the highest H(R) nor the lowest I(R;Z), but instead strike an effective balance between the two. Building on this analysis, we introduce AdaDim, a training strategy that leverages SSL training dynamics by adaptively balancing between increasing H(R) (through feature decorrelation and sample uniformity) and gradually regularizing I(R;Z) as training progresses. We show performance improvements of up to 3% over common SSL baselines, even though our method does not rely on expensive techniques such as queues, clustering, predictor networks, or student-teacher architectures.
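The abstract does not spell out the loss, so the following is only a minimal PyTorch sketch of an adaptive schedule in the spirit described above: an H(R)-raising term (feature decorrelation plus sample uniformity) whose weight is gradually traded off against an I(R;Z) regularizer as training progresses. The specific loss functions, the cross-correlation proxy for I(R;Z), and the linear schedule are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def decorrelation_loss(r: torch.Tensor) -> torch.Tensor:
    # Push off-diagonal entries of the feature covariance toward zero,
    # encouraging a higher-dimensional representation space (higher H(R)).
    r = (r - r.mean(dim=0)) / (r.std(dim=0) + 1e-6)
    cov = (r.T @ r) / (r.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    return off_diag.pow(2).sum() / r.shape[1]


def uniformity_loss(r: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    # Spread samples over the unit hypersphere (Wang & Isola-style uniformity).
    r = F.normalize(r, dim=-1)
    return torch.log(torch.exp(-t * torch.pdist(r).pow(2)).mean())


def mi_proxy(r: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    # Crude stand-in for reducing I(R;Z): penalize cross-correlation between
    # batch-normalized features of R and Z. A placeholder, not the paper's estimator.
    r = F.normalize(r - r.mean(dim=0), dim=0)
    z = F.normalize(z - z.mean(dim=0), dim=0)
    return (r.T @ z).pow(2).mean()


def adadim_style_loss(r: torch.Tensor, z: torch.Tensor,
                      step: int, total_steps: int) -> torch.Tensor:
    # Hypothetical linear schedule: early training emphasizes raising H(R);
    # the I(R;Z) regularizer is phased in as training progresses.
    alpha = step / max(total_steps, 1)
    h_term = decorrelation_loss(r) + uniformity_loss(r)
    return (1.0 - alpha) * h_term + alpha * mi_proxy(r, z)
```

Here r is a batch of backbone representations and z the corresponding projection-head embeddings; in practice this term would be added to the base SSL objective rather than replace it.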
