Learning Decorrelated Representations Efficiently Using Fast Fourier Transform

Computer Vision and Pattern Recognition (CVPR), 2023
Abstract

Barlow Twins and VICReg are self-supervised representation learning models that use regularizers to decorrelate features. Although they work as well as conventional representation learning models, their training can be computationally demanding if the dimension of projected representations is high; as these regularizers are defined in terms of individual elements of a cross-correlation or covariance matrix, computing the loss for $d$-dimensional projected representations of $n$ samples takes $O(nd^2)$ time. In this paper, we propose a relaxed version of decorrelating regularizers that can be computed in $O(nd \log d)$ time by the fast Fourier transform. We also propose an inexpensive trick to mitigate the undesirable local minima that develop with the relaxation. Models learning representations using the proposed regularizers show comparable accuracy to existing models in downstream tasks, whereas the training requires less memory and is faster when $d$ is large.
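
The complexity gap can be illustrated with a small NumPy sketch. It is not taken from the paper and may differ from the authors' exact regularizer: it assumes, for illustration only, that a relaxed penalty looks at sums of cross-correlation entries along wrapped (circular) diagonals rather than at each entry individually. Those sums equal the circular correlation of the two views, which an FFT computes without ever forming the $d \times d$ matrix. The function names `diagonal_sums_naive` and `diagonal_sums_fft` are hypothetical.

```python
import numpy as np


def diagonal_sums_naive(a, b):
    """O(n d^2) reference: form the d x d matrix, then sum its wrapped diagonals.

    a, b: arrays of shape (n, d) holding two views of n samples.
    Hypothetical helper for illustration, not the paper's code.
    """
    n, d = a.shape
    c = a.T @ b / n  # c[j, k] = mean over samples of a[:, j] * b[:, k]
    idx = np.arange(d)
    # Entry i is the sum of c[j, (j + i) mod d] over j; entry 0 is the trace.
    return np.array([c[idx, (idx + i) % d].sum() for i in range(d)])


def diagonal_sums_fft(a, b):
    """O(n d log d): the same sums via circular correlation in the Fourier domain."""
    n, d = a.shape
    fa = np.fft.rfft(a, axis=1)
    fb = np.fft.rfft(b, axis=1)
    # conj(F(a)) * F(b) is the spectrum of the per-sample circular correlation;
    # averaging over samples first means only one inverse FFT is needed.
    spectrum = (np.conj(fa) * fb).sum(axis=0) / n
    return np.fft.irfft(spectrum, n=d)
```

Under this assumption, a relaxed penalty could be formed from entries 1 through $d-1$ of the returned vector (for example, their sum of squares), with entry 0 carrying the diagonal of the matrix. The memory saving comes from storing $O(d)$ values per batch instead of $O(d^2)$.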
