BiSSL: A Bilevel Optimization Framework for Enhancing the Alignment Between Self-Supervised Pre-Training and Downstream Fine-Tuning

This study presents BiSSL, a novel training framework that utilizes bilevel optimization to enhance the alignment between the pretext pre-training and downstream fine-tuning stages in self-supervised learning. BiSSL formulates the pretext and downstream task objectives as the lower- and upper-level objectives of a bilevel optimization problem and serves as an intermediate training stage within the self-supervised learning pipeline. By explicitly modeling the interdependence of these training stages, BiSSL facilitates enhanced information sharing between them, ultimately yielding a backbone parameter initialization that is better aligned with the downstream task. We propose a versatile training algorithm, applicable to a broad range of pretext and downstream tasks, that alternates between optimizing the two objectives defined in BiSSL. Using SimCLR and Bootstrap Your Own Latent (BYOL) to pre-train ResNet-50 backbones on the ImageNet dataset, we demonstrate that our proposed framework significantly outperforms the conventional self-supervised learning pipeline on the vast majority of 12 downstream image classification datasets, as well as on object detection. Visualizations of the backbone features provide further evidence that BiSSL improves their downstream task alignment prior to fine-tuning.
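A generic bilevel formulation consistent with this description is sketched below; the symbols $\theta$ (upper-level, downstream-related parameters), $\phi$ (lower-level backbone parameters), and the loss names are placeholders, since the abstract does not specify the exact parameterization or coupling terms used by BiSSL:

$$
\min_{\theta}\ \mathcal{L}_{\mathrm{down}}\!\left(\theta,\ \phi^{*}(\theta)\right)
\quad \text{subject to} \quad
\phi^{*}(\theta) \in \operatorname*{arg\,min}_{\phi}\ \mathcal{L}_{\mathrm{pretext}}\!\left(\phi;\ \theta\right),
$$

where the upper level optimizes the downstream objective through the solution of the lower-level pretext problem, and the proposed training algorithm alternates between optimizing the two levels.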