Supervised Models Can Generalize Also When Trained on Random Labels

16 May 2025

Abstract

The success of unsupervised learning raises the question of whether supervised models can also be trained without using the information in the output $y$. In this paper, we demonstrate that this is indeed possible. The key step is to formulate the model as a smoother, i.e. of the form $\hat{f}=Sy$, and to construct the smoother matrix $S$ independently of $y$, e.g. by training on random labels. We present a simple model selection criterion based on the distribution of the out-of-sample predictions and show that, in contrast to cross-validation, this criterion can also be used without access to $y$. We demonstrate on real and synthetic data that $y$-free trained versions of linear and kernel ridge regression, smoothing splines, and neural networks perform similarly to their standard, $y$-based versions and, most importantly, significantly better than random guessing.
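To make the smoother form $\hat{f}=Sy$ concrete, here is a minimal sketch for kernel ridge regression, where $S = K_{\text{eval}}(K + \lambda I)^{-1}$ depends only on the inputs and the hyperparameters. The Gaussian kernel, the helper names `gaussian_kernel` and `krr_smoother`, and the fixed values of `bandwidth` and `ridge` are assumptions made for illustration. In particular, the sketch does not implement the paper's $y$-free model selection criterion; it only shows that, once the hyperparameters are fixed, $S$ can be built without ever touching $y$.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def krr_smoother(X_train, X_eval, bandwidth=1.0, ridge=1e-2):
    """Smoother matrix S with f_hat(X_eval) = S @ y for kernel ridge regression.

    Note that y is not an argument: S depends only on the inputs and on the
    hyperparameters (bandwidth and ridge are illustrative, hand-picked values).
    """
    n = X_train.shape[0]
    K = gaussian_kernel(X_train, X_train, bandwidth)
    K_eval = gaussian_kernel(X_eval, X_train, bandwidth)
    # S = K_eval (K + ridge * I)^{-1}; K + ridge * I is symmetric, so we can
    # solve a linear system instead of forming the inverse explicitly.
    return np.linalg.solve(K + ridge * np.eye(n), K_eval.T).T

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
X_new = np.linspace(-3.0, 3.0, 20)[:, None]

S = krr_smoother(X, X_new)  # constructed without access to y
f_hat = S @ y               # y enters only through this linear map
```

Because the predictions are linear in $y$, the same $S$ can be reused for any label vector observed at the same inputs.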


@article{allerbo2025_2505.11006,
  title={ Supervised Models Can Generalize Also When Trained on Random Labels },
  author={ Oskar Allerbo and Thomas B. Schön },
  journal={arXiv preprint arXiv:2505.11006},
  year={ 2025 }
}
