A Theory of Feature Learning in Kernel Models
Main: 8 pages, 12 figures; Appendix: 38 pages
Abstract
We study feature learning in a compositional variant of kernel ridge regression in which the predictor is applied to a learnable linear transformation of the input. When the response depends on the input only through a low-dimensional predictive subspace, we show that every global minimizer of the population objective for the linear transformation annihilates directions orthogonal to this subspace and, in certain regimes, identifies the subspace exactly. Moreover, global minimizers of the finite-sample objective inherit the same low-dimensional structure with high probability, even without any explicit penalization of the linear transformation.
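The setting described above can be sketched numerically. Below is a minimal toy illustration (not the paper's exact construction; the names `krr_objective`, `W_aligned`, and `W_orth`, the Gaussian kernel, and all hyperparameters are assumptions for illustration): a response that depends on the input only through a one-dimensional predictive subspace, and a kernel ridge objective evaluated after a candidate linear map. A map aligned with the subspace should achieve a lower objective than one that annihilates it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: inputs in R^d, response depends on x only through a
# 1-D predictive subspace spanned by u (hypothetical construction,
# chosen for illustration).
d, n = 10, 200
u = np.zeros(d)
u[0] = 1.0                              # predictive direction
X = rng.standard_normal((n, d))
y = np.sin(X @ u)                       # y depends on x only via u^T x

def krr_objective(W, X, y, lam=1e-2, gamma=1.0):
    """Finite-sample kernel ridge objective after the linear map x -> W x.

    Uses a Gaussian kernel on the transformed inputs and returns the
    ridge-regularized training loss at the optimal kernel coefficients.
    """
    Z = X @ W.T                                         # transformed inputs
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)                             # Gaussian kernel matrix
    m = len(y)
    alpha = np.linalg.solve(K + m * lam * np.eye(m), y)
    resid = y - K @ alpha
    return (resid @ resid) / m + lam * alpha @ K @ alpha

# A map aligned with the predictive subspace vs. one orthogonal to it.
W_aligned = u[None, :]                  # projects onto span(u)
v = np.zeros(d)
v[1] = 1.0
W_orth = v[None, :]                     # annihilates span(u)

print(krr_objective(W_aligned, X, y))   # low: predictive direction captured
print(krr_objective(W_orth, X, y))      # high: predictive direction lost
```

Consistent with the result stated in the abstract, directions orthogonal to the predictive subspace carry no signal, so any weight the linear map places on them only degrades the objective.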
