
Enhanced Feature Learning via Regularisation: Integrating Neural Networks and Kernel Methods

Main: 51 pages · 5 figures · Bibliography: 5 pages
Abstract

We propose a new method for feature learning and function estimation in supervised learning via regularised empirical risk minimisation. Our approach considers functions as expectations of Sobolev functions over all possible one-dimensional projections of the data. This framework is similar to kernel ridge regression, where the kernel is $\mathbb{E}_w\big(k^{(B)}(w^\top x, w^\top x')\big)$, with $k^{(B)}(a,b) := \min(|a|, |b|)\,\mathds{1}_{ab>0}$ the Brownian kernel, and the distribution of the projections $w$ is learnt. This can also be viewed as an infinite-width one-hidden-layer neural network, optimising the first layer's weights through gradient descent and explicitly adjusting the non-linearity and weights of the second layer. We introduce a gradient-based computational method for the estimator, called Brownian Kernel Neural Network (BKerNN), using particles to approximate the expectation, where the positive homogeneity of the Brownian kernel leads to improved robustness to local minima. Using Rademacher complexity, we show that BKerNN's expected risk converges to the minimal risk with explicit high-probability rates of $O(\min((d/n)^{1/2}, n^{-1/6}))$ (up to logarithmic factors). Numerical experiments confirm our optimisation intuitions: BKerNN outperforms kernel ridge regression and compares favourably to a one-hidden-layer neural network with ReLU activations in various settings and on real data sets.
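To make the kernel construction concrete, below is a minimal NumPy sketch (function names and the particle set `W` are our own illustrative choices, not the authors' implementation) of the Brownian kernel $k^{(B)}(a,b) = \min(|a|,|b|)\,\mathds{1}_{ab>0}$ and of the particle approximation of $\mathbb{E}_w\big(k^{(B)}(w^\top x, w^\top x')\big)$, where the expectation over $w$ is replaced by an average over a finite set of projection directions.

```python
import numpy as np

def brownian_kernel(a, b):
    # k^(B)(a, b) = min(|a|, |b|) * 1_{ab > 0}; broadcasts over arrays.
    return np.minimum(np.abs(a), np.abs(b)) * (a * b > 0)

def particle_kernel(X, Xp, W):
    # Approximate E_w[ k^(B)(w^T x, w^T x') ] by averaging over the
    # rows of W (m particles in R^d).  X: (n, d), Xp: (n', d), W: (m, d).
    P = X @ W.T    # (n, m)  projections w_j^T x_i
    Pp = Xp @ W.T  # (n', m) projections w_j^T x'_i
    K = np.zeros((X.shape[0], Xp.shape[0]))
    for j in range(W.shape[0]):
        # Pairwise Brownian kernel for particle j via broadcasting.
        K += brownian_kernel(P[:, j][:, None], Pp[:, j][None, :])
    return K / W.shape[0]
```

The resulting matrix is a valid (symmetric, positive semi-definite) kernel matrix, since it is an average of Brownian kernel matrices; in BKerNN the particles themselves are then optimised by gradient descent rather than kept fixed as in this sketch.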
