
Enhanced Feature Learning via Regularisation: Integrating Neural Networks and Kernel Methods

Main: 51 pages · 5 figures · Bibliography: 5 pages
Abstract

We propose a new method for feature learning and function estimation in supervised learning via regularised empirical risk minimisation. Our approach considers functions as expectations of Sobolev functions over all possible one-dimensional projections of the data. This framework is similar to kernel ridge regression, where the kernel is $\mathbb{E}_w\big(k^{(B)}(w^\top x, w^\top x')\big)$, with $k^{(B)}(a,b) := \min(|a|, |b|)\,\mathds{1}_{ab>0}$ the Brownian kernel, and the distribution of the projections $w$ is learnt. This can also be viewed as an infinite-width one-hidden-layer neural network, optimising the first layer's weights through gradient descent and explicitly adjusting the non-linearity and weights of the second layer. We introduce a gradient-based computational method for the estimator, called Brownian Kernel Neural Network (BKerNN), using particles to approximate the expectation, where the positive homogeneity of the Brownian kernel leads to improved robustness to local minima. Using Rademacher complexity, we show that BKerNN's expected risk converges to the minimal risk with explicit high-probability rates of $O(\min((d/n)^{1/2}, n^{-1/6}))$ (up to logarithmic factors). Numerical experiments confirm our optimisation intuitions: BKerNN outperforms kernel ridge regression and compares favourably to a one-hidden-layer neural network with ReLU activations in various settings and on real data sets.
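To make the kernel construction concrete, below is a minimal NumPy sketch (function names and the particle set `W` are our own illustrative choices, not the authors' implementation) of the Brownian kernel $k^{(B)}(a,b) = \min(|a|,|b|)\,\mathds{1}_{ab>0}$ and of the particle approximation of $\mathbb{E}_w\big(k^{(B)}(w^\top x, w^\top x')\big)$, where the expectation over $w$ is replaced by an average over a finite set of projection directions.

```python
import numpy as np

def brownian_kernel(a, b):
    # k^(B)(a, b) = min(|a|, |b|) * 1_{ab > 0}; broadcasts over arrays.
    return np.minimum(np.abs(a), np.abs(b)) * (a * b > 0)

def particle_kernel(X, Xp, W):
    # Approximate E_w[ k^(B)(w^T x, w^T x') ] by averaging over the
    # rows of W (m particles in R^d).  X: (n, d), Xp: (n', d), W: (m, d).
    P = X @ W.T    # (n, m)  projections w_j^T x_i
    Pp = Xp @ W.T  # (n', m) projections w_j^T x'_i
    K = np.zeros((X.shape[0], Xp.shape[0]))
    for j in range(W.shape[0]):
        # Pairwise Brownian kernel for particle j via broadcasting.
        K += brownian_kernel(P[:, j][:, None], Pp[:, j][None, :])
    return K / W.shape[0]
```

The resulting matrix is a valid (symmetric, positive semi-definite) kernel matrix, since it is an average of Brownian kernel matrices; in BKerNN the particles themselves are then optimised by gradient descent rather than kept fixed as in this sketch.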
