The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime

Abstract

Modern machine learning models are often so complex that they achieve vanishing classification error on the training set. Max-margin linear classifiers are among the simplest classification methods that have zero training error (with linearly separable data). Despite their simplicity, their high-dimensional behavior is not yet completely understood. We assume that we are given i.i.d. data $(y_i,{\boldsymbol x}_i)$, $i\le n$, with ${\boldsymbol x}_i\sim {\sf N}(0,{\boldsymbol \Sigma})$ a $p$-dimensional feature vector, and $y_i \in\{+1,-1\}$ a label whose distribution depends on a linear combination of the covariates $\langle{\boldsymbol\theta}_*,{\boldsymbol x}_i\rangle$. We consider the proportional asymptotics $n,p\to\infty$ with $p/n\to \psi$, and derive exact expressions for the limiting prediction error. Our asymptotic results match simulations already when $n,p$ are of the order of a few hundred. We explore several choices for $({\boldsymbol \theta}_*,{\boldsymbol \Sigma})$, and show that the resulting generalization curve (test error as a function of the overparametrization $\psi=p/n$) is qualitatively different, depending on this choice. In particular, we consider a specific structure of $({\boldsymbol \theta}_*,{\boldsymbol\Sigma})$ that captures the behavior of nonlinear random feature models or, equivalently, two-layer neural networks with random first-layer weights. In this case, we aim at classifying data $(y_i,{\boldsymbol x}_i)$ with ${\boldsymbol x}_i\in{\mathbb R}^d$, but we do so by first embedding them in a $p$-dimensional feature space via ${\boldsymbol x}_i\mapsto\sigma({\boldsymbol W}{\boldsymbol x}_i)$ and then finding a max-margin classifier in this space. We derive exact formulas in the proportional asymptotics $p,n,d\to\infty$ with $p/d\to\psi_1$, $n/d\to\psi_2$, and observe that the test error is minimized in the highly overparametrized regime $\psi_1\gg 0$.
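
The linear setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' code: it takes ${\boldsymbol\Sigma}={\boldsymbol I}_p$, draws labels from a logistic link $\mathbb{P}(y_i=+1\mid{\boldsymbol x}_i)=1/(1+e^{-\langle{\boldsymbol\theta}_*,{\boldsymbol x}_i\rangle})$ (one possible choice of label distribution), and approximates the max-margin classifier without intercept by dual coordinate ascent on the hard-margin problem. The function names, the solver, and the values of $n$, $p$ and $\|{\boldsymbol\theta}_*\|$ are all illustrative choices.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sample_data(rng, n, theta_star):
    """Draw x_i ~ N(0, I_p) and y_i in {+1, -1}.

    Assumption: P(y = +1 | x) = 1 / (1 + exp(-<theta_star, x>)) (logistic link).
    """
    p = len(theta_star)
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        prob_plus = 1.0 / (1.0 + math.exp(-dot(theta_star, x)))
        xs.append(x)
        ys.append(1 if rng.random() < prob_plus else -1)
    return xs, ys


def max_margin(xs, ys, epochs=1000):
    """Approximate argmin ||w||^2 subject to y_i <w, x_i> >= 1 (no intercept).

    Dual coordinate ascent: maximize sum(alpha) - 0.5 * ||w||^2 over alpha >= 0,
    keeping w = sum_i alpha_i y_i x_i up to date after each coordinate update.
    """
    n, p = len(xs), len(xs[0])
    w = [0.0] * p
    alpha = [0.0] * n
    sq_norms = [dot(x, x) for x in xs]
    for _ in range(epochs):
        for i in range(n):
            grad = 1.0 - ys[i] * dot(w, xs[i])  # partial derivative of the dual in alpha_i
            new_alpha = max(0.0, alpha[i] + grad / sq_norms[i])
            step = (new_alpha - alpha[i]) * ys[i]
            if step != 0.0:
                for j in range(p):
                    w[j] += step * xs[i][j]
            alpha[i] = new_alpha
    return w


def error_rate(w, xs, ys):
    wrong = sum(1 for x, y in zip(xs, ys) if y * dot(w, x) <= 0)
    return wrong / len(xs)


rng = random.Random(0)
n, p = 30, 60                          # overparametrization psi = p / n = 2
theta_star = [2.0 / math.sqrt(p)] * p  # illustrative signal with norm 2
train_x, train_y = sample_data(rng, n, theta_star)
test_x, test_y = sample_data(rng, 500, theta_star)

w_hat = max_margin(train_x, train_y)
train_err = error_rate(w_hat, train_x, train_y)
test_err = error_rate(w_hat, test_x, test_y)
min_margin = min(y * dot(w_hat, x) for x, y in zip(train_x, train_y))
print(f"psi={p / n:.1f} train_err={train_err:.3f} "
      f"test_err={test_err:.3f} min_margin={min_margin:.3f}")
```

Since $p>n$, the Gaussian training points are linearly separable with probability one, so the training error should be zero while the test error stays bounded away from zero. Repeating the experiment over a grid of $p/n$ values, and averaging over draws, would give an empirical counterpart to the generalization curve discussed above.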
