The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime

5 November 2019

Abstract

Modern machine learning models are often so complex that they achieve vanishing classification error on the training set. Max-margin linear classifiers are among the simplest classification methods that have zero training error (with linearly separable data). Despite their simplicity, their high-dimensional behavior is not yet completely understood. We assume that we are given i.i.d. data $(y_i,{\boldsymbol x}_i)$, $i\le n$, with ${\boldsymbol x}_i\sim {\sf N}(0,{\boldsymbol \Sigma})$ a $p$-dimensional feature vector, and $y_i \in\{+1,-1\}$ a label whose distribution depends on a linear combination of the covariates $\langle{\boldsymbol\theta}_*,{\boldsymbol x}_i\rangle$. We consider the proportional asymptotics $n,p\to\infty$ with $p/n\to \psi$, and derive exact expressions for the limiting prediction error. Our asymptotic results match simulations already when $n,p$ are of the order of a few hundred. We explore several choices for $({\boldsymbol \theta}_*,{\boldsymbol \Sigma})$, and show that the resulting generalization curve (test error as a function of the overparametrization $\psi=p/n$) differs qualitatively depending on this choice. In particular, we consider a specific structure of $({\boldsymbol \theta}_*,{\boldsymbol\Sigma})$ that captures the behavior of nonlinear random feature models or, equivalently, two-layer neural networks with random first-layer weights. In this case, we aim at classifying data $(y_i,{\boldsymbol x}_i)$ with ${\boldsymbol x}_i\in{\mathbb R}^d$, but we do so by first embedding them in a $p$-dimensional feature space via ${\boldsymbol x}_i\mapsto\sigma({\boldsymbol W}{\boldsymbol x}_i)$ and then finding a max-margin classifier in this space. We derive exact formulas in the proportional asymptotics $p,n,d\to\infty$ with $p/d\to\psi_1$, $n/d\to\psi_2$, and observe that the test error is minimized in the highly overparametrized regime $\psi_1\gg 0$.
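
For concreteness, the two objects discussed above can be written in the standard hard-margin form. This is only a sketch: the symbols $\hat{\boldsymbol\theta}$, ${\rm Err}$, and the fresh test pair $(y_{\rm new},{\boldsymbol x}_{\rm new})$ are notation assumed here and do not appear in the abstract itself.

$$
\hat{\boldsymbol\theta} \in \arg\max_{\|{\boldsymbol\theta}\|_2 = 1}\; \min_{i\le n}\; y_i\langle{\boldsymbol\theta},{\boldsymbol x}_i\rangle ,
\qquad
{\rm Err}(\hat{\boldsymbol\theta}) = {\mathbb P}\big(y_{\rm new}\langle\hat{\boldsymbol\theta},{\boldsymbol x}_{\rm new}\rangle \le 0\big).
$$

When the data are linearly separable, the inner minimum is positive at the optimum, so every training point is classified correctly. In the random feature setting, the same definition would apply with ${\boldsymbol x}_i$ replaced by $\sigma({\boldsymbol W}{\boldsymbol x}_i)$.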
