The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime

Modern machine learning models are often so complex that they achieve vanishing classification error on the training set. Max-margin linear classifiers are among the simplest classification methods that have zero training error (with linearly separable data). Despite their simplicity, their high-dimensional behavior is not yet completely understood. We assume to be given i.i.d. data $(y_i, x_i)$, $i \le n$, with $x_i \sim \mathsf{N}(0, \Sigma)$ a $p$-dimensional feature vector, and $y_i \in \{+1, -1\}$ a label whose distribution depends on a linear combination of the covariates $\langle \theta_*, x_i \rangle$. We consider the proportional asymptotics $n, p \to \infty$ with $n/p \to \psi$, and derive exact expressions for the limiting prediction error. Our asymptotic results match simulations already when $n, p$ are of the order of a few hundred. We explore several choices for $(\theta_*, \Sigma)$, and show that the resulting generalization curve (test error as a function of the overparametrization $p/n$) is qualitatively different, depending on this choice. In particular, we consider a specific structure of $(\theta_*, \Sigma)$ that captures the behavior of nonlinear random feature models or, equivalently, two-layer neural networks with random first-layer weights. In this case, we aim at classifying data whose labels depend on a $d$-dimensional covariate vector $z_i$, but we do so by first embedding it in a $p$-dimensional feature space via $x_i = \sigma(W z_i)$, with $W$ a random matrix, and then finding a max-margin classifier in this space. We derive exact formulas in the proportional asymptotics $p, n, d \to \infty$ with $p/d \to \psi_1$, $n/d \to \psi_2$, and observe that the test error is minimized in the highly overparametrized regime $\psi_1 \gg \psi_2$.
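Since the abstract emphasizes that the asymptotic predictions match simulations at moderate sizes, the following is a minimal simulation sketch of the random-features setup it describes (not the paper's analytical formulas). The input dimension $d$, feature dimension $p$, the tanh activation, the logistic label model, and the use of scikit-learn's LinearSVC with a very large penalty parameter as a stand-in for the exact max-margin classifier are all illustrative assumptions.

```python
# Minimal sketch: test error of a (near) max-margin linear classifier on random features.
# All sizes, the activation, and the label model below are illustrative choices, not the paper's.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

d, n, n_test = 50, 300, 2000                 # input dim, train size, test size
theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)

def sample(m):
    """Draw m points: labels depend on <theta_star, z> through a logistic link."""
    z = rng.standard_normal((m, d))
    prob = 1.0 / (1.0 + np.exp(-5.0 * z @ theta_star))
    y = np.where(rng.random(m) < prob, 1, -1)
    return z, y

z_train, y_train = sample(n)
z_test, y_test = sample(n_test)

for p in [100, 300, 1000, 3000]:             # feature dimension; overparametrization is p/n
    W = rng.standard_normal((p, d)) / np.sqrt(d)   # random first-layer weights
    x_train = np.tanh(z_train @ W.T)               # embed via x = sigma(W z)
    x_test = np.tanh(z_test @ W.T)
    # A hard-margin SVM is approximated by L2-regularized hinge loss with a very large C.
    clf = LinearSVC(C=1e6, loss="hinge", dual=True, tol=1e-6, max_iter=100000)
    clf.fit(x_train, y_train)
    err = np.mean(clf.predict(x_test) != y_test)
    print(f"p/n = {p/n:5.2f}   test error = {err:.3f}")
```

Sweeping $p$ while keeping $n$ and $d$ fixed traces out a generalization curve of the kind discussed above; averaging over several random draws of $W$ and of the data would reduce the fluctuations at these moderate sizes.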