A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-L1-Norm Interpolated Classifiers
This paper establishes a precise high-dimensional asymptotic theory for boosting on separable data, taking statistical and computational perspectives. We consider the setting where the number of features (weak learners) $p$ scales with the sample size $n$, in an over-parametrized regime. Under a broad class of statistical models, we provide an exact analysis of the generalization error of boosting when the algorithm interpolates the training data and maximizes the empirical $\ell_1$-margin. The relation between the boosting test error and the optimal Bayes error is pinned down explicitly. In turn, these precise characterizations resolve several open questions raised in \cite{breiman1999prediction, schapire1998boosting} surrounding boosting. On the computational front, we provide a sharp analysis of the stopping time when boosting approximately maximizes the empirical $\ell_1$-margin. Furthermore, we discover that the larger the over-parametrization ratio $p/n$, the smaller the proportion of active features (with zero initialization), and the faster the optimization reaches interpolation. At the heart of our theory lies an in-depth study of the maximum $\ell_1$-margin, which can be accurately described by a new system of non-linear equations; we analyze this margin and the properties of this system using Gaussian comparison techniques and a novel uniform deviation argument. Variants of AdaBoost corresponding to general $\ell_q$-geometry, for $q \ge 1$, are also presented, together with an exact analysis of the high-dimensional generalization and optimization behavior of a class of these algorithms.
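To make the central object concrete, the sketch below sets up the minimum-$\ell_1$-norm interpolation problem (equivalently, the maximum-$\ell_1$-margin problem) that boosting is described as converging to, and solves it directly as a linear program on synthetic separable data. This is an illustrative assumption-laden example, not the authors' experimental setup: the data model, the dimensions ($n = 50$, $p = 200$), and the use of scipy.optimize.linprog are all choices made here for illustration.

```python
# Minimal sketch (not from the paper): compute the max-l1-margin / min-l1-norm
# interpolating linear classifier on synthetic separable data via a linear program.
# Assumed setup: Gaussian features, labels from a planted linear rule, p/n = 4.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p = 50, 200                               # over-parametrized regime: p/n = 4
X = rng.standard_normal((n, p))              # features (columns act as weak learners)
y = np.sign(X @ rng.standard_normal(p))      # planted linear rule => separable data

# Max-l1-margin:  max_{||theta||_1 <= 1} min_i y_i x_i^T theta.
# Equivalent LP:  min ||theta||_1  s.t.  y_i x_i^T theta >= 1 for all i,
# written with theta = u - v and u, v >= 0.
Yx = y[:, None] * X
res = linprog(
    c=np.ones(2 * p),                        # objective: sum(u) + sum(v) = ||theta||_1
    A_ub=np.hstack([-Yx, Yx]),               # -y_i x_i^T (u - v) <= -1
    b_ub=-np.ones(n),
    bounds=(0, None),                        # u, v >= 0
)
theta = res.x[:p] - res.x[p:]
margin = 1.0 / np.abs(theta).sum()           # maximum empirical l1-margin
active = np.mean(np.abs(theta) > 1e-8)       # proportion of active features
print(f"max l1-margin ~ {margin:.3f}, active fraction ~ {active:.2f}")
```

Re-running this sketch at larger over-parametrization ratios $p/n$ gives a quick empirical feel for the phenomenon stated in the abstract, namely that the proportion of active features in the max-$\ell_1$-margin solution shrinks as $p/n$ grows.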