Popular iterative algorithms such as boosting methods and coordinate descent on linear models converge to the maximum $\ell_1$-margin classifier, a.k.a. the sparse hard-margin SVM, in high-dimensional regimes where the data is linearly separable. Previous works consistently show that many estimators relying on the $\ell_1$-norm achieve improved statistical rates for hard-sparse ground truths. We show that, surprisingly, this adaptivity does not apply to the maximum $\ell_1$-margin classifier in a standard discriminative setting. In particular, for the noiseless setting, we prove tight upper and lower bounds for the prediction error that match existing rates for general ground truths. To complete the picture, we show that the error still vanishes when interpolating noisy observations. We are therefore the first to show benign overfitting for the maximum $\ell_1$-margin classifier.
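For context, the maximum $\ell_1$-margin classifier referenced above is standardly defined as $\hat{w} \in \arg\max_{\|w\|_1 \le 1} \min_{i} y_i \langle x_i, w \rangle$, which can be computed exactly as a linear program via the split $w = w^+ - w^-$. Below is a minimal sketch of that standard LP formulation, not code from the paper; the function name `max_l1_margin` and the toy data are our own illustrative assumptions.

```python
# Sketch: compute the maximum l1-margin classifier
#   max_{||w||_1 <= 1} min_i y_i <x_i, w>
# as a linear program, splitting w = w_plus - w_minus with w_plus, w_minus >= 0.
import numpy as np
from scipy.optimize import linprog

def max_l1_margin(X, y):
    """X: (n, d) features; y: (n,) labels in {-1, +1}. Returns (w, margin)."""
    n, d = X.shape
    # Variables z = [w_plus (d), w_minus (d), t (1)]; maximize the margin t.
    c = np.zeros(2 * d + 1)
    c[-1] = -1.0  # linprog minimizes, so minimize -t
    # Margin constraints: t - y_i x_i^T (w_plus - w_minus) <= 0 for each i.
    Yx = y[:, None] * X
    A_ub = np.hstack([-Yx, Yx, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # l1 budget: sum(w_plus) + sum(w_minus) <= 1.
    A_ub = np.vstack([A_ub, np.concatenate([np.ones(2 * d), [0.0]])])
    b_ub = np.append(b_ub, 1.0)
    bounds = [(0, None)] * (2 * d) + [(None, None)]  # w_plus, w_minus >= 0; t free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w = res.x[:d] - res.x[d : 2 * d]
    return w, -res.fun  # classifier weights and the achieved l1-margin

# Toy usage: noiseless labels from a hard-sparse ground truth, d > n (separable).
rng = np.random.default_rng(0)
n, d = 50, 200
X = rng.standard_normal((n, d))
w_star = np.zeros(d); w_star[0] = 1.0  # 1-sparse ground truth
y = np.sign(X @ w_star)
w_hat, margin = max_l1_margin(X, y)
print(f"margin = {margin:.3f}, nonzeros in w_hat = {(np.abs(w_hat) > 1e-8).sum()}")
```

A positive margin certifies linear separability; the sparsity of the returned solution reflects why this estimator is also called the sparse hard-margin SVM.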