Generalization of ERM in Stochastic Convex Optimization: The Dimension Strikes Back
In stochastic convex optimization the goal is to minimize a convex function $F(x) = \mathbf{E}_{f \sim D}[f(x)]$ over a convex set $\mathcal{K} \subset \mathbb{R}^d$, where $D$ is some unknown distribution and each function $f$ in the support of $D$ is convex over $\mathcal{K}$. The optimization is commonly based on i.i.d.~samples $f^1, f^2, \ldots, f^n$ from $D$. A standard approach to such problems is empirical risk minimization (ERM), which optimizes $F_S(x) = \frac{1}{n}\sum_{i \le n} f^i(x)$. Here we consider the question of how many samples are necessary for ERM to succeed and the closely related question of uniform convergence of $F_S$ to $F$ over $\mathcal{K}$. We demonstrate that in the standard $\ell_2/\ell_2$ setting of Lipschitz-bounded functions over a $\mathcal{K}$ of bounded radius, ERM requires sample size that scales linearly with the dimension $d$. This nearly matches standard upper bounds and improves on the $\Omega(\log d)$ dependence proved for the $\ell_1/\ell_\infty$ setting by Shalev-Shwartz et al. (2009). In stark contrast, these problems can be solved using a dimension-independent number of samples for the $\ell_2/\ell_2$ setting and with $\log d$ dependence for the $\ell_1/\ell_\infty$ setting using other approaches. We also demonstrate that for a more general class of range-bounded (but not Lipschitz-bounded) stochastic convex programs an even stronger gap appears already in dimension 2.
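The ERM rule described above can be illustrated with a minimal sketch, not taken from the paper: the sampled functions are chosen here as quadratics $f^i(x) = \|x - z_i\|^2$ (an illustrative assumption), $\mathcal{K}$ is the unit $\ell_2$ ball, and the empirical risk $F_S$ is minimized by projected gradient descent. All function names are hypothetical.

```python
import numpy as np

def project_unit_ball(x):
    """Project x onto the unit l2 ball K = {x : ||x|| <= 1}."""
    norm = np.linalg.norm(x)
    return x if norm <= 1.0 else x / norm

def erm_projected_gd(samples, steps=500, lr=0.1):
    """Minimize the empirical risk F_S(x) = (1/n) sum_i ||x - z_i||^2
    over the unit l2 ball via projected gradient descent.
    (Illustrative losses; the paper's lower-bound constructions differ.)"""
    x = np.zeros(samples.shape[1])
    for _ in range(steps):
        grad = 2.0 * (x - samples.mean(axis=0))  # gradient of F_S at x
        x = project_unit_ball(x - lr * grad)
    return x

rng = np.random.default_rng(0)
z = rng.normal(size=(100, 5))   # n = 100 i.i.d. sampled "functions" in d = 5
x_hat = erm_projected_gd(z)
# For these quadratic losses the ERM minimizer over K is the projection
# of the sample mean onto the unit ball, so the two should agree closely.
print(np.linalg.norm(x_hat - project_unit_ball(z.mean(axis=0))))
```

For these well-behaved quadratics the empirical minimizer concentrates quickly; the point of the paper is that for general Lipschitz-bounded convex losses, making $F_S$ uniformly close to $F$ over $\mathcal{K}$ can require a number of samples growing linearly with $d$.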