The Landscape of Empirical Risk for Non-convex Losses

Abstract

We revisit the problem of learning a noisy linear classifier by minimizing the empirical risk associated with the square loss. While the empirical risk is non-convex, we prove that its structure is remarkably simple. Namely, when the sample size is larger than $C\,d\log d$ (with $d$ the dimension and $C$ a constant), the following hold with high probability: (a) the empirical risk has a unique local minimum (which is also the global minimum); (b) gradient descent converges exponentially fast to the global minimizer, from any initialization; (c) the global minimizer approaches the true parameter at a nearly optimal rate. The core of our argument is a uniform convergence result for the gradients and Hessians of the empirical risk.
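To make the setting concrete, here is a minimal numerical sketch of the phenomenon the abstract describes: gradient descent on the empirical square-loss risk of a noisy linear classifier, run from an arbitrary initialization. The data model (Gaussian features, labels given by the sign of a linear function with random flips), the label-flip probability, the step size, and the iteration count are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimension d and a sample size on the order of d * log d (constants are arbitrary).
d = 10
n = 40 * d * int(np.log(d) + 1)

# Ground-truth parameter and noisy labels: y = sign(<x, w_true>), flipped w.p. 0.1.
w_true = rng.normal(size=d)
w_true /= np.linalg.norm(w_true)
X = rng.normal(size=(n, d))
flips = rng.random(n) < 0.1
y = np.sign(X @ w_true) * np.where(flips, -1.0, 1.0)

def empirical_risk(w):
    """Empirical square-loss risk R_n(w) = (1/n) * sum_i (y_i - <x_i, w>)^2."""
    r = y - X @ w
    return float(r @ r) / n

def gradient(w):
    """Gradient of R_n at w."""
    return (-2.0 / n) * (X.T @ (y - X @ w))

# Plain gradient descent from a random initialization.
w = rng.normal(size=d)
eta = 0.1  # illustrative step size
for _ in range(500):
    w = w - eta * gradient(w)
```

Note that the least-squares risk here is actually convex in `w`; the sketch only illustrates the setup and the convergence behavior, not the non-convex losses the paper analyzes. The recovered direction of `w` should align closely with `w_true` once `n` is of order `d log d`.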
