The Landscape of Empirical Risk for Non-convex Losses
Abstract
We revisit the problem of learning a noisy linear classifier by minimizing the empirical risk associated with the square loss. While the empirical risk is non-convex, we prove that its structure is remarkably simple. Namely, when the sample size $n$ is larger than $C\, d \log d$ (with $d$ the dimension and $C$ a constant), the following hold with high probability: (i) the empirical risk has a unique local minimum, which is also the global minimum; (ii) gradient descent converges exponentially fast to the global minimizer from any initialization; (iii) the global minimizer approaches the true parameter at a nearly optimal rate. The core of our argument is a uniform convergence result for the gradients and Hessians of the empirical risk.
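To make the setup concrete, the following is a minimal sketch (not the paper's code) of minimizing the square-loss empirical risk of a noisy linear classifier by plain gradient descent. It assumes a logistic link, synthetic Gaussian data, and illustrative hyperparameters, none of which are specified in this abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Synthetic data: n samples in dimension d, with n on the order of d*log(d)
# (the sample-size regime discussed in the abstract; the factor 10 is arbitrary).
d = 20
n = int(10 * d * np.log(d))
w0 = rng.normal(size=d) / np.sqrt(d)                        # true parameter (assumed model)
X = rng.normal(size=(n, d))
y = (rng.uniform(size=n) < sigmoid(X @ w0)).astype(float)   # noisy binary labels

def empirical_risk(w):
    # R_n(w) = (1/n) * sum_i (y_i - sigma(<w, x_i>))^2  -- non-convex in w
    return np.mean((y - sigmoid(X @ w)) ** 2)

def gradient(w):
    # d/dw (y - sigma(<w,x>))^2 = -2 (y - sigma) * sigma * (1 - sigma) * x
    p = sigmoid(X @ w)
    return -2.0 * (X.T @ ((y - p) * p * (1 - p))) / n

# Gradient descent from an arbitrary initialization.
w = rng.normal(size=d)
step = 1.0
for _ in range(2000):
    w -= step * gradient(w)

print("empirical risk at estimate:", empirical_risk(w))
print("distance to true parameter:", np.linalg.norm(w - w0))
```

In this regime the abstract's claims suggest the iterate converges to the unique (global) minimizer regardless of the random initialization; the step size and iteration count above are illustrative choices, not values from the paper.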
