Convex SGD: Generalization Without Early Stopping

Abstract
We consider the generalization error associated with stochastic gradient descent on a smooth convex function over a compact set. We show the first bound on the generalization error that vanishes when the number of iterations and the dataset size go to infinity at arbitrary rates; our bound scales as with step-size . In particular, strong convexity is not needed for stochastic gradient descent to generalize well.
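For readers who want a concrete picture of the procedure being analyzed, the sketch below runs stochastic gradient descent on a smooth convex objective while keeping the iterates in a compact set by projection. The synthetic least-squares loss, the ℓ2-ball constraint, the 1/√t step-size schedule, and the averaged iterate are illustrative assumptions made here, not details taken from the paper.

```python
import numpy as np

def project_l2_ball(w, radius):
    """Project w onto the l2 ball of the given radius (a compact, convex set)."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def projected_sgd(X, y, radius=1.0, T=10_000, seed=0):
    """Projected SGD on the smooth convex least-squares loss
    f(w) = (1/n) * sum_i (x_i^T w - y_i)^2 / 2, constrained to an l2 ball.

    Each step samples one example, takes a stochastic gradient step with an
    illustrative 1/sqrt(t) step-size, and projects back onto the ball so the
    iterates remain in a compact set. Returns the averaged iterate.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    avg = np.zeros(d)  # running average of the iterates
    for t in range(1, T + 1):
        i = rng.integers(n)                      # sample one example uniformly
        grad = (X[i] @ w - y[i]) * X[i]          # stochastic gradient at w
        w = project_l2_ball(w - grad / np.sqrt(t), radius)
        avg += (w - avg) / t                     # online average of iterates
    return avg

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 500, 10
    w_true = project_l2_ball(rng.normal(size=d), 1.0)
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    w_hat = projected_sgd(X, y)
    print("training loss:", 0.5 * np.mean((X @ w_hat - y) ** 2))
```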