
New nonasymptotic convergence rates of stochastic proximal point algorithm for convex optimization problems with many constraints

Abstract

Significant parts of the recent stochastic optimization literature have focused on analyzing the theoretical and practical behaviour of stochastic first-order schemes under various convexity assumptions. Due to its simplicity, the traditional method of choice for most supervised machine learning problems is stochastic gradient descent (SGD), which is known to converge relatively slowly. Many modifications and accelerations have been added to pure SGD in order to boost its convergence under different (strong) convexity conditions when constraints are present. However, these improved stochastic first-order schemes essentially require full projections onto a complicated feasible set, smoothness, or strong convexity assumptions. In this paper we present novel convergence results for the stochastic proximal point (SPP) algorithm for (non-)strongly convex optimization problems with many constraints. We show that a prox-quadratic growth assumption is sufficient to guarantee an $\mathcal{O}\left(\frac{1}{k}\right)$ convergence rate for SPP, in terms of the distance to the optimal set, using only projections onto a single simple component set at each iteration. Furthermore, linear convergence is obtained in the interpolation setting, where the optimal set of the expected cost is included in the optimal sets of all functional components.
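To make the SPP iteration concrete, the following is a minimal sketch (not the paper's exact algorithm or experiments) on a toy least-squares model, where each component $f_i(x) = \tfrac{1}{2}(a_i^\top x - b_i)^2$ admits a closed-form proximal step. The problem is constructed in the interpolation setting described above, i.e. every component is minimized at the same point, under which the abstract's linear-convergence claim applies; all names and the stepsize choice are illustrative assumptions.

```python
import numpy as np

# Toy interpolation setting: a consistent linear system, so the minimizer
# of the expected cost also minimizes every component f_i.
rng = np.random.default_rng(0)
n, d = 50, 5
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star  # b is generated from x_star, hence interpolation holds

def spp_step(x, a, b_i, alpha):
    # Closed-form prox of alpha * 0.5*(a @ z - b_i)^2 at x:
    #   argmin_z 0.5*alpha*(a @ z - b_i)^2 + 0.5*||z - x||^2
    return x - alpha * (a @ x - b_i) / (1.0 + alpha * (a @ a)) * a

x = np.zeros(d)
alpha = 0.5  # constant stepsize; illustrative, fine under interpolation
for k in range(2000):
    i = rng.integers(n)           # sample one functional component
    x = spp_step(x, A[i], b[i], alpha)

print(np.linalg.norm(x - x_star))  # distance to the optimal point
```

Note that each iteration touches a single randomly sampled component, mirroring the abstract's point that SPP needs only simple per-component operations rather than a full projection onto the intersection of all constraints.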
