
Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums

Abstract

We study structured nonsmooth convex finite-sum optimization that appears widely in machine learning applications, including support vector machines and least absolute deviation. For the primal-dual formulation of this problem, we propose a novel algorithm called \emph{Variance Reduction via Primal-Dual Accelerated Dual Averaging (\vrpda)}. In the nonsmooth and general convex setting, \vrpda~has the overall complexity $O(nd\log\min\{1/\epsilon, n\} + d/\epsilon)$ in terms of the primal-dual gap, where $n$ denotes the number of samples, $d$ the dimension of the primal variables, and $\epsilon$ the desired accuracy. In the nonsmooth and strongly convex setting, the overall complexity of \vrpda~becomes $O(nd\log\min\{1/\epsilon, n\} + d/\sqrt{\epsilon})$ in terms of both the primal-dual gap and the distance between the iterate and the optimal solution. Both results for \vrpda~improve significantly, and in a much simpler and more direct way, on the state-of-the-art complexity estimates, which are $O(nd\log\min\{1/\epsilon, n\} + \sqrt{n}d/\epsilon)$ for the nonsmooth and general convex setting and $O(nd\log\min\{1/\epsilon, n\} + \sqrt{n}d/\sqrt{\epsilon})$ for the nonsmooth and strongly convex setting. Moreover, both complexities are better than \emph{lower} bounds for general convex finite sums that lack the particular (common) structure that we consider. Our theoretical results are supported by numerical experiments, which confirm the competitive performance of \vrpda~compared to state-of-the-art methods.
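
For concreteness, the structured problem class referred to above typically takes the following form (a sketch using illustrative notation; the symbols $f_i$, $a_i$, and $g$ below are assumptions for exposition, not taken from the paper): each component is a possibly nonsmooth convex loss, such as the hinge loss for support vector machines or the absolute loss for least absolute deviation, composed with a linear map, plus a convex regularizer,
$$\min_{x\in\mathbb{R}^d} \; \frac{1}{n}\sum_{i=1}^{n} f_i(\langle a_i, x\rangle) + g(x),$$
where each $f_i$ is convex and possibly nonsmooth, $a_i\in\mathbb{R}^d$ is the $i$-th data point, and $g$ is a convex regularizer. The corresponding primal-dual (saddle-point) formulation dualizes the losses via their convex conjugates $f_i^*$:
$$\min_{x\in\mathbb{R}^d}\max_{y\in\mathbb{R}^n} \; \frac{1}{n}\sum_{i=1}^{n}\big(y_i\langle a_i, x\rangle - f_i^*(y_i)\big) + g(x).$$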
