
A Variance-Reduced Stochastic Accelerated Primal Dual Algorithm

Abstract

In this work, we consider strongly convex strongly concave (SCSC) saddle point (SP) problems $\min_{x\in\mathbb{R}^{d_x}}\max_{y\in\mathbb{R}^{d_y}}f(x,y)$ where $f$ is $L$-smooth, $f(\cdot,y)$ is $\mu$-strongly convex for every $y$, and $f(x,\cdot)$ is $\mu$-strongly concave for every $x$. Such problems arise frequently in machine learning in the context of robust empirical risk minimization (ERM), e.g., \textit{distributionally robust} ERM, where partial gradients are estimated using mini-batches of data points. Assuming we have access to an unbiased stochastic first-order oracle, we consider the stochastic accelerated primal-dual (SAPD) algorithm recently introduced in Zhang et al. [2021] for SCSC SP problems as a robust method against gradient noise. In particular, SAPD recovers the well-known stochastic gradient descent ascent (SGDA) algorithm as a special case when the momentum parameter is set to zero, and it can achieve an accelerated rate when the momentum parameter is properly tuned, improving the dependence on $\kappa \triangleq L/\mu$ from $\kappa^2$ for SGDA to $\kappa$. We propose efficient variance-reduction strategies for SAPD based on Richardson-Romberg extrapolation and show that our method improves upon SAPD both in practice and in theory.
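To make the ingredients above concrete, the following is a minimal Python sketch, not the authors' implementation: a momentum-extrapolated primal-dual iteration of the kind described (setting the momentum parameter to zero recovers SGDA) on a toy SCSC quadratic, followed by a Richardson-Romberg combination of two runs with halved step sizes. The update rule, problem instance, and all constants are illustrative assumptions; the precise SAPD updates, tuned parameters, and the proposed variance-reduction scheme are given in the paper and in Zhang et al. [2021].

```python
# Illustrative sketch (assumed, simplified): momentum-extrapolated primal-dual
# iteration on the toy SCSC problem
#   f(x, y) = (mu/2)||x||^2 + x^T A y - (mu/2)||y||^2,
# combined with Richardson-Romberg (RR) extrapolation across two step-size scales.
# mu, A, tau, sigma, theta, n_iter and noise_std are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_x = d_y = 5
mu, noise_std = 1.0, 0.5
A = rng.standard_normal((d_x, d_y))


def grad_x(x, y):
    """Unbiased stochastic partial gradient in x (exact gradient plus noise)."""
    return mu * x + A @ y + noise_std * rng.standard_normal(d_x)


def grad_y(x, y):
    """Unbiased stochastic partial gradient in y (exact gradient plus noise)."""
    return A.T @ x - mu * y + noise_std * rng.standard_normal(d_y)


def primal_dual(tau, sigma, theta, n_iter=20000):
    """Momentum-extrapolated primal-dual iteration; theta = 0 reduces to SGDA.

    Returns the averaged iterates (x_bar, y_bar).
    """
    x, y = np.ones(d_x), np.ones(d_y)
    gy_prev = grad_y(x, y)
    x_bar, y_bar = np.zeros(d_x), np.zeros(d_y)
    for _ in range(n_iter):
        gy = grad_y(x, y)
        y = y + sigma * (gy + theta * (gy - gy_prev))  # ascent step with momentum
        x = x - tau * grad_x(x, y)                     # descent step at updated y
        gy_prev = gy
        x_bar += x / n_iter
        y_bar += y / n_iter
    return x_bar, y_bar


# Richardson-Romberg extrapolation: run the base method with step sizes
# (tau, sigma) and (tau/2, sigma/2); when the stationary bias of the averaged
# iterate scales as c * step + O(step^2), the combination
# 2 * run(step/2) - run(step) cancels the leading-order term.
tau, sigma, theta = 0.05, 0.05, 0.5
x1, y1 = primal_dual(tau, sigma, theta)
x2, y2 = primal_dual(tau / 2, sigma / 2, theta)
x_rr, y_rr = 2 * x2 - x1, 2 * y2 - y1
# The saddle point of this toy problem is (0, 0).
print("distance to saddle point:", np.linalg.norm(np.concatenate([x_rr, y_rr])))
```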
