A Near-Optimal Algorithm for Stochastic Bilevel Optimization via
Double-Momentum
This paper proposes a new algorithm -- the Single-timescale Double-momentum Stochastic Approximation (SUSTAIN) -- for tackling unconstrained bilevel optimization problems. We focus on stochastic bilevel problems where the lower-level subproblem is strongly convex and the upper-level objective function is smooth. Unlike prior works that rely on two-timescale or double-loop techniques to track the optimal solution of the lower-level subproblem, we design a stochastic momentum-assisted gradient estimator for both the upper- and lower-level updates. This estimator allows us to gradually control the error in the stochastic gradient updates caused by inexact solutions to both subproblems. We show that if the upper objective function is smooth but possibly non-convex (resp. strongly convex), SUSTAIN requires $\mathcal{O}(\epsilon^{-3/2})$ (resp. $\mathcal{O}(\epsilon^{-1})$) iterations (each using a constant number of samples) to find an $\epsilon$-stationary (resp. $\epsilon$-optimal) solution. An $\epsilon$-stationary (resp. $\epsilon$-optimal) solution is a point at which the squared norm of the gradient of the outer function (resp. the gap between the outer function value and the optimal objective value) is at most $\epsilon$. The total number of stochastic gradient samples required for the upper- and lower-level objective functions matches the best-known sample complexity of single-level stochastic gradient descent algorithms.
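To give a rough sense of the double-momentum idea, the sketch below applies a STORM-style recursive momentum estimator to both levels of a toy quadratic bilevel problem. The problem instance, the step sizes, and the helper names (`hypergrad`, `grad_g_y`) are all hypothetical choices for illustration; in particular, the toy instance is constructed so that the implicit (Hessian-inverse) term of the hypergradient collapses to a simple expression, which the paper's actual estimator must handle in general. This is a minimal sketch under those assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5  # dimension of upper variable x and lower variable y (toy choice)

# Hypothetical toy instance (not from the paper), chosen so the
# hypergradient is available in closed form:
#   lower level:  g(x, y) = 0.5 * ||y - x||^2   (1-strongly convex in y)
#   upper level:  f(x, y) = 0.5 * ||y - b||^2   (smooth)
# Since y*(x) = x, the true hypergradient is nabla F(x) = x - b.
b = rng.normal(size=d)
sigma = 0.1  # noise level of the stochastic oracles

def grad_g_y(x, y, noise):
    """Stochastic gradient of g w.r.t. y; `noise` is the sampled randomness."""
    return (y - x) + sigma * noise

def hypergrad(x, y, noise):
    """Stochastic hypergradient estimate. For this toy instance the
    implicit-function formula reduces to nabla_y f(x, y) = y - b."""
    return (y - b) + sigma * noise

# Single-timescale loop with STORM-type momentum estimators for BOTH levels.
x, y = rng.normal(size=d), rng.normal(size=d)
hx = hypergrad(x, y, rng.normal(size=d))   # upper-level estimator
hy = grad_g_y(x, y, rng.normal(size=d))    # lower-level estimator
alpha, beta, eta = 0.05, 0.1, 0.5          # illustrative constants

for k in range(5000):
    x_new, y_new = x - alpha * hx, y - beta * hy
    # One fresh sample per level, evaluated at BOTH the new and the old
    # iterate. Reusing the same sample at consecutive iterates is what
    # lets the recursion  h <- grad_new + (1 - eta) * (h - grad_old)
    # gradually cancel the accumulated estimation error.
    nx, ny = rng.normal(size=d), rng.normal(size=d)
    hx = hypergrad(x_new, y_new, nx) + (1 - eta) * (hx - hypergrad(x, y, nx))
    hy = grad_g_y(x_new, y_new, ny) + (1 - eta) * (hy - grad_g_y(x, y, ny))
    x, y = x_new, y_new

print("||x - b|| =", np.linalg.norm(x - b))  # small, up to the noise floor
```

The single-timescale aspect is visible in the loop: x and y are updated in the same iteration with constant step sizes, rather than nesting an inner loop that solves the lower-level problem to high accuracy before each upper-level step.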