On the Complexity of First-Order Methods in Stochastic Bilevel Optimization

Abstract

We consider the problem of finding stationary points in bilevel optimization when the lower-level problem is unconstrained and strongly convex. The problem has been extensively studied in recent years; the main technical challenge is to keep track of the lower-level solutions $y^*(x)$ in response to changes in the upper-level variables $x$. Consequently, all existing approaches tie their analyses to a genie algorithm that knows the lower-level solutions and, therefore, need not query any points far from them. We consider a dual question to such approaches: suppose we have an oracle, which we call $y^*$-aware, that returns an $O(\epsilon)$-estimate of the lower-level solution, in addition to first-order gradient estimators that are locally unbiased within the $\Theta(\epsilon)$-ball around $y^*(x)$. We study the complexity of finding stationary points with such a $y^*$-aware oracle: we propose a simple first-order method that converges to an $\epsilon$-stationary point using $O(\epsilon^{-6})$, $O(\epsilon^{-4})$ accesses to first-order $y^*$-aware oracles. Our upper bounds also apply to standard unbiased first-order oracles, improving the best-known complexity of first-order methods by a factor of $O(\epsilon)$ with minimal assumptions. We then provide the matching $\Omega(\epsilon^{-6})$, $\Omega(\epsilon^{-4})$ lower bounds without and with an additional smoothness assumption on $y^*$-aware oracles, respectively. Our results imply that any approach that simulates an algorithm with a $y^*$-aware oracle must suffer the same lower bounds.
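For orientation, the setting described above can be written in the standard bilevel form. This is only a sketch under assumed notation: the symbols $f$, $g$, $F$, $\hat{y}$, and the dimension $d_y$ are illustrative and not given in the abstract, and the paper's own definitions (in particular, how stationarity is measured under stochastic oracles) may differ in detail.

```latex
% Assumed notation: f is the upper-level objective, g the lower-level objective.
\[
  \min_{x} \; F(x) := f\bigl(x, y^*(x)\bigr),
  \qquad
  y^*(x) := \arg\min_{y \in \mathbb{R}^{d_y}} g(x, y),
\]
% g(x, \cdot) is strongly convex, so y^*(x) is unique for every x.
% A point x is called \epsilon-stationary when
\[
  \bigl\| \nabla F(x) \bigr\| \le \epsilon .
\]
% A y^*-aware oracle queried at x additionally returns an estimate \hat{y} with
\[
  \bigl\| \hat{y} - y^*(x) \bigr\| = O(\epsilon).
\]
```

Strong convexity of the lower-level problem is what makes $y^*(x)$ single-valued, so that $F$ and its stationary points are well defined.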
