
Faster Gradient Methods for Highly-Smooth Stochastic Bilevel Optimization

Main: 9 pages · Bibliography: 3 pages · Appendix: 8 pages · 1 figure · 1 table
Abstract

This paper studies the complexity of finding an $\epsilon$-stationary point in stochastic bilevel optimization when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent work proposed the first-order method F$^2$SA, which achieves an $\tilde{\mathcal{O}}(\epsilon^{-6})$ upper complexity bound for first-order smooth problems. This is slower than the optimal $\Omega(\epsilon^{-4})$ complexity lower bound for its single-level counterpart. In this work, we show that faster rates are achievable for higher-order smooth problems. We first reinterpret F$^2$SA as approximating the hyper-gradient with a forward difference. Based on this observation, we propose a class of methods, F$^2$SA-$p$, that uses $p$th-order finite differences for hyper-gradient approximation and improves the upper bound to $\tilde{\mathcal{O}}(p\,\epsilon^{-4-2/p})$ for $p$th-order smooth problems. Finally, we demonstrate that the $\Omega(\epsilon^{-4})$ lower bound also holds for stochastic bilevel problems when the high-order smoothness holds in the lower-level variable, indicating that the upper bound of F$^2$SA-$p$ is nearly optimal in the highly smooth regime $p = \Omega(\log \epsilon^{-1} / \log\log \epsilon^{-1})$.
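To make the forward-difference view concrete, below is a minimal numerical sketch, not the authors' implementation. It assumes the standard penalty reformulation in which, with $h(x,\mu) := \min_y \{\mu f(x,y) + g(x,y)\}$, the hyper-gradient is the derivative of $\mu \mapsto \nabla_x h(x,\mu)$ at $\mu = 0$: a first-order forward difference in $\mu$ recovers an F$^2$SA-style estimator, and a $p$-point stencil reduces the bias from $\mathcal{O}(\mu)$ to $\mathcal{O}(\mu^p)$. The function names and the `grad_x_h` interface are illustrative, not from the paper.

```python
import numpy as np

def forward_difference_coefficients(p):
    """Coefficients c_0..c_p with sum_i c_i * u(i*mu) / mu = u'(0) + O(mu^p).

    Found by Taylor matching: the stencil must annihilate u(0), reproduce
    u'(0), and cancel the derivative terms of order 2..p.
    """
    nodes = np.arange(p + 1)                        # evaluation points 0, 1, ..., p (in units of mu)
    V = np.vander(nodes, p + 1, increasing=True).T  # row k holds nodes**k
    rhs = np.zeros(p + 1)
    rhs[1] = 1.0                                    # keep only the first-derivative term
    return np.linalg.solve(V, rhs)

def hypergrad_estimate(grad_x_h, x, mu, p):
    """p-th order finite-difference estimate of d/dmu [grad_x h(x, mu)] at mu = 0.

    grad_x_h(x, mu) is a hypothetical oracle returning the x-gradient of the
    penalized value function h(x, mu) = min_y { mu * f(x, y) + g(x, y) }.
    """
    c = forward_difference_coefficients(p)
    return sum(ci * grad_x_h(x, i * mu) for i, ci in enumerate(c)) / mu

# Sanity check on a scalar toy where the limit is known:
# u(mu) = exp(mu) has u'(0) = 1; for fixed mu, larger p shrinks the bias.
for p in (1, 2, 3):
    est = hypergrad_estimate(lambda x, mu: np.exp(mu), x=None, mu=0.1, p=p)
    print(p, abs(est - 1.0))
```

For $p = 1$ the solver returns the classical stencil $(u(\mu) - u(0))/\mu$, and for $p = 2$ it returns $(-\tfrac{3}{2}u(0) + 2u(\mu) - \tfrac{1}{2}u(2\mu))/\mu$; the trade-off the abstract describes is that each extra order costs one more gradient evaluation of the penalized problem (the factor $p$ in the bound) in exchange for a smaller bias, provided the problem is $p$th-order smooth.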
