Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization

This paper studies the complexity of finding an $\epsilon$-stationary point in stochastic bilevel optimization when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent work proposed the first-order method F²SA, which achieves an upper complexity bound of $\tilde{\mathcal{O}}(\epsilon^{-6})$ for first-order smooth problems. This is slower than the optimal lower bound $\Omega(\epsilon^{-4})$ for the single-level counterpart. In this work, we show that faster rates are achievable for higher-order smooth problems. We first reformulate F²SA as approximating the hyper-gradient with a forward difference. Based on this observation, we propose a class of methods, F²SA-$p$, that uses a $p$th-order finite difference for hyper-gradient approximation and improves the upper bound to $\tilde{\mathcal{O}}(\epsilon^{-(4+2/p)})$ for $p$th-order smooth problems. Finally, we demonstrate that the $\Omega(\epsilon^{-4})$ lower bound also holds for stochastic bilevel problems when the high-order smoothness holds in the lower-level variable, indicating that the upper bound of F²SA-$p$ is near-optimal in the highly smooth regime $p = \Omega(\log \epsilon^{-1})$.
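As a minimal sketch of the forward-difference view (with notation assumed here, not taken verbatim from the paper): write the bilevel problem as $\min_x F(x) := f(x, y^*(x))$ with $y^*(x) = \arg\min_y g(x, y)$, and define the penalized value-function gradient
\[
G(x, \delta) := \nabla_x \Big[ \min_{y} \big\{ g(x, y) + \delta f(x, y) \big\} \Big].
\]
When $g(x, \cdot)$ is strongly convex, Danskin's theorem and the implicit function theorem give $\nabla F(x) = \partial_\delta G(x, \delta)\big|_{\delta = 0}$, so a penalty-based estimator with weight $\lambda = 1/\delta$ amounts to the forward difference
\[
\nabla F(x) \approx \frac{G(x, \delta) - G(x, 0)}{\delta},
\]
whose bias is $O(\delta)$ when $G$ is twice differentiable in $\delta$. A $p$th-order variant instead applies a one-sided $(p+1)$-point stencil
\[
\nabla F(x) \approx \sum_{i=0}^{p} c_i \, G(x, i\delta), \qquad c_i = \Theta(1/\delta),
\]
with weights chosen to be exact on polynomials of degree at most $p$, reducing the bias to $O(\delta^p)$ when $G$ is $(p+1)$-times differentiable in $\delta$. Driving the bias to $\epsilon$ then permits the larger step $\delta = \epsilon^{1/p}$ (equivalently, the smaller penalty $\lambda = \epsilon^{-1/p}$), which is consistent with the improved rate stated above.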
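For concreteness, such one-sided stencil weights can be computed by imposing exactness on monomials; the helper below is a hypothetical illustration, not code from the paper.

```python
import numpy as np

def stencil_coefficients(p: int, delta: float = 1.0) -> np.ndarray:
    """Weights c_0..c_p with sum_i c_i * h(i*delta) ~= h'(0) + O(delta^p)."""
    nodes = np.arange(p + 1) * delta                  # one-sided nodes 0, delta, ..., p*delta
    A = np.vander(nodes, p + 1, increasing=True).T    # A[k, i] = nodes[i] ** k
    b = np.zeros(p + 1)
    b[1] = 1.0                                        # d/dt of t^k at 0 is 1 iff k == 1
    return np.linalg.solve(A, b)                      # exactness on 1, t, ..., t^p

# p = 1 recovers the forward difference (h(delta) - h(0)) / delta:
print(stencil_coefficients(1, 0.1))   # [-10.  10.]
# p = 2 gives the second-order one-sided rule (-3h(0) + 4h(d) - h(2d)) / (2d):
print(stencil_coefficients(2, 0.1))   # [-15.  20.  -5.]
```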