
Faster Gradient Methods for Highly-Smooth Stochastic Bilevel Optimization

Main: 9 pages · Bibliography: 3 pages · Appendix: 8 pages · 1 figure · 1 table
Abstract

This paper studies the complexity of finding an $\epsilon$-stationary point in stochastic bilevel optimization when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent work proposed the first-order method F$^2$SA, which achieves an $\tilde{\mathcal{O}}(\epsilon^{-6})$ upper complexity bound for first-order smooth problems. This is slower than the optimal $\Omega(\epsilon^{-4})$ complexity lower bound for its single-level counterpart. In this work, we show that faster rates are achievable for higher-order smooth problems. We first reinterpret F$^2$SA as approximating the hyper-gradient with a forward difference. Based on this observation, we propose a class of methods, F$^2$SA-$p$, that uses $p$th-order finite differences for hyper-gradient approximation and improves the upper bound to $\tilde{\mathcal{O}}(p\,\epsilon^{-4-2/p})$ for $p$th-order smooth problems. Finally, we demonstrate that the $\Omega(\epsilon^{-4})$ lower bound also holds for stochastic bilevel problems when the high-order smoothness holds in the lower-level variable, indicating that the upper bound of F$^2$SA-$p$ is nearly optimal in the highly smooth regime $p = \Omega(\log \epsilon^{-1} / \log\log \epsilon^{-1})$.
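To make the forward-difference view concrete, below is a minimal numerical sketch, not the authors' implementation. It assumes the standard penalty reformulation in which, with $h(x,\mu) := \min_y \{\mu f(x,y) + g(x,y)\}$, the hyper-gradient is the derivative of $\mu \mapsto \nabla_x h(x,\mu)$ at $\mu = 0$: a first-order forward difference in $\mu$ recovers an F$^2$SA-style estimator, and a $p$-point stencil reduces the bias from $\mathcal{O}(\mu)$ to $\mathcal{O}(\mu^p)$. The function names and the `grad_x_h` interface are illustrative, not from the paper.

```python
import numpy as np

def forward_difference_coefficients(p):
    """Coefficients c_0..c_p with sum_i c_i * u(i*mu) / mu = u'(0) + O(mu^p).

    Found by Taylor matching: the stencil must annihilate u(0), reproduce
    u'(0), and cancel the derivative terms of order 2..p.
    """
    nodes = np.arange(p + 1)                        # evaluation points 0, 1, ..., p (in units of mu)
    V = np.vander(nodes, p + 1, increasing=True).T  # row k holds nodes**k
    rhs = np.zeros(p + 1)
    rhs[1] = 1.0                                    # keep only the first-derivative term
    return np.linalg.solve(V, rhs)

def hypergrad_estimate(grad_x_h, x, mu, p):
    """p-th order finite-difference estimate of d/dmu [grad_x h(x, mu)] at mu = 0.

    grad_x_h(x, mu) is a hypothetical oracle returning the x-gradient of the
    penalized value function h(x, mu) = min_y { mu * f(x, y) + g(x, y) }.
    """
    c = forward_difference_coefficients(p)
    return sum(ci * grad_x_h(x, i * mu) for i, ci in enumerate(c)) / mu

# Sanity check on a scalar toy where the limit is known:
# u(mu) = exp(mu) has u'(0) = 1; for fixed mu, larger p shrinks the bias.
for p in (1, 2, 3):
    est = hypergrad_estimate(lambda x, mu: np.exp(mu), x=None, mu=0.1, p=p)
    print(p, abs(est - 1.0))
```

For $p = 1$ the solver returns the classical stencil $(u(\mu) - u(0))/\mu$, and for $p = 2$ it returns $(-\tfrac{3}{2}u(0) + 2u(\mu) - \tfrac{1}{2}u(2\mu))/\mu$; the trade-off the abstract describes is that each extra order costs one more gradient evaluation of the penalized problem (the factor $p$ in the bound) in exchange for a smaller bias, provided the problem is $p$th-order smooth.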
