Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles

Main: 48 Pages
3 Figures
Bibliography:8 Pages
3 Tables
Abstract

In this work, we consider bilevel optimization when the lower-level problem is strongly convex. Recent works show that with a Hessian-vector product (HVP) oracle, one can provably find an $\epsilon$-stationary point within $\mathcal{O}(\epsilon^{-2})$ oracle calls. However, the HVP oracle may be inaccessible or expensive in practice. Kwon et al. (ICML 2023) addressed this issue by proposing a first-order method that can achieve the same goal at a slower rate of $\tilde{\mathcal{O}}(\epsilon^{-3})$. In this paper, we incorporate a two-time-scale update to improve their method to achieve the near-optimal $\tilde{\mathcal{O}}(\epsilon^{-2})$ first-order oracle complexity. Our analysis is highly extensible. In the stochastic setting, our algorithm can achieve the stochastic first-order oracle complexity of $\tilde{\mathcal{O}}(\epsilon^{-4})$ and $\tilde{\mathcal{O}}(\epsilon^{-6})$ when the stochastic noises are only in the upper-level objective and in both level objectives, respectively. When the objectives have higher-order smoothness conditions, our deterministic method can escape saddle points by injecting noise, and can be accelerated to achieve a faster rate of $\tilde{\mathcal{O}}(\epsilon^{-1.75})$ using Nesterov's momentum.
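
To make the idea of a fully first-order bilevel method concrete, the following is a minimal sketch on a toy scalar problem. It is not the paper's algorithm: the abstract does not give the update rules, so the sketch assumes the penalty-style gradient surrogate commonly associated with Kwon et al., namely $\nabla_x f(x, y) + \lambda\,(\nabla_x g(x, y) - \nabla_x g(x, z))$, where $y$ approximately minimizes $f + \lambda g$ and $z$ approximately minimizes $g$. The toy objectives, the function names (`grad_f`, `grad_g`, `first_order_bilevel`), the penalty `lam`, and all step sizes are hypothetical choices made for illustration. Using a different step size for each inner variable is only one possible reading of "two-time-scale".

```python
# Toy bilevel problem (assumed for illustration, not from the paper):
#   upper level: f(x, y) = 0.5 * (y - 1)^2 + 0.5 * x^2
#   lower level: g(x, y) = 0.5 * (y - x)^2   (strongly convex in y, y*(x) = x)
# The true hyper-objective is phi(x) = f(x, x), minimized at x = 0.5.

def grad_f(x, y):
    """Return (df/dx, df/dy) for the toy upper-level objective."""
    return x, y - 1.0

def grad_g(x, y):
    """Return (dg/dx, dg/dy) for the toy lower-level objective."""
    return -(y - x), y - x

def first_order_bilevel(x0=0.0, lam=100.0, outer_steps=200,
                        inner_steps=10, eta_x=0.25):
    """Penalty-based bilevel descent that uses gradients only (no Hessians)."""
    x, y, z = x0, x0, x0           # y and z are warm-started across outer steps
    eta_z = 1.0                    # step for z: g is 1-smooth in y
    eta_y = 1.0 / (1.0 + lam)      # step for y: f + lam * g is (1 + lam)-smooth in y
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            # z tracks argmin_z g(x, z)
            _, gz = grad_g(x, z)
            z -= eta_z * gz
            # y tracks argmin_y f(x, y) + lam * g(x, y)
            _, fy = grad_f(x, y)
            _, gy = grad_g(x, y)
            y -= eta_y * (fy + lam * gy)
        # First-order surrogate of the hyper-gradient d phi / dx
        fx, _ = grad_f(x, y)
        gx_y, _ = grad_g(x, y)
        gx_z, _ = grad_g(x, z)
        x -= eta_x * (fx + lam * (gx_y - gx_z))
    return x

x_hat = first_order_bilevel()
print(x_hat)
```

On this toy problem the surrogate gradient vanishes at $x = \lambda / (1 + 2\lambda)$, so the iterate approaches the true minimizer $0.5$ only as $\lambda$ grows. This reflects the bias that a finite penalty introduces in this sketch.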
