
Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles

Abstract

Bilevel optimization has wide applications, such as hyperparameter tuning, neural architecture search, and meta-learning. Designing efficient algorithms for bilevel optimization is challenging because the lower-level problem defines a feasibility set implicitly via another optimization problem. In this work, we consider a tractable case in which the lower-level problem is strongly convex. Recent works show that, with a Hessian-vector product oracle, one can provably find an $\epsilon$-first-order stationary point within $\tilde{\mathcal{O}}(\epsilon^{-2})$ oracle calls. However, Hessian-vector products may be inaccessible or expensive in practice. Kwon et al. (ICML 2023) addressed this issue by proposing a first-order method that can achieve the same goal at a slower rate of $\tilde{\mathcal{O}}(\epsilon^{-3})$. In this work, we provide a tighter analysis demonstrating that this method can converge at the near-optimal $\tilde{\mathcal{O}}(\epsilon^{-2})$ rate, matching second-order methods. Our analysis further leads to simple first-order algorithms that achieve similar convergence rates for finding second-order stationary points and for distributed bilevel problems.
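To make the distinction concrete, the sketch below contrasts the exact hypergradient (which, in general, requires Hessian-vector products via the implicit function theorem) with a penalty-based, fully first-order estimate in the spirit of the approach the abstract describes. The toy problem, functions, step sizes, and penalty parameter `lam` are all hypothetical illustrations chosen so the lower-level problem is strongly convex and has a closed-form solution for checking; this is a minimal sketch, not the paper's algorithm.

```python
import numpy as np

# Hypothetical toy bilevel problem:
#   upper level:  f(x, y) = (x - 1)^2 + (y - 1)^2
#   lower level:  g(x, y) = 0.5 * (y - a*x)^2   (strongly convex in y)
# so y*(x) = a*x and phi(x) = f(x, y*(x)) has a closed-form gradient.
a = 0.5

def grad_f(x, y):   # (df/dx, df/dy)
    return 2 * (x - 1), 2 * (y - 1)

def grad_g(x, y):   # (dg/dx, dg/dy)
    return -a * (y - a * x), (y - a * x)

def solve_lower(x, lam=None, steps=2000, lr=0.05):
    """Gradient descent in y on g(x, .) (lam=None) or on lam*g + f.
    Only first-order (gradient) information is used."""
    y = 0.0
    for _ in range(steps):
        _, gy = grad_g(x, y)
        if lam is None:
            y -= lr * gy
        else:
            _, fy = grad_f(x, y)
            y -= lr * (lam * gy + fy) / (lam + 1)  # rescaled for stability
    return y

def first_order_hypergrad(x, lam=1000.0):
    """Penalty-based hypergradient estimate using only gradients:
       grad_x f(x, y_lam) + lam * (grad_x g(x, y_lam) - grad_x g(x, y_star)),
    where y_lam minimizes lam*g + f and y_star minimizes g. As lam grows,
    this approaches the exact hypergradient, with no Hessians needed."""
    y_star = solve_lower(x)
    y_lam = solve_lower(x, lam=lam)
    fx, _ = grad_f(x, y_lam)
    gx_lam, _ = grad_g(x, y_lam)
    gx_star, _ = grad_g(x, y_star)
    return fx + lam * (gx_lam - gx_star)

def exact_hypergrad(x):
    # Closed form: phi(x) = (x - 1)^2 + (a*x - 1)^2.
    return 2 * (x - 1) + 2 * a * (a * x - 1)

x0 = 0.3
approx = first_order_hypergrad(x0, lam=1000.0)
exact = exact_hypergrad(x0)
print(approx, exact)  # the two values agree closely for large lam
```

In this toy setting the estimator's bias shrinks as the penalty parameter grows, which is why the analysis of such methods hinges on how fast the penalty can be increased relative to the target accuracy $\epsilon$.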
