
Analysis of Schedule-Free Nonconvex Optimization

Main: 10 pages
4 figures
Bibliography: 2 pages
Appendix: 24 pages
Abstract

First-order methods underpin most large-scale learning algorithms, yet their classical convergence guarantees hinge on carefully scheduled step-sizes that depend on the total horizon $T$, which is rarely known in advance. The Schedule-Free (SF) method promises optimal performance with hyperparameters that are independent of $T$ by interpolating between Polyak--Ruppert averaging and momentum, but nonconvex analysis of SF has been limited or reliant on strong global assumptions. We introduce a robust Lyapunov framework that, under only $L$-smoothness and lower-boundedness, reduces SF analysis to a single-step descent inequality. This yields horizon-agnostic bounds in the nonconvex setting: $O(1/\log T)$ for a constant step-size with Polyak--Ruppert averaging, $O(\log T / T)$ for a linearly growing step-size, and a continuum of $O(T^{-(1-\alpha)})$ rates for polynomial averaging. We complement these proofs with Performance Estimation Problem (PEP) experiments that numerically validate our rates and suggest that our $O(1/\log T)$ bound on the original nonconvex SF algorithm may tighten to $O(1/T)$. Our work extends SF's horizon-free guarantees to smooth nonconvex optimization and charts future directions for optimal nonconvex rates.
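For context, here is a minimal sketch of the Schedule-Free SGD update in the standard formulation (after Defazio and Mishchenko), in which a parameter beta interpolates between Polyak--Ruppert averaging (beta = 0) and a momentum-like scheme (beta near 1). The step-size, beta value, and toy quadratic below are illustrative assumptions and may not match the exact parameterization analyzed in this paper.

```python
# Hedged sketch of Schedule-Free SGD; the paper's exact variant may differ.
import numpy as np

def schedule_free_sgd(grad, x0, lr=0.1, beta=0.9, T=1000):
    """Run T Schedule-Free steps and return the averaged iterate x_T."""
    z = x0.copy()   # "base" SGD sequence
    x = x0.copy()   # Polyak--Ruppert-style running average
    for t in range(1, T + 1):
        y = (1 - beta) * z + beta * x   # gradient is queried at the interpolation y_t
        z = z - lr * grad(y)            # SGD step on the base sequence
        c = 1.0 / t                     # equal-weight averaging coefficient
        x = (1 - c) * x + c * z         # update the running average
    return x

# Toy usage (illustrative only): minimize the smooth quadratic f(x) = 0.5 * ||A x||^2.
A = np.diag([1.0, 10.0])
grad = lambda x: A.T @ (A @ x)
print(schedule_free_sgd(grad, x0=np.array([3.0, -2.0])))
```

With beta = 0 the gradient is taken at the base iterate z and x is simply its running average (Polyak--Ruppert); with beta close to 1 the gradient is taken near the average itself, recovering a momentum-like update.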
