
Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance

Abstract

Stochastic gradient descent with momentum, also known as the Stochastic Heavy Ball method (SHB), is one of the most popular algorithms for solving large-scale stochastic optimization problems in various machine learning tasks. In practical scenarios, tuning the step-size and momentum parameters of the method is a prohibitively expensive and time-consuming process. In this work, inspired by the recent success of the stochastic Polyak step-size in improving the performance of stochastic gradient descent (SGD), we propose and explore new Polyak-type variants suitable for the update rule of the SHB method. In particular, using the Iterate Moving Average (IMA) viewpoint of SHB, we propose and analyze three novel step-size selections: MomSPS_max, MomDecSPS, and MomAdaSPS. For MomSPS_max, we provide convergence guarantees for SHB to a neighborhood of the solution for convex and smooth problems (without assuming interpolation). If interpolation is also satisfied, then using MomSPS_max, SHB converges to the true solution at a fast rate matching that of the deterministic HB method. The other two variants, MomDecSPS and MomAdaSPS, are the first adaptive step-sizes for SHB that guarantee convergence to the exact minimizer without a priori knowledge of the problem parameters and without assuming interpolation. Our convergence analysis of SHB is tight and recovers the convergence guarantees of SGD with the stochastic Polyak step-size as a special case. We supplement our analysis with experiments validating our theory and demonstrating the effectiveness and robustness of our algorithms.
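As a concrete illustration (not the paper's exact algorithm), below is a minimal Python sketch of the SHB update driven by a Polyak-type step-size. The step-size follows the standard SPS_max rule of Loizou et al. (2021), capped at a bound gamma_max; the function names grad_f and loss_f, the default constants, and the f_star value are illustrative assumptions, and the paper's MomSPS_max, MomDecSPS, and MomAdaSPS selections are defined precisely in the text.

import numpy as np

def shb_polyak_sketch(grad_f, loss_f, x0, n_samples, beta=0.9, c=0.5,
                      gamma_max=1.0, f_star=0.0, n_iters=1000, seed=0):
    # Sketch of SHB with a Polyak-type step-size (SPS_max form).
    # grad_f(x, i) and loss_f(x, i) evaluate the stochastic gradient
    # and loss of the i-th component f_i; f_star is its optimal value
    # (0 under interpolation). These are assumed user-supplied.
    rng = np.random.default_rng(seed)
    x_prev = x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        i = rng.integers(n_samples)   # sample a stochastic component
        g = grad_f(x, i)              # stochastic gradient of f_i at x
        # Polyak-type step-size, capped at gamma_max (SPS_max rule).
        gamma = min((loss_f(x, i) - f_star) / (c * np.dot(g, g) + 1e-12),
                    gamma_max)
        # SHB update: gradient step plus heavy-ball momentum term.
        x_next = x - gamma * g + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x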

@article{oikonomou2025_2406.04142,
  title={Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance},
  author={Dimitris Oikonomou and Nicolas Loizou},
  journal={arXiv preprint arXiv:2406.04142},
  year={2025}
}