On the Last Iterate Convergence of Momentum Methods

International Conference on Algorithmic Learning Theory (ALT), 2021
13 February 2021
Xiaoyun Li
Mingrui Liu
Francesco Orabona
arXiv: 2102.07002
Abstract

SGD with Momentum (SGDM) is a widely used family of algorithms for large-scale optimization of machine learning problems. Yet, when optimizing generic convex functions, no advantage is known for any SGDM algorithm over plain SGD. Moreover, even the most recent results require changes to the SGDM algorithms, like averaging of the iterates and a projection onto a bounded domain, which are rarely used in practice. In this paper, we focus on the convergence rate of the last iterate of SGDM. For the first time, we prove that for any constant momentum factor, there exists a Lipschitz and convex function for which the last iterate of SGDM suffers from a suboptimal convergence rate of $\Omega(\frac{\ln T}{\sqrt{T}})$ after $T$ iterations. Based on this fact, we study a class of (both adaptive and non-adaptive) Follow-The-Regularized-Leader-based SGDM algorithms with \emph{increasing momentum} and \emph{shrinking updates}. For these algorithms, we show that the last iterate has the optimal convergence rate $O(\frac{1}{\sqrt{T}})$ for unconstrained convex stochastic optimization problems, without projections onto bounded domains or knowledge of $T$. Further, we show a variety of results for FTRL-based SGDM when used with adaptive stepsizes. Empirical results are shown as well.
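To make the contrast concrete, below is a minimal sketch of plain SGDM with a constant momentum factor (the setting for which the abstract states the $\Omega(\frac{\ln T}{\sqrt{T}})$ last-iterate lower bound) next to an illustrative variant with increasing momentum and shrinking updates. The specific schedules $\beta_t = t/(t+1)$ and step size $\eta/t$, and the helper names, are assumptions made for this sketch only; the paper's FTRL-based algorithms and their exact schedules are not reproduced here.

import numpy as np

def sgdm_constant_momentum(grad, x0, T, eta=0.1, beta=0.9, seed=0):
    """Plain SGD with a constant momentum factor; returns the last iterate.

    `grad(x, rng)` is assumed to return a stochastic (sub)gradient at x.
    The constant-beta schedule is the case the lower bound concerns.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    for t in range(1, T + 1):
        g = grad(x, rng)
        m = beta * m + g                 # heavy-ball style momentum buffer
        x = x - (eta / np.sqrt(t)) * m   # standard 1/sqrt(t) step size
    return x

def sgdm_increasing_momentum(grad, x0, T, eta=0.1, seed=0):
    """Illustrative variant with increasing momentum and shrinking updates.

    The schedules used here (beta_t = t/(t+1), step eta/t) are assumptions
    for illustration, not the paper's FTRL-based construction.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    for t in range(1, T + 1):
        g = grad(x, rng)
        beta_t = t / (t + 1.0)           # momentum factor grows toward 1
        m = beta_t * m + (1.0 - beta_t) * g
        x = x - (eta / t) * m            # update magnitude shrinks over time
    return x

# Example: noisy subgradients of the convex, 1-Lipschitz function f(x) = |x|.
if __name__ == "__main__":
    noisy_sign = lambda x, rng: np.sign(x) + rng.normal(scale=0.1, size=x.shape)
    print(sgdm_constant_momentum(noisy_sign, [1.0], T=1000))
    print(sgdm_increasing_momentum(noisy_sign, [1.0], T=1000))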
