arXiv:2307.14528

Function Value Learning: Adaptive Learning Rates Based on the Polyak Stepsize and Function Splitting in ERM

26 July 2023
Guillaume Garrigos
Robert Mansel Gower
Fabian Schaipp
Abstract

Here we develop variants of SGD (stochastic gradient descent) with an adaptive step size that make use of the sampled loss values. In particular, we focus on solving a finite sum-of-terms problem, also known as empirical risk minimization. We first detail an idealized adaptive method called $\texttt{SPS}_+$ that makes use of the sampled loss values and assumes knowledge of the sampled loss at optimality. $\texttt{SPS}_+$ is a minor modification of the SPS (Stochastic Polyak Stepsize) method in which the step size is enforced to be positive. We then show that $\texttt{SPS}_+$ achieves the best known rates of convergence for SGD in the Lipschitz non-smooth setting. We then move on to develop $\texttt{FUVAL}$, a variant of $\texttt{SPS}_+$ where the loss values at optimality are gradually learned, as opposed to being given. We give three viewpoints of $\texttt{FUVAL}$: as a projection-based method, as a variant of the prox-linear method, and as a particular online SGD method. We then present a convergence analysis of $\texttt{FUVAL}$ and experimental results. One shortcoming of our work is that the convergence analysis of $\texttt{FUVAL}$ shows no advantage over SGD. Another shortcoming is that currently only the full-batch version of $\texttt{FUVAL}$ shows a minor advantage over GD (Gradient Descent) in terms of sensitivity to the step size. The stochastic version shows no clear advantage over SGD. We conjecture that large mini-batches are required to make $\texttt{FUVAL}$ competitive. Currently the new $\texttt{FUVAL}$ method studied in this paper does not offer any clear theoretical or practical advantage. We have chosen to make this draft available online nonetheless because of some of the analysis techniques we use, such as the non-smooth analysis of $\texttt{SPS}_+$, and also to show an apparently interesting approach that currently does not work.
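
To make the idealized method concrete, the following is a minimal sketch of a Polyak-type step with the step size clipped to be non-negative, which is how the abstract describes $\texttt{SPS}_+$. It is an illustration under stated assumptions, not the authors' implementation: the exact form of the step, gamma = max(f_i(x) - f_i(x*), 0) / ||g_i(x)||^2, is our reading of "SPS with a positive step size", and the names `sps_plus_step`, `make_term`, `full_loss`, and the toy least-squares data are all hypothetical. The toy problem is a consistent linear system, so each sampled loss at optimality is 0 and can be supplied exactly.

```python
import random


def sps_plus_step(x, loss_i, grad_i, loss_i_star):
    """One Polyak-type step on a sampled loss, with the step size clipped at 0.

    Assumed form: gamma = max(f_i(x) - f_i(x*), 0) / ||g_i(x)||^2.
    """
    g = grad_i(x)
    g_sq = sum(gj * gj for gj in g)
    if g_sq == 0.0:
        return x  # zero (sub)gradient: no direction to move along
    gamma = max(loss_i(x) - loss_i_star, 0.0) / g_sq
    return [xj - gamma * gj for xj, gj in zip(x, g)]


def make_term(a, b):
    """Build f_i(x) = 0.5 * (<a, x> - b)^2 and its gradient (toy example)."""
    def residual(x):
        return sum(aj * xj for aj, xj in zip(a, x)) - b

    def loss(x):
        return 0.5 * residual(x) ** 2

    def grad(x):
        r = residual(x)
        return [r * aj for aj in a]

    return loss, grad


# Hypothetical data: a consistent system with solution x* = (1, -2),
# so every sampled loss at optimality equals 0.
rows = [([1.0, 0.0], 1.0), ([0.0, 1.0], -2.0), ([1.0, 1.0], -1.0)]
terms = [make_term(a, b) for a, b in rows]


def full_loss(x):
    return sum(loss(x) for loss, _ in terms) / len(terms)


rng = random.Random(0)  # fixed seed so the run is reproducible
x_final = [0.0, 0.0]
for _ in range(500):
    loss_i, grad_i = terms[rng.randrange(len(terms))]
    x_final = sps_plus_step(x_final, loss_i, grad_i, loss_i_star=0.0)
```

In this sketch the clipping only matters when the supplied value `loss_i_star` exceeds the current sampled loss, in which case the iterate is left unchanged rather than moved along the positive gradient direction. Replacing the known `loss_i_star` with a learned estimate is the idea the abstract attributes to $\texttt{FUVAL}$; that update is not reproduced here.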
