ResearchTrend.AI
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes

5 April 2025
Ruiqi Zhang
Jingfeng Wu
Licong Lin
Peter L. Bartlett
Abstract

We study gradient descent (GD) for logistic regression on linearly separable data with stepsizes that adapt to the current risk, scaled by a constant hyperparameter η. We show that after at most 1/γ² burn-in steps, GD achieves a risk upper bounded by exp(−Θ(η)), where γ is the margin of the dataset. As η can be arbitrarily large, GD attains an arbitrarily small risk immediately after the burn-in steps, though the risk evolution may be non-monotonic. We further construct hard datasets with margin γ, where any batch (or online) first-order method requires Ω(1/γ²) steps to find a linear separator. Thus, GD with large, adaptive stepsizes is minimax optimal among first-order batch methods. Notably, the classical Perceptron (Novikoff, 1962), a first-order online method, also achieves a step complexity of 1/γ², matching GD even in constants. Finally, our GD analysis extends to a broad class of loss functions and certain two-layer networks.
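To make the setup concrete, here is a minimal NumPy sketch of GD on the logistic loss with a risk-adaptive stepsize. The specific scaling η / L(wₜ) used below, along with the toy dataset and the convergence cutoff, are illustrative assumptions and not necessarily the paper's exact schedule.

```python
import numpy as np

def logistic_risk(w, X, y):
    # Empirical logistic risk: mean of log(1 + exp(-y_i <w, x_i>)).
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def gd_adaptive(X, y, eta=10.0, steps=50):
    # Gradient descent whose stepsize scales as eta divided by the
    # current risk; this particular scaling is an illustrative guess,
    # not necessarily the paper's exact schedule.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        risk = logistic_risk(w, X, y)
        if risk < 1e-12:  # numerically converged
            break
        margins = y * (X @ w)
        probs = np.exp(-np.logaddexp(0.0, margins))  # sigmoid(-margins), stable
        grad = -(X.T @ (y * probs)) / n              # gradient of the risk
        w -= (eta / risk) * grad
    return w

# Toy linearly separable data: margin of at least 1 along the first coordinate.
rng = np.random.default_rng(0)
n = 40
y = rng.choice([-1.0, 1.0], size=n)
X = np.empty((n, 2))
X[:, 0] = y * (1.0 + np.abs(rng.normal(size=n)))
X[:, 1] = 0.1 * rng.normal(size=n)
w = gd_adaptive(X, y)
```

Because the stepsize grows as the risk shrinks, the iterates take very large steps once the data are nearly fit — consistent with the abstract's point that the risk may evolve non-monotonically before dropping to exp(−Θ(η)).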

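For comparison, the abstract notes that the classical Perceptron also finds a separator within 1/γ² steps. A textbook version (the dataset below is the same illustrative construction as above):

```python
import numpy as np

def perceptron(X, y, max_updates=10_000):
    # Classic Perceptron (Novikoff, 1962): repeatedly update on a
    # misclassified point; on data with margin gamma and radius R it
    # makes at most (R / gamma)^2 updates before finding a separator.
    w = np.zeros(X.shape[1])
    for _ in range(max_updates):
        mistakes = np.flatnonzero(y * (X @ w) <= 0)
        if mistakes.size == 0:
            return w, True  # linear separator found
        i = mistakes[0]
        w += y[i] * X[i]
    return w, False

# Toy linearly separable data with margin at least 1 along coordinate 0.
rng = np.random.default_rng(1)
n = 40
y = rng.choice([-1.0, 1.0], size=n)
X = np.empty((n, 2))
X[:, 0] = y * (1.0 + np.abs(rng.normal(size=n)))
X[:, 1] = 0.1 * rng.normal(size=n)
w, converged = perceptron(X, y)
```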
@article{zhang2025_2504.04105,
  title={Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes},
  author={Ruiqi Zhang and Jingfeng Wu and Licong Lin and Peter L. Bartlett},
  journal={arXiv preprint arXiv:2504.04105},
  year={2025}
}