The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares

29 April 2019
Rong Ge, Sham Kakade, Rahul Kidambi, Praneeth Netrapalli
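The schedule studied in this paper runs constant-step-size SGD in stages and cuts the step size by a constant factor at the end of each stage, so the learning rate decays geometrically over the run. As a rough illustration only, here is a minimal Python sketch of that idea on a least-squares objective; the initial step size, decay factor, and stage length below are placeholder values, not the constants analyzed in the paper.

    import numpy as np

    def step_decay_lr(t, lr0=0.1, decay=0.5, stage_len=1000):
        # Hold the step size constant within a stage, then cut it by a fixed
        # factor, so it decays geometrically across stages (illustrative constants).
        return lr0 * decay ** (t // stage_len)

    def sgd_least_squares(A, b, steps=5000, lr0=0.1, decay=0.5, stage_len=1000, seed=0):
        # Plain SGD on the least-squares losses 0.5 * (a_i @ x - b_i)^2,
        # sampling one row per step and using the step-decay schedule above.
        rng = np.random.default_rng(seed)
        n, d = A.shape
        x = np.zeros(d)
        for t in range(steps):
            i = rng.integers(n)                      # sample one row uniformly
            grad = (A[i] @ x - b[i]) * A[i]          # stochastic gradient for that row
            x -= step_decay_lr(t, lr0, decay, stage_len) * grad
        return x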

Papers citing "The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares"

19 / 69 papers shown

Adaptive Hierarchical Hyper-gradient Descent
Renlong Jie, Junbin Gao, A. Vasnev, Minh-Ngoc Tran · 17 Aug 2020

MLR-SNet: Transferable LR Schedules for Heterogeneous Tasks
Jun Shu, Yanwen Zhu, Qian Zhao, Zongben Xu, Deyu Meng · 29 Jul 2020

EfficientHRNet: Efficient Scaling for Lightweight High-Resolution Multi-Person Pose Estimation
Christopher Neff, A. Sheth, Steven Furgurson, Hamed Tabkhi · 3DH · 16 Jul 2020

Double-Loop Unadjusted Langevin Algorithm
Paul Rolland, Armin Eftekhari, Ali Kavis, V. Cevher · 02 Jul 2020

Guarantees for Tuning the Step Size using a Learning-to-Learn Approach
Xiang Wang, Shuai Yuan, Chenwei Wu, Rong Ge · 30 Jun 2020

On the Generalization Benefit of Noise in Stochastic Gradient Descent
Samuel L. Smith, Erich Elsen, Soham De · MLT · 26 Jun 2020

Understanding the Role of Training Regimes in Continual Learning
Seyed Iman Mirzadeh, Mehrdad Farajtabar, Razvan Pascanu, H. Ghasemzadeh · CLL · 12 Jun 2020

Warwick Electron Microscopy Datasets
Jeffrey M. Ede · 02 Mar 2020

Disentangling Adaptive Gradient Methods from Learning Rates
Naman Agarwal, Rohan Anil, Elad Hazan, Tomer Koren, Cyril Zhang · 26 Feb 2020

A Second look at Exponential and Cosine Step Sizes: Simplicity, Adaptivity, and Performance
Xiaoyun Li, Zhenxun Zhuang, Francesco Orabona · 12 Feb 2020

Using Statistics to Automate Stochastic Optimization
Hunter Lang, Pengchuan Zhang, Lin Xiao · 21 Sep 2019

Learning an Adaptive Learning Rate Schedule
Zhen Xu, Andrew M. Dai, Jonas Kemp, Luke Metz · 20 Sep 2019

From low probability to high confidence in stochastic convex optimization
Damek Davis, Dmitriy Drusvyatskiy, Lin Xiao, Junyu Zhang · 31 Jul 2019

Stochastic algorithms with geometric step decay converge linearly on sharp functions
Damek Davis, Dmitriy Drusvyatskiy, Vasileios Charisopoulos · 22 Jul 2019

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Yuanzhi Li, Colin Wei, Tengyu Ma · 10 Jul 2019

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu · ODL · 18 Jun 2018

A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method
Simon Lacoste-Julien, Mark W. Schmidt, Francis R. Bach · 10 Dec 2012

Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
Ohad Shamir, Tong Zhang · 08 Dec 2012

Optimal Distributed Online Prediction using Mini-Batches
O. Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao · 07 Dec 2010