1904.12838
Cited By
The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares
29 April 2019
Rong Ge
Sham Kakade
Rahul Kidambi
Praneeth Netrapalli
Papers citing
"The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares"
50 / 69 papers shown
Title
Better Rates for Random Task Orderings in Continual Linear Models
Itay Evron
Ran Levinstein
Matan Schliserman
Uri Sherman
Tomer Koren
Daniel Soudry
Nathan Srebro
CLL
35
0
0
06 Apr 2025
Benefits of Learning Rate Annealing for Tuning-Robustness in Stochastic Optimization
Amit Attia
Tomer Koren
67
1
0
13 Mar 2025
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma
Nolan Dey
Gurpreet Gosal
Gavia Gray
Daria Soboleva
Joel Hestness
58
5
0
21 Feb 2025
Can Large Language Models Invent Algorithms to Improve Themselves?
Yoichi Ishibashi
Taro Yano
Masafumi Oyamada
AIFin
LRM
34
1
0
21 Oct 2024
The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization
Haihan Zhang
Yuanshi Liu
Qianwen Chen
Cong Fang
38
0
0
15 Sep 2024
SnapE -- Training Snapshot Ensembles of Link Prediction Models
Ali Shaban
Heiko Paulheim
VLM
30
1
0
05 Aug 2024
Rethinking Feature Backbone Fine-tuning for Remote Sensing Object Detection
Yechan Kim
JongHyun Park
SooYeon Kim
Moongu Jeon
26
0
0
21 Jul 2024
Large Batch Analysis for Adagrad Under Anisotropic Smoothness
Yuxing Liu
Rui Pan
Tong Zhang
26
5
0
21 Jun 2024
Scaling Laws in Linear Regression: Compute, Parameters, and Data
Licong Lin
Jingfeng Wu
Sham Kakade
Peter L. Bartlett
Jason D. Lee
LRM
44
15
0
12 Jun 2024
A Generalized Version of Chung's Lemma and its Applications
Li Jiang
Xiao Li
Andre Milzarek
Junwen Qiu
45
1
0
09 Jun 2024
Primitive Agentic First-Order Optimization
R. Sala
19
0
0
07 Jun 2024
Crafting Heavy-Tails in Weight Matrix Spectrum without Gradient Noise
Vignesh Kothapalli
Tianyu Pang
Shenyang Deng
Zongmin Liu
Yaoqing Yang
37
3
0
07 Jun 2024
New logarithmic step size for stochastic gradient descent
M. S. Shamaee
S. F. Hafshejani
Z. Saeidian
33
3
0
01 Apr 2024
A Selective Review on Statistical Methods for Massive Data Computation: Distributed Computing, Subsampling, and Minibatch Techniques
Xuetong Li
Yuan Gao
Hong Chang
Danyang Huang
Yingying Ma
...
Ke Xu
Jing Zhou
Xuening Zhu
Yingqiu Zhu
Hansheng Wang
44
7
0
17 Mar 2024
On the Convergence of Federated Learning Algorithms without Data Similarity
Ali Beikmohammadi
Sarit Khirirat
Sindri Magnússon
FedML
35
1
0
29 Feb 2024
Provably Scalable Black-Box Variational Inference with Structured Variational Families
Joohwan Ko
Kyurae Kim
W. Kim
Jacob R. Gardner
BDL
33
2
0
19 Jan 2024
DREAM: Debugging and Repairing AutoML Pipelines
Xiaoyu Zhang
Juan Zhai
Shiqing Ma
Chao Shen
21
1
0
31 Dec 2023
An investigation of belief-free DRL and MCTS for inspection and maintenance planning
Daniel Koutas
E. Bismut
Daniel Straub
19
2
0
22 Dec 2023
Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise
Rui Pan
Yuxing Liu
Xiaoyu Wang
Tong Zhang
23
5
0
22 Dec 2023
On the Role of Server Momentum in Federated Learning
Jianhui Sun
Xidong Wu
Heng-Chiao Huang
Aidong Zhang
FedML
60
11
0
19 Dec 2023
An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent
Zhao-quan Song
Chiwun Yang
29
9
0
17 Oct 2023
How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
Jingfeng Wu
Difan Zou
Zixiang Chen
Vladimir Braverman
Quanquan Gu
Peter L. Bartlett
128
49
0
12 Oct 2023
Team AcieLee: Technical Report for EPIC-SOUNDS Audio-Based Interaction Recognition Challenge 2023
Yuqi Li
Yi-Jhen Luo
Xiaoshuai Hao
Chuanguang Yang
Zhulin An
Dantong Song
Wei Yi
33
0
0
15 Jun 2023
Overcoming Catastrophic Forgetting in Massively Multilingual Continual Learning
Genta Indra Winata
Lingjue Xie
Karthik Radhakrishnan
Shijie Wu
Xisen Jin
Pengxiang Cheng
Mayank Kulkarni
Daniel Preotiuc-Pietro
CLL
18
18
0
25 May 2023
Fast and Straggler-Tolerant Distributed SGD with Reduced Computation Load
Maximilian Egger
Serge Kas Hanna
Rawad Bitar
FedML
24
0
0
17 Apr 2023
Learning Rate Schedules in the Presence of Distribution Shift
Matthew Fahrbach
Adel Javanmard
Vahab Mirrokni
Pratik Worah
24
6
0
27 Mar 2023
Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron
Jingfeng Wu
Difan Zou
Zixiang Chen
Vladimir Braverman
Quanquan Gu
Sham Kakade
90
6
0
03 Mar 2023
Real-Time Damage Detection in Fiber Lifting Ropes Using Convolutional Neural Networks
Tuomas Jalonen
M. A. Sa'd
Roope Mellanen
S. Kiranyaz
Moncef Gabbouj
11
4
0
23 Feb 2023
Optimizing Learning Rate Schedules for Iterative Pruning of Deep Neural Networks
Shiyu Liu
Rohan Ghosh
John Tan Chong Min
Mehul Motani
37
0
0
09 Dec 2022
The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift
Jingfeng Wu
Difan Zou
Vladimir Braverman
Quanquan Gu
Sham Kakade
6
16
0
03 Aug 2022
Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions
Courtney Paquette
Elliot Paquette
Ben Adlam
Jeffrey Pennington
17
13
0
15 Jun 2022
Towards an AI-Driven Universal Anti-Jamming Solution with Convolutional Interference Cancellation Network
H. N. Nguyen
G. Noubir
14
1
0
18 Mar 2022
Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime
Difan Zou
Jingfeng Wu
Vladimir Braverman
Quanquan Gu
Sham Kakade
11
5
0
07 Mar 2022
Optimal learning rate schedules in high-dimensional non-convex optimization problems
Stéphane d'Ascoli
Maria Refinetti
Giulio Biroli
16
7
0
09 Feb 2022
On Uniform Boundedness Properties of SGD and its Momentum Variants
Xiaoyu Wang
M. Johansson
23
3
0
25 Jan 2022
Optimization Planning for 3D ConvNets
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Tao Mei
3DPC
3DH
34
9
0
11 Jan 2022
Eigencurve: Optimal Learning Rate Schedule for SGD on Quadratic Objectives with Skewed Hessian Spectrums
Rui Pan
Haishan Ye
Tong Zhang
14
14
0
27 Oct 2021
S-Cyc: A Learning Rate Schedule for Iterative Pruning of ReLU-based Networks
Shiyu Liu
Chong Min John Tan
Mehul Motani
CLL
29
4
0
17 Oct 2021
Adaptive Differentially Private Empirical Risk Minimization
Xiaoxia Wu
Lingxiao Wang
Irina Cristali
Quanquan Gu
Rebecca Willett
38
6
0
14 Oct 2021
Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression
Jingfeng Wu
Difan Zou
Vladimir Braverman
Quanquan Gu
Sham Kakade
104
20
0
12 Oct 2021
Towards Continual Entity Learning in Language Models for Conversational Agents
R. Gadde
I. Bulyko
KELM
14
1
0
30 Jul 2021
Bandwidth-based Step-Sizes for Non-Convex Stochastic Optimization
Xiaoyu Wang
M. Johansson
13
2
0
05 Jun 2021
Acceleration via Fractal Learning Rate Schedules
Naman Agarwal
Surbhi Goel
Cyril Zhang
16
18
0
01 Mar 2021
A Biased Graph Neural Network Sampler with Near-Optimal Regret
Qingru Zhang
David Wipf
Quan Gan
Le Song
40
24
0
01 Mar 2021
On the Convergence of Step Decay Step-Size for Stochastic Optimization
Xiaoyu Wang
Sindri Magnússon
M. Johansson
66
23
0
18 Feb 2021
Last iterate convergence of SGD for Least-Squares in the Interpolation regime
Aditya Varre
Loucas Pillaud-Vivien
Nicolas Flammarion
12
34
0
05 Feb 2021
Advances in Electron Microscopy with Deep Learning
Jeffrey M. Ede
32
2
0
04 Jan 2021
Reverse engineering learned optimizers reveals known and novel mechanisms
Niru Maheswaranathan
David Sussillo
Luke Metz
Ruoxi Sun
Jascha Narain Sohl-Dickstein
22
21
0
04 Nov 2020
Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
34
79
0
17 Sep 2020
Understanding and Detecting Convergence for Stochastic Gradient Descent with Momentum
Jerry Chee
Ping Li
6
11
0
27 Aug 2020