ResearchTrend.AI

Understanding the unstable convergence of gradient descent
Kwangjun Ahn, J. Zhang, S. Sra
arXiv:2204.01050, 3 April 2022

Papers citing "Understanding the unstable convergence of gradient descent"

44 papers shown.
  • Physics Informed Constrained Learning of Dynamics from Static Data
    Pengtao Dang, Tingbo Guo, Melissa Fishel, Guang Lin, Wenzhuo Wu, Sha Cao, Chi Zhang. 17 Apr 2025. [PINN, AI4CE]
  • Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
    Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett. 05 Apr 2025.
  • A Minimalist Example of Edge-of-Stability and Progressive Sharpening
    Liming Liu, Zixuan Zhang, S. Du, T. Zhao. 04 Mar 2025.
  • Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
    Dayal Singh Kalra, Tianyu He, M. Barkeshli. 17 Feb 2025.
  • Building a Multivariate Time Series Benchmarking Datasets Inspired by Natural Language Processing (NLP)
    Mohammad Asif Ibna Mustafa, Ferdinand Heinrich. 14 Oct 2024. [AI4TS]
  • Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization
    Yuhang Cai, Jingfeng Wu, Song Mei, Michael Lindsey, Peter L. Bartlett. 12 Jun 2024.
  • Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes
    Si Yi Meng, Antonio Orvieto, Daniel Yiming Cao, Christopher De Sa. 07 Jun 2024.
  • Training on the Edge of Stability Is Caused by Layerwise Jacobian Alignment
    Mark Lowell, Catharine A. Kastner. 31 May 2024.
  • Does SGD really happen in tiny subspaces?
    Minhak Song, Kwangjun Ahn, Chulhee Yun. 25 May 2024.
  • GD doesn't make the cut: Three ways that non-differentiability affects neural network training
    Siddharth Krishna Kumar. 16 Jan 2024. [AAML]
  • AdamL: A fast adaptive gradient method incorporating loss function
    Lu Xia, Stefano Massei. 23 Dec 2023. [ODL]
  • Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults
    Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun. 25 Nov 2023.
  • Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
    Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao. 26 Oct 2023.
  • From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression
    Xuxing Chen, Krishnakumar Balasubramanian, Promit Ghosal, Bhavya Agrawalla. 02 Oct 2023.
  • A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent
    Mingze Wang, Lei Wu. 01 Oct 2023.
  • Sharpness-Aware Minimization and the Edge of Stability
    Philip M. Long, Peter L. Bartlett. 21 Sep 2023. [AAML]
  • Good-looking but Lacking Faithfulness: Understanding Local Explanation Methods through Trend-based Testing
    Jinwen He, Kai Chen, Guozhu Meng, Jiangshan Zhang, Congyi Li. 09 Sep 2023. [FAtt, AAML]
  • Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory
    Minhak Song, Chulhee Yun. 09 Jul 2023.
  • When and Why Momentum Accelerates SGD: An Empirical Study
    Jingwen Fu, Bohan Wang, Huishuai Zhang, Zhizheng Zhang, Wei Chen, Na Zheng. 15 Jun 2023.
  • Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
    Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, M. Belkin. 07 Jun 2023.
  • The Crucial Role of Normalization in Sharpness-Aware Minimization
    Yan Dai, Kwangjun Ahn, S. Sra. 24 May 2023.
  • On progressive sharpening, flat minima and generalisation
    L. MacDonald, Jack Valmadre, Simon Lucey. 24 May 2023.
  • Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
    Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Y. Carmon. 22 May 2023.
  • Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods
    Junchi Yang, Xiang Li, Ilyas Fatkhullin, Niao He. 21 May 2023.
  • Loss Spike in Training Neural Networks
    Zhongwang Zhang, Z. Xu. 20 May 2023.
  • Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
    Jingfeng Wu, Vladimir Braverman, Jason D. Lee. 19 May 2023.
  • Learning Trajectories are Generalization Indicators
    Jingwen Fu, Zhizheng Zhang, Dacheng Yin, Yan Lu, Nanning Zheng. 25 Apr 2023. [AI4CE]
  • Why is parameter averaging beneficial in SGD? An objective smoothing perspective
    Atsushi Nitanda, Ryuhei Kikuchi, Shugo Maeda, Denny Wu. 18 Feb 2023. [FedML]
  • On a continuous time model of gradient descent dynamics and instability in deep learning
    Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin. 03 Feb 2023.
  • Learning threshold neurons via the "edge of stability"
    Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang. 14 Dec 2022. [MLT]
  • Maximal Initial Learning Rates in Deep ReLU Networks
    Gaurav M. Iyer, Boris Hanin, David Rolnick. 14 Dec 2022.
  • Understanding Edge-of-Stability Training Dynamics with a Minimalist Example
    Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge. 07 Oct 2022.
  • The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima
    Peter L. Bartlett, Philip M. Long, Olivier Bousquet. 04 Oct 2022.
  • Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability
    Alexandru Damian, Eshaan Nichani, Jason D. Lee. 30 Sep 2022.
  • On the Implicit Bias in Deep-Learning Algorithms
    Gal Vardi. 26 Aug 2022. [FedML, AI4CE]
  • On the generalization of learning algorithms that do not converge
    N. Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka. 16 Aug 2022. [MLT]
  • Adaptive Gradient Methods at the Edge of Stability
    Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, ..., Daniel Suo, David E. Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer. 29 Jul 2022. [ODL]
  • Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability
    Z. Li, Zixuan Wang, Jian Li. 26 Jul 2022.
  • Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
    Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora. 14 Jun 2022. [FAtt]
  • The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon
    Vimal Thilak, Etai Littwin, Shuangfei Zhai, Omid Saremi, Roni Paiss, J. Susskind. 10 Jun 2022.
  • A PDE-based Explanation of Extreme Numerical Sensitivities and Edge of Stability in Training Neural Networks
    Yuxin Sun, Dong Lao, G. Sundaramoorthi, A. Yezzi. 04 Jun 2022.
  • Understanding Gradient Descent on Edge of Stability in Deep Learning
    Sanjeev Arora, Zhiyuan Li, A. Panigrahi. 19 May 2022. [MLT]
  • Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes
    Chao Ma, D. Kunin, Lei Wu, Lexing Ying. 24 Apr 2022.
  • The large learning rate phase of deep learning: the catapult mechanism
    Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari. 04 Mar 2020. [ODL]