Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability
arXiv:2207.12678 · 26 July 2022
Z. Li, Zixuan Wang, Jian Li
Links: ArXiv · PDF · HTML
Papers citing "Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability" (37 of 37 papers shown):
Towards Quantifying the Hessian Structure of Neural Networks. Zhaorui Dong, Yushun Zhang, Z. Luo, Jianfeng Yao, Ruoyu Sun. 05 May 2025.
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes. Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett. 05 Apr 2025.
A Minimalist Example of Edge-of-Stability and Progressive Sharpening. Liming Liu, Zixuan Zhang, S. Du, T. Zhao. 04 Mar 2025.
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos. Dayal Singh Kalra, Tianyu He, M. Barkeshli. 17 Feb 2025.
Simplicity Bias via Global Convergence of Sharpness Minimization. Khashayar Gatmiry, Zhiyuan Li, Sashank J. Reddi, Stefanie Jegelka. 21 Oct 2024.
Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization. Yuhang Cai, Jingfeng Wu, Song Mei, Michael Lindsey, Peter L. Bartlett. 12 Jun 2024.
Training on the Edge of Stability Is Caused by Layerwise Jacobian Alignment. Mark Lowell, Catharine A. Kastner. 31 May 2024.
Deep linear networks for regression are implicitly regularized towards flat minima. Pierre Marion, Lénaic Chizat. 22 May 2024.
Deconstructing the Goldilocks Zone of Neural Network Initialization. Artem Vysogorets, Anna Dawid, Julia Kempe. 05 Feb 2024.
GD doesn't make the cut: Three ways that non-differentiability affects neural network training. Siddharth Krishna Kumar. 16 Jan 2024.
Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults. Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun. 25 Nov 2023.
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization. Elan Rosenfeld, Andrej Risteski. 07 Nov 2023.
Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult. Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao. 26 Oct 2023.
Irreducible Curriculum for Language Model Pretraining. Simin Fan, Martin Jaggi. 23 Oct 2023.
RSAM: Learning on manifolds with Riemannian Sharpness-aware Minimization. Kenneth Allen, Hoang-Phi Nguyen, Tung Pham, Ming-Jun Lai, Mehrtash Harandi, Dinh Q. Phung, Trung Le. 29 Sep 2023.
Learning Stochastic Dynamical Systems as an Implicit Regularization with Graph Neural Networks. Jinqiu Guo, Ting Gao, Yufu Lan, Peng Zhang, Sikun Yang, Jinqiao Duan. 12 Jul 2023.
Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory. Minhak Song, Chulhee Yun. 09 Jul 2023.
The Inductive Bias of Flatness Regularization for Deep Matrix Factorization. Khashayar Gatmiry, Zhiyuan Li, Ching-Yao Chuang, Sashank J. Reddi, Tengyu Ma, Stefanie Jegelka. 22 Jun 2023.
When and Why Momentum Accelerates SGD: An Empirical Study. Jingwen Fu, Bohan Wang, Huishuai Zhang, Zhizheng Zhang, Wei Chen, Na Zheng. 15 Jun 2023.
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning. Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, M. Belkin. 07 Jun 2023.
SANE: The phases of gradient descent through Sharpness Adjusted Number of Effective parameters. Lawrence Wang, Stephen J. Roberts. 29 May 2023.
The Crucial Role of Normalization in Sharpness-Aware Minimization. Yan Dai, Kwangjun Ahn, S. Sra. 24 May 2023.
On progressive sharpening, flat minima and generalisation. L. MacDonald, Jack Valmadre, Simon Lucey. 24 May 2023.
Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond. Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Y. Carmon. 22 May 2023.
Loss Spike in Training Neural Networks. Zhongwang Zhang, Z. Xu. 20 May 2023.
Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability. Jingfeng Wu, Vladimir Braverman, Jason D. Lee. 19 May 2023.
Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width. Dayal Singh Kalra, M. Barkeshli. 23 Feb 2023.
Learning threshold neurons via the "edge of stability". Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang. 14 Dec 2022.
Maximal Initial Learning Rates in Deep ReLU Networks. Gaurav M. Iyer, Boris Hanin, David Rolnick. 14 Dec 2022.
Improving Multi-task Learning via Seeking Task-based Flat Regions. Hoang Phan, Lam C. Tran, Ngoc N. Tran, Nhat Ho, Dinh Q. Phung, Trung Le. 24 Nov 2022.
Second-order regression models exhibit progressive sharpening to the edge of stability. Atish Agarwala, Fabian Pedregosa, Jeffrey Pennington. 10 Oct 2022.
Understanding Edge-of-Stability Training Dynamics with a Minimalist Example. Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge. 07 Oct 2022.
Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability. Alexandru Damian, Eshaan Nichani, Jason D. Lee. 30 Sep 2022.
A PDE-based Explanation of Extreme Numerical Sensitivities and Edge of Stability in Training Neural Networks. Yuxin Sun, Dong Lao, G. Sundaramoorthi, A. Yezzi. 04 Jun 2022.
Understanding Gradient Descent on Edge of Stability in Deep Learning. Sanjeev Arora, Zhiyuan Li, A. Panigrahi. 19 May 2022.
What Happens after SGD Reaches Zero Loss? --A Mathematical Framework. Zhiyuan Li, Tianhao Wang, Sanjeev Arora. 13 Oct 2021.
The large learning rate phase of deep learning: the catapult mechanism. Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari. 04 Mar 2020.