Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability

26 July 2022
Z. Li, Zixuan Wang, Jian Li

Papers citing "Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability"

37 / 37 papers shown
Towards Quantifying the Hessian Structure of Neural Networks
Zhaorui Dong, Yushun Zhang, Z. Luo, Jianfeng Yao, Ruoyu Sun · 05 May 2025

Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett · 05 Apr 2025

A Minimalist Example of Edge-of-Stability and Progressive Sharpening
Liming Liu, Zixuan Zhang, S. Du, T. Zhao · 04 Mar 2025

Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra, Tianyu He, M. Barkeshli · 17 Feb 2025

Simplicity Bias via Global Convergence of Sharpness Minimization
Khashayar Gatmiry, Zhiyuan Li, Sashank J. Reddi, Stefanie Jegelka · 21 Oct 2024

Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization
Yuhang Cai, Jingfeng Wu, Song Mei, Michael Lindsey, Peter L. Bartlett · 12 Jun 2024

Training on the Edge of Stability Is Caused by Layerwise Jacobian Alignment
Mark Lowell, Catharine A. Kastner · 31 May 2024

Deep linear networks for regression are implicitly regularized towards flat minima
Pierre Marion, Lénaic Chizat · ODL · 22 May 2024

Deconstructing the Goldilocks Zone of Neural Network Initialization
Artem Vysogorets, Anna Dawid, Julia Kempe · 05 Feb 2024

GD doesn't make the cut: Three ways that non-differentiability affects neural network training
Siddharth Krishna Kumar · AAML · 16 Jan 2024

Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults
Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun · 25 Nov 2023

Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
Elan Rosenfeld, Andrej Risteski · 07 Nov 2023

Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao · 26 Oct 2023

Irreducible Curriculum for Language Model Pretraining
Simin Fan, Martin Jaggi · 23 Oct 2023

RSAM: Learning on manifolds with Riemannian Sharpness-aware Minimization
Kenneth Allen, Hoang-Phi Nguyen, Tung Pham, Ming-Jun Lai, Mehrtash Harandi, Dinh Q. Phung, Trung Le · AAML · 29 Sep 2023

Learning Stochastic Dynamical Systems as an Implicit Regularization with Graph Neural Networks
Jinqiu Guo, Ting Gao, Yufu Lan, Peng Zhang, Sikun Yang, Jinqiao Duan · 12 Jul 2023

Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory
Minhak Song, Chulhee Yun · 09 Jul 2023

The Inductive Bias of Flatness Regularization for Deep Matrix Factorization
Khashayar Gatmiry, Zhiyuan Li, Ching-Yao Chuang, Sashank J. Reddi, Tengyu Ma, Stefanie Jegelka · ODL · 22 Jun 2023

When and Why Momentum Accelerates SGD: An Empirical Study
Jingwen Fu, Bohan Wang, Huishuai Zhang, Zhizheng Zhang, Wei Chen, Na Zheng · 15 Jun 2023

Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, M. Belkin · 07 Jun 2023

SANE: The phases of gradient descent through Sharpness Adjusted Number of Effective parameters
Lawrence Wang, Stephen J. Roberts · 29 May 2023

The Crucial Role of Normalization in Sharpness-Aware Minimization
Yan Dai, Kwangjun Ahn, S. Sra · 24 May 2023

On progressive sharpening, flat minima and generalisation
L. MacDonald, Jack Valmadre, Simon Lucey · 24 May 2023

Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Y. Carmon · 22 May 2023

Loss Spike in Training Neural Networks
Zhongwang Zhang, Z. Xu · 20 May 2023

Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
Jingfeng Wu, Vladimir Braverman, Jason D. Lee · 19 May 2023

Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width
Dayal Singh Kalra, M. Barkeshli · 23 Feb 2023

Learning threshold neurons via the "edge of stability"
Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang · MLT · 14 Dec 2022

Maximal Initial Learning Rates in Deep ReLU Networks
Gaurav M. Iyer, Boris Hanin, David Rolnick · 14 Dec 2022

Improving Multi-task Learning via Seeking Task-based Flat Regions
Hoang Phan, Lam C. Tran, Ngoc N. Tran, Nhat Ho, Dinh Q. Phung, Trung Le · 24 Nov 2022

Second-order regression models exhibit progressive sharpening to the edge of stability
Atish Agarwala, Fabian Pedregosa, Jeffrey Pennington · 10 Oct 2022

Understanding Edge-of-Stability Training Dynamics with a Minimalist Example
Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge · 07 Oct 2022

Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability
Alexandru Damian, Eshaan Nichani, Jason D. Lee · 30 Sep 2022

A PDE-based Explanation of Extreme Numerical Sensitivities and Edge of Stability in Training Neural Networks
Yuxin Sun, Dong Lao, G. Sundaramoorthi, A. Yezzi · 04 Jun 2022

Understanding Gradient Descent on Edge of Stability in Deep Learning
Sanjeev Arora, Zhiyuan Li, A. Panigrahi · MLT · 19 May 2022

What Happens after SGD Reaches Zero Loss? -- A Mathematical Framework
Zhiyuan Li, Tianhao Wang, Sanjeev Arora · MLT · 13 Oct 2021

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari · ODL · 04 Mar 2020