Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes
Chao Ma, D. Kunin, Lei Wu, Lexing Ying
arXiv:2204.11326 · 24 April 2022
Papers citing "Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes" (25 papers)
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett · 05 Apr 2025

Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra, Tianyu He, M. Barkeshli · 17 Feb 2025

Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan · AAML · 14 Oct 2024

Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization
Yuhang Cai, Jingfeng Wu, Song Mei, Michael Lindsey, Peter L. Bartlett · 12 Jun 2024

Training on the Edge of Stability Is Caused by Layerwise Jacobian Alignment
Mark Lowell, Catharine A. Kastner · 31 May 2024

Improving Generalization and Convergence by Enhancing Implicit Regularization
Mingze Wang, Haotian He, Jinbo Wang, Zilin Wang, Guanhua Huang, Feiyu Xiong, Zhiyu Li, E. Weinan, Lei Wu · 31 May 2024

Corridor Geometry in Gradient-Based Optimization
Benoit Dherin, M. Rosca · 13 Feb 2024

Data-induced multiscale losses and efficient multirate gradient descent schemes
Juncai He, Liangchen Liu, Yen-Hsi Tsai · 05 Feb 2024

Understanding the Generalization Benefits of Late Learning Rate Decay
Yinuo Ren, Chao Ma, Lexing Ying · AI4CE · 21 Jan 2024

Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
Mingze Wang, Zeping Min, Lei Wu · 24 Nov 2023

Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
Elan Rosenfeld, Andrej Risteski · 07 Nov 2023

A Quadratic Synchronization Rule for Distributed Deep Learning
Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang · 22 Oct 2023

A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent
Mingze Wang, Lei Wu · 01 Oct 2023

Neuro-Visualizer: An Auto-encoder-based Loss Landscape Visualization Method
Mohannad Elhamod, Anuj Karpatne · 26 Sep 2023

Sharpness-Aware Minimization and the Edge of Stability
Philip M. Long, Peter L. Bartlett · AAML · 21 Sep 2023

Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
Tongtian Zhu, Fengxiang He, Kaixuan Chen, Mingli Song, Dacheng Tao · 05 Jun 2023

Loss Spike in Training Neural Networks
Zhongwang Zhang, Z. Xu · 20 May 2023

Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
Jingfeng Wu, Vladimir Braverman, Jason D. Lee · 19 May 2023

Training trajectories, mini-batch losses and the curious role of the learning rate
Mark Sandler, A. Zhmoginov, Max Vladymyrov, Nolan Miller · ODL · 05 Jan 2023

Learning threshold neurons via the "edge of stability"
Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang · MLT · 14 Dec 2022

On the Implicit Bias in Deep-Learning Algorithms
Gal Vardi · FedML, AI4CE · 26 Aug 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora · FAtt · 14 Jun 2022

A PDE-based Explanation of Extreme Numerical Sensitivities and Edge of Stability in Training Neural Networks
Yuxin Sun, Dong Lao, G. Sundaramoorthi, A. Yezzi · 04 Jun 2022

What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
Zhiyuan Li, Tianhao Wang, Sanjeev Arora · MLT · 13 Oct 2021

Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao · AI4CE · 07 Oct 2021