What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
Zhiyuan Li, Tianhao Wang, Sanjeev Arora
arXiv:2110.06914, v4 (latest) · 13 October 2021 · MLT
Links: arXiv (abs) · PDF · HTML
Papers citing "What Happens after SGD Reaches Zero Loss? --A Mathematical Framework" (35 of 35 papers shown). Each entry lists title, authors, topic tag where assigned, the listing's three per-paper counts, and publication date.
Flat Channels to Infinity in Neural Loss Landscapes
Flavio Martinelli, Alexander Van Meegen, Berfin Simsek, W. Gerstner, Johanni Brea
22 / 0 / 0 · 17 Jun 2025

Heavy-Ball Momentum Method in Continuous Time and Discretization Error Analysis
Bochen Lyu, Xiaojing Zhang, Fangyi Zheng, He Wang, Zheng Wang, Zhanxing Zhu
31 / 0 / 0 · 03 Jun 2025

The Spectral Bias of Shallow Neural Network Learning is Shaped by the Choice of Non-linearity
Justin Sahs, Ryan Pyle, Fabio Anselmi, Ankit B. Patel
117 / 0 / 0 · 13 Mar 2025

Slowing Down Forgetting in Continual Learning
Pascal Janetzky, Tobias Schlagenhauf, Stefan Feuerriegel
CLL · 132 / 0 / 0 · 11 Nov 2024

Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan
AAML · 163 / 3 / 0 · 14 Oct 2024

Nesterov acceleration in benignly non-convex landscapes
Kanan Gupta, Stephan Wojtowytsch
98 / 2 / 0 · 10 Oct 2024

How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
Pierfrancesco Beneventano, Andrea Pinto, Tomaso A. Poggio
MLT · 80 / 2 / 0 · 17 Jun 2024

Reparameterization invariance in approximate Bayesian inference
Hrittik Roy, M. Miani, Carl Henrik Ek, Philipp Hennig, Marvin Pfortner, Lukas Tatzel, Søren Hauberg
BDL · 140 / 11 / 0 · 05 Jun 2024

Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun
183 / 8 / 1 · 25 May 2024

Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning
Yuxiao Wen, Arthur Jacot
160 / 8 / 0 · 12 Feb 2024

Stochastic Modified Flows for Riemannian Stochastic Gradient Descent
Benjamin Gess, Sebastian Kassing, Nimit Rana
86 / 1 / 0 · 02 Feb 2024

A Coefficient Makes SVRG Effective
Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, Zhuang Liu
107 / 1 / 0 · 09 Nov 2023

The Marginal Value of Momentum for Small Learning Rate SGD
Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li
ODL · 100 / 10 / 0 · 27 Jul 2023

Max-Margin Token Selection in Attention Mechanism
Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak
129 / 45 / 0 · 23 Jun 2023

mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder
AAML · 73 / 7 / 0 · 19 Feb 2023

Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent
Benjamin Gess, Sebastian Kassing, Vitalii Konarovskyi
DiffM · 77 / 6 / 0 · 14 Feb 2023

A Modern Look at the Relationship between Sharpness and Generalization
Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion
3DH · 148 / 66 / 0 · 14 Feb 2023

On the Lipschitz Constant of Deep Networks and Double Descent
Matteo Gamba, Hossein Azizpour, Mårten Björkman
121 / 7 / 0 · 28 Jan 2023

How Does Sharpness-Aware Minimization Minimize Sharpness?
Kaiyue Wen, Tengyu Ma, Zhiyuan Li
AAML · 102 / 50 / 0 · 10 Nov 2022

Flatter, faster: scaling momentum for optimal speedup of SGD
Aditya Cowsik, T. Can, Paolo Glorioso
130 / 5 / 0 · 28 Oct 2022

Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis
Taiki Miyagawa
96 / 10 / 0 · 28 Oct 2022

Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma
AI4CE · 146 / 58 / 0 · 25 Oct 2022

Noise Injection as a Probe of Deep Learning Dynamics
Noam Levi, I. Bloch, M. Freytsis, T. Volansky
82 / 2 / 0 · 24 Oct 2022

SGD with Large Step Sizes Learns Sparse Features
Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion
156 / 60 / 0 · 11 Oct 2022

Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability
Alexandru Damian, Eshaan Nichani, Jason D. Lee
112 / 88 / 0 · 30 Sep 2022

Deep Double Descent via Smooth Interpolation
Matteo Gamba, Erik Englesson, Mårten Björkman, Hossein Azizpour
201 / 11 / 0 · 21 Sep 2022

On the Implicit Bias in Deep-Learning Algorithms
Gal Vardi
FedML, AI4CE · 122 / 84 / 0 · 26 Aug 2022

Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability
Z. Li, Zixuan Wang, Jian Li
102 / 52 / 0 · 26 Jul 2022

Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora
121 / 30 / 0 · 08 Jul 2022

Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation
Loucas Pillaud-Vivien, J. Reygner, Nicolas Flammarion
NoLa · 91 / 34 / 0 · 20 Jun 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora
FAtt · 143 / 76 / 0 · 14 Jun 2022

Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes
Chao Ma, D. Kunin, Lei Wu, Lexing Ying
111 / 30 / 0 · 24 Apr 2022

Anticorrelated Noise Injection for Improved Generalization
Antonio Orvieto, Hans Kersting, F. Proske, Francis R. Bach, Aurelien Lucchi
138 / 50 / 0 · 06 Feb 2022

Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks
Noam Razin, Asaf Maman, Nadav Cohen
136 / 29 / 0 · 27 Jan 2022

Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule
Nikhil Iyer, V. Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu
115 / 29 / 0 · 09 Mar 2020