arXiv: 2110.06914
What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
13 October 2021
Zhiyuan Li, Tianhao Wang, Sanjeev Arora (MLT)
Links: ArXiv · PDF · HTML
Papers citing "What Happens after SGD Reaches Zero Loss? --A Mathematical Framework" (20 of 20 papers shown):
1. Nesterov acceleration in benignly non-convex landscapes. Kanan Gupta, Stephan Wojtowytsch. 10 Oct 2024. [24 / 2 / 0]
2. How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD. Pierfrancesco Beneventano, Andrea Pinto, Tomaso A. Poggio. MLT. 17 Jun 2024. [21 / 1 / 0]
3. Does SGD really happen in tiny subspaces? Minhak Song, Kwangjun Ahn, Chulhee Yun. 25 May 2024. [41 / 4 / 1]
4. Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning. Yuxiao Wen, Arthur Jacot. 12 Feb 2024. [37 / 6 / 0]
5. Stochastic Modified Flows for Riemannian Stochastic Gradient Descent. Benjamin Gess, Sebastian Kassing, Nimit Rana. 02 Feb 2024. [19 / 0 / 0]
6. A Coefficient Makes SVRG Effective. Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, Zhuang Liu. 09 Nov 2023. [15 / 1 / 0]
7. Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization. Kaiyue Wen, Zhiyuan Li, Tengyu Ma. FAtt. 20 Jul 2023. [14 / 26 / 0]
8. How to escape sharp minima with random perturbations. Kwangjun Ahn, Ali Jadbabaie, S. Sra. ODL. 25 May 2023. [14 / 6 / 0]
9. Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models. Alexandru Damian, Eshaan Nichani, Rong Ge, Jason D. Lee. MLT. 18 May 2023. [23 / 33 / 0]
10. mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization. Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder. AAML. 19 Feb 2023. [10 / 7 / 0]
11. Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent. Benjamin Gess, Sebastian Kassing, Vitalii Konarovskyi. DiffM. 14 Feb 2023. [14 / 6 / 0]
12. On the Lipschitz Constant of Deep Networks and Double Descent. Matteo Gamba, Hossein Azizpour, Mårten Björkman. 28 Jan 2023. [14 / 6 / 0]
13. Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing. Jikai Jin, Zhiyuan Li, Kaifeng Lyu, S. Du, Jason D. Lee. MLT. 27 Jan 2023. [20 / 34 / 0]
14. How Does Sharpness-Aware Minimization Minimize Sharpness? Kaiyue Wen, Tengyu Ma, Zhiyuan Li. AAML. 10 Nov 2022. [10 / 47 / 0]
15. Noise Injection as a Probe of Deep Learning Dynamics. Noam Levi, I. Bloch, M. Freytsis, T. Volansky. 24 Oct 2022. [17 / 2 / 0]
16. On the Implicit Bias in Deep-Learning Algorithms. Gal Vardi. FedML, AI4CE. 26 Aug 2022. [17 / 72 / 0]
17. Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent. Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora. 08 Jul 2022. [17 / 27 / 0]
18. Anticorrelated Noise Injection for Improved Generalization. Antonio Orvieto, Hans Kersting, F. Proske, Francis R. Bach, Aurélien Lucchi. 06 Feb 2022. [50 / 44 / 0]
19. First-order Methods Almost Always Avoid Saddle Points. J. Lee, Ioannis Panageas, Georgios Piliouras, Max Simchowitz, Michael I. Jordan, Benjamin Recht. ODL. 20 Oct 2017. [63 / 82 / 0]
20. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. ODL. 15 Sep 2016. [273 / 2,696 / 0]