ResearchTrend.AI
What Happens after SGD Reaches Zero Loss? --A Mathematical Framework

13 October 2021
Zhiyuan Li
Tianhao Wang
Sanjeev Arora
    MLT

Papers citing "What Happens after SGD Reaches Zero Loss? --A Mathematical Framework"

20 / 20 papers shown
Nesterov acceleration in benignly non-convex landscapes
Kanan Gupta
Stephan Wojtowytsch
24
2
0
10 Oct 2024
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
Pierfrancesco Beneventano
Andrea Pinto
Tomaso A. Poggio
MLT
21
1
0
17 Jun 2024
Does SGD really happen in tiny subspaces?
Minhak Song
Kwangjun Ahn
Chulhee Yun
41
4
1
25 May 2024
Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning
Yuxiao Wen
Arthur Jacot
37
6
0
12 Feb 2024
Stochastic Modified Flows for Riemannian Stochastic Gradient Descent
Benjamin Gess
Sebastian Kassing
Nimit Rana
19
0
0
02 Feb 2024
A Coefficient Makes SVRG Effective
Yida Yin
Zhiqiu Xu
Zhiyuan Li
Trevor Darrell
Zhuang Liu
15
1
0
09 Nov 2023
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
Kaiyue Wen
Zhiyuan Li
Tengyu Ma
FAtt
14
26
0
20 Jul 2023
How to escape sharp minima with random perturbations
Kwangjun Ahn
Ali Jadbabaie
S. Sra
ODL
14
6
0
25 May 2023
Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models
Alexandru Damian
Eshaan Nichani
Rong Ge
Jason D. Lee
MLT
23
33
0
18 May 2023
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
Kayhan Behdin
Qingquan Song
Aman Gupta
S. Keerthi
Ayan Acharya
Borja Ocejo
Gregory Dexter
Rajiv Khanna
D. Durfee
Rahul Mazumder
AAML
10
7
0
19 Feb 2023
Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent
Benjamin Gess
Sebastian Kassing
Vitalii Konarovskyi
DiffM
14
6
0
14 Feb 2023
On the Lipschitz Constant of Deep Networks and Double Descent
Matteo Gamba
Hossein Azizpour
Marten Bjorkman
14
6
0
28 Jan 2023
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing
Jikai Jin
Zhiyuan Li
Kaifeng Lyu
S. Du
Jason D. Lee
MLT
20
34
0
27 Jan 2023
How Does Sharpness-Aware Minimization Minimize Sharpness?
Kaiyue Wen
Tengyu Ma
Zhiyuan Li
AAML
10
47
0
10 Nov 2022
Noise Injection as a Probe of Deep Learning Dynamics
Noam Levi
I. Bloch
M. Freytsis
T. Volansky
17
2
0
24 Oct 2022
On the Implicit Bias in Deep-Learning Algorithms
Gal Vardi
FedML
AI4CE
17
72
0
26 Aug 2022
Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
Zhiyuan Li
Tianhao Wang
Jason D. Lee
Sanjeev Arora
17
27
0
08 Jul 2022
Anticorrelated Noise Injection for Improved Generalization
Antonio Orvieto
Hans Kersting
F. Proske
Francis R. Bach
Aurélien Lucchi
50
44
0
06 Feb 2022
First-order Methods Almost Always Avoid Saddle Points
J. Lee
Ioannis Panageas
Georgios Piliouras
Max Simchowitz
Michael I. Jordan
Benjamin Recht
ODL
63
82
0
20 Oct 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
273
2,696
0
15 Sep 2016