What Happens after SGD Reaches Zero Loss? --A Mathematical Framework

13 October 2021
Zhiyuan Li, Tianhao Wang, Sanjeev Arora
MLT
arXiv: 2110.06914

Papers citing "What Happens after SGD Reaches Zero Loss? --A Mathematical Framework"

35 papers shown

Flat Channels to Infinity in Neural Loss Landscapes
Flavio Martinelli, Alexander Van Meegen, Berfin Simsek, W. Gerstner, Johanni Brea
17 Jun 2025

Heavy-Ball Momentum Method in Continuous Time and Discretization Error Analysis
Bochen Lyu, Xiaojing Zhang, Fangyi Zheng, He Wang, Zheng Wang, Zhanxing Zhu
03 Jun 2025

The Spectral Bias of Shallow Neural Network Learning is Shaped by the Choice of Non-linearity
Justin Sahs, Ryan Pyle, Fabio Anselmi, Ankit B. Patel
13 Mar 2025

Slowing Down Forgetting in Continual Learning
Pascal Janetzky, Tobias Schlagenhauf, Stefan Feuerriegel
CLL
11 Nov 2024

Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan
AAML
14 Oct 2024

Nesterov acceleration in benignly non-convex landscapes
Kanan Gupta, Stephan Wojtowytsch
10 Oct 2024

How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
Pierfrancesco Beneventano, Andrea Pinto, Tomaso A. Poggio
MLT
17 Jun 2024

Reparameterization invariance in approximate Bayesian inference
Hrittik Roy, M. Miani, Carl Henrik Ek, Philipp Hennig, Marvin Pfortner, Lukas Tatzel, Søren Hauberg
BDL
05 Jun 2024

Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun
25 May 2024

Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning
Yuxiao Wen, Arthur Jacot
12 Feb 2024

Stochastic Modified Flows for Riemannian Stochastic Gradient Descent
Benjamin Gess, Sebastian Kassing, Nimit Rana
02 Feb 2024

A Coefficient Makes SVRG Effective
Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, Zhuang Liu
09 Nov 2023

The Marginal Value of Momentum for Small Learning Rate SGD
Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li
ODL
27 Jul 2023

Max-Margin Token Selection in Attention Mechanism
Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak
23 Jun 2023

mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder
AAML
19 Feb 2023

Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent
Benjamin Gess, Sebastian Kassing, Vitalii Konarovskyi
DiffM
14 Feb 2023

A Modern Look at the Relationship between Sharpness and Generalization
Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion
3DH
14 Feb 2023

On the Lipschitz Constant of Deep Networks and Double Descent
Matteo Gamba, Hossein Azizpour, Mårten Björkman
28 Jan 2023

How Does Sharpness-Aware Minimization Minimize Sharpness?
Kaiyue Wen, Tengyu Ma, Zhiyuan Li
AAML
10 Nov 2022

Flatter, faster: scaling momentum for optimal speedup of SGD
Aditya Cowsik, T. Can, Paolo Glorioso
28 Oct 2022

Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis
Taiki Miyagawa
28 Oct 2022

Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma
AI4CE
25 Oct 2022

Noise Injection as a Probe of Deep Learning Dynamics
Noam Levi, I. Bloch, M. Freytsis, T. Volansky
24 Oct 2022

SGD with Large Step Sizes Learns Sparse Features
Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion
11 Oct 2022

Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability
Alexandru Damian, Eshaan Nichani, Jason D. Lee
30 Sep 2022

Deep Double Descent via Smooth Interpolation
Matteo Gamba, Erik Englesson, Mårten Björkman, Hossein Azizpour
21 Sep 2022

On the Implicit Bias in Deep-Learning Algorithms
Gal Vardi
FedML, AI4CE
26 Aug 2022

Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability
Z. Li, Zixuan Wang, Jian Li
26 Jul 2022

Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora
08 Jul 2022

Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation
Loucas Pillaud-Vivien, J. Reygner, Nicolas Flammarion
NoLa
20 Jun 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora
FAtt
14 Jun 2022

Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes
Chao Ma, D. Kunin, Lei Wu, Lexing Ying
24 Apr 2022

Anticorrelated Noise Injection for Improved Generalization
Antonio Orvieto, Hans Kersting, F. Proske, Francis R. Bach, Aurelien Lucchi
06 Feb 2022

Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks
Noam Razin, Asaf Maman, Nadav Cohen
27 Jan 2022

Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule
Nikhil Iyer, V. Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu
09 Mar 2020