arXiv:2102.12470

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora · 24 February 2021
Links: arXiv · PDF · HTML

Papers citing "On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)" (50 of 55 shown)
Some Optimizers are More Equal: Understanding the Role of Optimizers in Group Fairness
  Mojtaba Kolahdouzi, Hatice Gunes, Ali Etemad · 21 Apr 2025

Understanding the Generalization Error of Markov algorithms through Poissonization
  Benjamin Dupuis, Maxime Haddouche, George Deligiannidis, Umut Simsekli · 11 Feb 2025

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
  Atli Kosson, Bettina Messmer, Martin Jaggi · AI4CE · 31 Oct 2024

How Does Critical Batch Size Scale in Pre-training?
  Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Phillips Foster, Sham Kakade · 29 Oct 2024

Noise-Aware Differentially Private Variational Inference
  Talal Alrawajfeh, Joonas Jälkö, Antti Honkela · 25 Oct 2024

Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron
  Christian Schmid, James M. Murray · 05 Sep 2024

Can Optimization Trajectories Explain Multi-Task Transfer?
  David Mueller, Mark Dredze, Nicholas Andrews · 26 Aug 2024

Stochastic Differential Equations models for Least-Squares Stochastic Gradient Descent
  Adrien Schertzer, Loucas Pillaud-Vivien · 02 Jul 2024

On Scaling Up 3D Gaussian Splatting Training
  Hexu Zhao, Haoyang Weng, Daohan Lu, Ang Li, Jinyang Li, Aurojit Panda, Saining Xie · 3DGS · 26 Jun 2024
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
  Pierfrancesco Beneventano, Andrea Pinto, Tomaso A. Poggio · MLT · 17 Jun 2024

Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework
  Siyuan Yu, Wei Chen, H. V. Poor · 17 Jun 2024

Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective
  Xinhao Yao, Xiaolin Hu, Shenzhi Yang, Yong Liu · 06 Jun 2024

Stochastic Modified Flows for Riemannian Stochastic Gradient Descent
  Benjamin Gess, Sebastian Kassing, Nimit Rana · 02 Feb 2024

Understanding the Generalization Benefits of Late Learning Rate Decay
  Yinuo Ren, Chao Ma, Lexing Ying · AI4CE · 21 Jan 2024

Weight fluctuations in (deep) linear neural networks and a derivation of the inverse-variance flatness relation
  Markus Gross, A. Raulf, Christoph Räth · 23 Nov 2023

A Quadratic Synchronization Rule for Distributed Deep Learning
  Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang · 22 Oct 2023

Why Do We Need Weight Decay in Modern Deep Learning?
  Maksym Andriushchenko, Francesco D'Angelo, Aditya Varre, Nicolas Flammarion · 06 Oct 2023

A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent
  Mingze Wang, Lei Wu · 01 Oct 2023

On the different regimes of Stochastic Gradient Descent
  Antonio Sclocchi, M. Wyart · 19 Sep 2023
Stochastic Gradient Descent-like relaxation is equivalent to Metropolis dynamics in discrete optimization and inference problems
  Maria Chiara Angelini, A. Cavaliere, Raffaele Marino, F. Ricci-Tersenghi · 11 Sep 2023

REALM: Robust Entropy Adaptive Loss Minimization for Improved Single-Sample Test-Time Adaptation
  Skyler Seto, B. Theobald, Federico Danieli, Navdeep Jaitly, Dan Busbridge · TTA, OOD · 07 Sep 2023

Law of Balance and Stationary Distribution of Stochastic Gradient Descent
  Liu Ziyin, Hongchao Li, Masakuni Ueda · 13 Aug 2023

The Marginal Value of Momentum for Small Learning Rate SGD
  Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li · ODL · 27 Jul 2023

How to Scale Your EMA
  Dan Busbridge, Jason Ramapuram, Pierre Ablin, Tatiana Likhomanenko, Eeshan Gunesh Dhekane, Xavier Suau, Russ Webb · 25 Jul 2023

Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
  Tongtian Zhu, Fengxiang He, Kaixuan Chen, Mingli Song, Dacheng Tao · 05 Jun 2023

Stochastic Gradient Langevin Dynamics Based on Quantization with Increasing Resolution
  Jinwuk Seok, Chang-Jae Cho · 30 May 2023

Fine-Tuning Language Models with Just Forward Passes
  Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alexandru Damian, Jason D. Lee, Danqi Chen, Sanjeev Arora · 27 May 2023

Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
  Atli Kosson, Bettina Messmer, Martin Jaggi · 26 May 2023

Tight conditions for when the NTK approximation is valid
  Enric Boix-Adserà, Etai Littwin · 22 May 2023
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
  Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder · AAML · 19 Feb 2023

Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent
  Benjamin Gess, Sebastian Kassing, Vitalii Konarovskyi · DiffM · 14 Feb 2023

An SDE for Modeling SAM: Theory and Insights
  Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, F. Proske, Hans Kersting, Aurélien Lucchi · 19 Jan 2023

On the Overlooked Structure of Stochastic Gradients
  Zeke Xie, Qian-Yuan Tang, Mingming Sun, P. Li · 05 Dec 2022

PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
  Sanae Lotfi, Marc Finzi, Sanyam Kapoor, Andres Potapczynski, Micah Goldblum, A. Wilson · BDL, MLT, AI4CE · 24 Nov 2022

Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
  Ziqiao Wang, Yongyi Mao · 19 Nov 2022

Global Convergence of SGD On Two Layer Neural Nets
  Pulkit Gopalani, Anirbit Mukherjee · 20 Oct 2022

A note on diffusion limits for stochastic gradient descent
  Alberto Lanconelli, Christopher S. A. Lauria · DiffM · 20 Oct 2022

On Quantum Speedups for Nonconvex Optimization via Quantum Tunneling Walks
  Yizhou Liu, Weijie J. Su, Tongyang Li · 29 Sep 2022

The alignment property of SGD noise and how it helps select flat minima: A stability analysis
  Lei Wu, Mingze Wang, Weijie Su · MLT · 06 Jul 2022

Neural Collapse: A Review on Modelling Principles and Generalization
  Vignesh Kothapalli · 08 Jun 2022
High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
  Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath · 08 Jun 2022

Generalization Bounds for Gradient Methods via Discrete and Continuous Prior
  Jun Yu Li, Xu Luo, Jian Li · 27 May 2022

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
  Sadhika Malladi, Kaifeng Lyu, A. Panigrahi, Sanjeev Arora · 20 May 2022

Memory-Efficient Backpropagation through Large Linear Layers
  Daniel Bershatsky, A. Mikhalev, A. Katrutsa, Julia Gusak, D. Merkulov, Ivan V. Oseledets · 31 Jan 2022

On the Power-Law Hessian Spectrums in Deep Learning
  Zeke Xie, Qian-Yuan Tang, Yunfeng Cai, Mingming Sun, P. Li · ODL · 31 Jan 2022

What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
  Zhiyuan Li, Tianhao Wang, Sanjeev Arora · MLT · 13 Oct 2021

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations
  Jiayao Zhang, Hua Wang, Weijie J. Su · 11 Oct 2021

Stochastic Training is Not Necessary for Generalization
  Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein · 29 Sep 2021

SGD with a Constant Large Learning Rate Can Converge to Local Maxima
  Liu Ziyin, Botao Li, James B. Simon, Masakuni Ueda · 25 Jul 2021

The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
  D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins · 19 Jul 2021