2101.12176
Cited By
On the Origin of Implicit Regularization in Stochastic Gradient Descent
28 January 2021
Samuel L. Smith
Benoit Dherin
David Barrett
Soham De
MLT
Papers citing
"On the Origin of Implicit Regularization in Stochastic Gradient Descent"
50 / 137 papers shown
Title
On the Interaction of Noise, Compression Role, and Adaptivity under (L_0, L_1)-Smoothness: An SDE-based Approach
Enea Monzio Compagnoni
Rustem Islamov
Antonio Orvieto
Eduard A. Gorbunov
11
1
0
30 May 2025
SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training
Ildus Sadrtdinov
Ivan Klimov
E. Lobacheva
Dmitry Vetrov
22
0
0
29 May 2025
Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape
Ioannis Bantzis
James B. Simon
Arthur Jacot
ODL
32
0
0
27 May 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
Liu Liu
...
Jianfeng Gao
Weizhu Chen
Shuaiqiang Wang
Simon Shaolei Du
Yelong Shen
OffRL
ReLM
LRM
323
47
0
29 Apr 2025
Harnessing uncertainty when learning through Equilibrium Propagation in neural networks
Jonathan Peters
Philippe Talatchian
87
0
0
28 Mar 2025
Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors
Runxi Cheng
Feng Xiong
Yongxian Wei
Wanyun Zhu
Chun Yuan
MoMe
133
1
0
11 Mar 2025
Convergence Analysis of Federated Learning Methods Using Backward Error Analysis
Jinwoo Lim
Suhyun Kim
Soo-Mook Moon
FedML
118
0
0
05 Mar 2025
Stochastic Rounding for LLM Training: Theory and Practice
Kaan Ozkara
Tao Yu
Youngsuk Park
69
0
0
27 Feb 2025
Where Do Large Learning Rates Lead Us?
Ildus Sadrtdinov
M. Kodryan
Eduard Pokonechny
E. Lobacheva
Dmitry Vetrov
AI4CE
85
1
0
29 Oct 2024
Neuro-symbolic Learning Yielding Logical Constraints
Zenan Li
Yunpeng Huang
Zhaoyu Li
Yuan Yao
Jingwei Xu
Taolue Chen
Xiaoxing Ma
Jian Lu
NAI
97
6
0
28 Oct 2024
Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
Sajad Movahedi
Antonio Orvieto
Seyed-Mohsen Moosavi-Dezfooli
AI4CE
AAML
578
0
0
15 Oct 2024
ALLoRA: Adaptive Learning Rate Mitigates LoRA Fatal Flaws
Hai Huang
Randall Balestriero
53
0
0
13 Oct 2024
Preconditioning for Accelerated Gradient Descent Optimization and Regularization
Qiang Ye
AI4CE
48
0
0
30 Sep 2024
Variational Search Distributions
Daniel M. Steinberg
Rafael Oliveira
Cheng Soon Ong
Edwin V. Bonilla
112
1
0
10 Sep 2024
Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller
Mark Dredze
Nicholas Andrews
138
1
0
26 Aug 2024
Local vs Global continual learning
Giulia Lanzillotta
Sidak Pal Singh
Benjamin Grewe
Thomas Hofmann
CLL
73
0
0
23 Jul 2024
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
Pierfrancesco Beneventano
Andrea Pinto
Tomaso A. Poggio
MLT
56
1
0
17 Jun 2024
H-Fac: Memory-Efficient Optimization with Factorized Hamiltonian Descent
Son Nguyen
Lizhang Chen
Bo Liu
Qiang Liu
112
5
0
14 Jun 2024
When Will Gradient Regularization Be Harmful?
Yang Zhao
Hao Zhang
Xiuyuan Hu
AI4CE
65
1
0
14 Jun 2024
Unlocking Telemetry Potential: Self-Supervised Learning for Continuous Clinical Electrocardiogram Monitoring
Thomas Kite
Uzair Tahamid Siam
Brian Ayers
Nicholas Houstis
Aaron D Aguirre
79
1
0
07 Jun 2024
A Margin-based Multiclass Generalization Bound via Geometric Complexity
Michael Munn
Benoit Dherin
Javier Gonzalvo
UQCV
81
2
0
28 May 2024
The Impact of Geometric Complexity on Neural Collapse in Transfer Learning
Michael Munn
Benoit Dherin
Javier Gonzalvo
AAML
75
2
0
24 May 2024
Loss Jump During Loss Switch in Solving PDEs with Neural Networks
Zhiwei Wang
Lulu Zhang
Zhongwang Zhang
Z. Xu
58
0
0
06 May 2024
PETScML: Second-order solvers for training regression problems in Scientific Machine Learning
Stefano Zampini
Umberto Zerbinati
George Turkiyyah
David E. Keyes
62
5
0
18 Mar 2024
Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems
Junwei Su
Difan Zou
Chuan Wu
98
0
0
13 Mar 2024
Neural Redshift: Random Networks are not Random Functions
Damien Teney
A. Nicolicioiu
Valentin Hartmann
Ehsan Abbasnejad
187
25
0
04 Mar 2024
Disentangling the Causes of Plasticity Loss in Neural Networks
Clare Lyle
Zeyu Zheng
Khimya Khetarpal
H. V. Hasselt
Razvan Pascanu
James Martens
Will Dabney
AI4CE
128
38
0
29 Feb 2024
Towards Optimal Learning of Language Models
Yuxian Gu
Li Dong
Y. Hao
Qingxiu Dong
Minlie Huang
Furu Wei
99
7
0
27 Feb 2024
Corridor Geometry in Gradient-Based Optimization
Benoit Dherin
M. Rosca
59
1
0
13 Feb 2024
Understanding the Generalization Benefits of Late Learning Rate Decay
Yinuo Ren
Chao Ma
Lexing Ying
AI4CE
70
6
0
21 Jan 2024
Neglected Hessian component explains mysteries in Sharpness regularization
Yann N. Dauphin
Atish Agarwala
Hossein Mobahi
FAtt
113
7
0
19 Jan 2024
Large Learning Rates Improve Generalization: But How Large Are We Talking About?
E. Lobacheva
Eduard Pockonechnyy
M. Kodryan
Dmitry Vetrov
AI4CE
28
0
0
19 Nov 2023
A PAC-Bayesian Perspective on the Interpolating Information Criterion
Liam Hodgkinson
Christopher van der Heide
Roberto Salomone
Fred Roosta
Michael W. Mahoney
96
2
0
13 Nov 2023
Minimum norm interpolation by perceptra: Explicit regularization and implicit bias
Jiyoung Park
Ian Pelakh
Stephan Wojtowytsch
82
1
0
10 Nov 2023
Implicit biases in multitask and continual learning from a backward error analysis perspective
Benoit Dherin
104
3
0
01 Nov 2023
Implicit meta-learning may lead language models to trust more reliable sources
Dmitrii Krasheninnikov
Egor Krasheninnikov
Bruno Mlodozeniec
Tegan Maharaj
David M. Krueger
72
4
0
23 Oct 2023
A Quadratic Synchronization Rule for Distributed Deep Learning
Xinran Gu
Kaifeng Lyu
Sanjeev Arora
Jingzhao Zhang
Longbo Huang
85
1
0
22 Oct 2023
Robot Fleet Learning via Policy Merging
Lirui Wang
Kaiqing Zhang
Allan Zhou
Max Simchowitz
Russ Tedrake
123
5
0
02 Oct 2023
TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning
Jing Zhu
Xiang Song
V. Ioannidis
Danai Koutra
Christos Faloutsos
174
15
0
25 Sep 2023
Regularization and Optimal Multiclass Learning
Julian Asilis
Siddartha Devic
S. Dughmi
Vatsal Sharan
S. Teng
56
8
0
24 Sep 2023
Backward error analysis and the qualitative behaviour of stochastic optimization algorithms: Application to stochastic coordinate descent
Stefano Di Giovacchino
D. Higham
K. Zygalakis
48
1
0
05 Sep 2023
On the Implicit Bias of Adam
M. D. Cattaneo
Jason M. Klusowski
Boris Shigida
82
18
0
31 Aug 2023
Persistent learning signals and working memory without continuous attractors
Il Memming Park
Ábel Ságodi
Piotr Sokól
93
9
0
24 Aug 2023
Latent State Models of Training Dynamics
Michael Y. Hu
Angelica Chen
Naomi Saphra
Kyunghyun Cho
99
8
0
18 Aug 2023
Learning-Rate-Free Learning: Dissecting D-Adaptation and Probabilistic Line Search
Max McGuinness
ODL
42
0
0
06 Aug 2023
Can Neural Network Memorization Be Localized?
Pratyush Maini
Michael C. Mozer
Hanie Sedghi
Zachary Chase Lipton
J. Zico Kolter
Chiyuan Zhang
TDI
69
55
0
18 Jul 2023
Why Does Little Robustness Help? Understanding and Improving Adversarial Transferability from Surrogate Training
Yechao Zhang
Shengshan Hu
Leo Yu Zhang
Junyu Shi
Minghui Li
Xiaogeng Liu
Wei Wan
Hai Jin
AAML
132
24
0
15 Jul 2023
The Interpolating Information Criterion for Overparameterized Models
Liam Hodgkinson
Christopher van der Heide
Roberto Salomone
Fred Roosta
Michael W. Mahoney
72
9
0
15 Jul 2023
Implicit regularisation in stochastic gradient descent: from single-objective to two-player games
Mihaela Rosca
M. Deisenroth
58
2
0
11 Jul 2023
Transgressing the boundaries: towards a rigorous understanding of deep learning and its (non-)robustness
C. Hartmann
Lorenz Richter
AAML
52
2
0
05 Jul 2023