On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent
Shahar Azulay, E. Moroshko, Mor Shpigel Nacson, Blake E. Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry
arXiv:2102.09769, 19 February 2021
Papers citing "On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent" (50 of 53 papers shown):
- "Mirror, Mirror of the Flow: How Does Regularization Shape Implicit Bias?" Tom Jacobs, Chao Zhou, R. Burkholz. 17 Apr 2025.
- "On the Cone Effect in the Learning Dynamics." Zhanpeng Zhou, Yongyi Yang, Jie Ren, Mahito Sugiyama, Junchi Yan. 20 Mar 2025.
- "Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)." Yoonsoo Nam, Seok Hyeong Lee, Clémentine Dominé, Yea Chan Park, Charles London, Wonyl Choi, Niclas Goring, Seungjai Lee. 28 Feb 2025.
- "Optimization Insights into Deep Diagonal Linear Networks." Hippolyte Labarrière, C. Molinari, Lorenzo Rosasco, S. Villa, Cristian Vega. 21 Dec 2024.
- "Slowing Down Forgetting in Continual Learning." Pascal Janetzky, Tobias Schlagenhauf, Stefan Feuerriegel. 11 Nov 2024.
- "A Mirror Descent Perspective of Smoothed Sign Descent." Shuyang Wang, Diego Klabjan. 18 Oct 2024.
- "Fast Training of Sinusoidal Neural Fields via Scaling Initialization." Taesun Yeom, Sangyoon Lee, Jaeho Lee. 07 Oct 2024.
- "From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks." Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe. 22 Sep 2024.
- "Lecture Notes on Linear Neural Networks: A Tale of Optimization and Generalization in Deep Learning." Nadav Cohen, Noam Razin. 25 Aug 2024.
- "Implicit Bias of Mirror Flow on Separable Data." Scott Pesme, Radu-Alexandru Dragomir, Nicolas Flammarion. 18 Jun 2024.
- "Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning." D. Kunin, Allan Raventós, Clémentine Dominé, Feng Chen, David Klindt, Andrew M. Saxe, Surya Ganguli. 10 Jun 2024.
- "Implicit Regularization of Gradient Flow on One-Layer Softmax Attention." Heejune Sheen, Siyu Chen, Tianhao Wang, Harrison H. Zhou. 13 Mar 2024.
- "Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults." Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun. 25 Nov 2023.
- "How connectivity structure shapes rich and lazy learning in neural circuits." Yuhan Helena Liu, A. Baratin, Jonathan H. Cornford, Stefan Mihalas, E. Shea-Brown, Guillaume Lajoie. 12 Oct 2023.
- "A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent." Mingze Wang, Lei Wu. 01 Oct 2023.
- "Connecting NTK and NNGP: A Unified Theoretical Framework for Wide Neural Network Learning Dynamics." Yehonatan Avidan, Qianyi Li, H. Sompolinsky. 08 Sep 2023.
- "The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning." Nikhil Ghosh, Spencer Frei, Wooseok Ha, Ting Yu. 06 Aug 2023.
- "Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks." J. S. Wind, Vegard Antun, A. Hansen. 13 Jul 2023.
- "Abide by the Law and Follow the Flow: Conservation Laws for Gradient Flows." Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré. 30 Jun 2023.
- "The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks." Mor Shpigel Nacson, Rotem Mulayoff, Greg Ongie, T. Michaeli, Daniel Soudry. 30 Jun 2023.
- "Trained Transformers Learn Linear Models In-Context." Ruiqi Zhang, Spencer Frei, Peter L. Bartlett. 16 Jun 2023.
- "Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs." D. Chistikov, Matthias Englert, R. Lazic. 10 Jun 2023.
- "Combining Explicit and Implicit Regularization for Efficient Learning in Deep Networks." Dan Zhao. 01 Jun 2023.
- "Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond." Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Y. Carmon. 22 May 2023.
- "Saddle-to-Saddle Dynamics in Diagonal Linear Networks." Scott Pesme, Nicolas Flammarion. 02 Apr 2023.
- "(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability." Mathieu Even, Scott Pesme, Suriya Gunasekar, Nicolas Flammarion. 17 Feb 2023.
- "Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression." Mo Zhou, Rong Ge. 01 Feb 2023.
- "Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models." Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma. 25 Oct 2022.
- "Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets." Edo Cohen-Karlik, Itamar Menuhin-Gruman, Raja Giryes, Nadav Cohen, Amir Globerson. 25 Oct 2022.
- "From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent." Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan. 13 Oct 2022.
- "The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima." Peter L. Bartlett, Philip M. Long, Olivier Bousquet. 04 Oct 2022.
- "Deep Linear Networks can Benignly Overfit when Shallow Ones Do." Niladri S. Chatterji, Philip M. Long. 19 Sep 2022.
- "Incremental Learning in Diagonal Linear Networks." Raphael Berthier. 31 Aug 2022.
- "On the Implicit Bias in Deep-Learning Algorithms." Gal Vardi. 26 Aug 2022.
- "Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent." Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora. 08 Jul 2022.
- "The alignment property of SGD noise and how it helps select flat minima: A stability analysis." Lei Wu, Mingze Wang, Weijie Su. 06 Jul 2022.
- "Reconstructing Training Data from Trained Neural Networks." Niv Haim, Gal Vardi, Gilad Yehudai, Ohad Shamir, Michal Irani. 15 Jun 2022.
- "Your Contrastive Learning Is Secretly Doing Stochastic Neighbor Embedding." Tianyang Hu, Zhili Liu, Fengwei Zhou, Wenjia Wang, Weiran Huang. 30 May 2022.
- "Smooth over-parameterized solvers for non-smooth structured optimization." C. Poon, Gabriel Peyré. 03 May 2022.
- "Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent." Yiling Luo, X. Huo, Y. Mei. 29 Apr 2022.
- "Support Vectors and Gradient Dynamics of Single-Neuron ReLU Networks." Sangmin Lee, Byeongsu Sim, Jong Chul Ye. 11 Feb 2022.
- "Implicit Regularization Towards Rank Minimization in ReLU Networks." Nadav Timor, Gal Vardi, Ohad Shamir. 30 Jan 2022.
- "Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks." Noam Razin, Asaf Maman, Nadav Cohen. 27 Jan 2022.
- "More is Less: Inducing Sparsity via Overparameterization." H. Chou, J. Maly, Holger Rauhut. 21 Dec 2021.
- "What Happens after SGD Reaches Zero Loss? --A Mathematical Framework." Zhiyuan Li, Tianhao Wang, Sanjeev Arora. 13 Oct 2021.
- "Foolish Crowds Support Benign Overfitting." Niladri S. Chatterji, Philip M. Long. 06 Oct 2021.
- "On Margin Maximization in Linear and ReLU Networks." Gal Vardi, Ohad Shamir, Nathan Srebro. 06 Oct 2021.
- "Continuous vs. Discrete Optimization of Deep Neural Networks." Omer Elkabetz, Nadav Cohen. 14 Jul 2021.
- "A Theoretical Analysis of Fine-tuning with Linear Teachers." Gal Shachaf, Alon Brutzkus, Amir Globerson. 04 Jul 2021.
- "Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity." Scott Pesme, Loucas Pillaud-Vivien, Nicolas Flammarion. 17 Jun 2021.