On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent

19 February 2021
Shahar Azulay, E. Moroshko, Mor Shpigel Nacson, Blake E. Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry (AI4CE)

Papers citing "On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent"

Showing 50 of 53 citing papers.

• Mirror, Mirror of the Flow: How Does Regularization Shape Implicit Bias?
  Tom Jacobs, Chao Zhou, R. Burkholz. 17 Apr 2025. (OffRL, AI4CE)
• On the Cone Effect in the Learning Dynamics
  Zhanpeng Zhou, Yongyi Yang, Jie Ren, Mahito Sugiyama, Junchi Yan. 20 Mar 2025.
• Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
  Yoonsoo Nam, Seok Hyeong Lee, Clementine Domine, Yea Chan Park, Charles London, Wonyl Choi, Niclas Goring, Seungjai Lee. 28 Feb 2025. (AI4CE)
• Optimization Insights into Deep Diagonal Linear Networks
  Hippolyte Labarrière, C. Molinari, Lorenzo Rosasco, S. Villa, Cristian Vega. 21 Dec 2024.
• Slowing Down Forgetting in Continual Learning
  Pascal Janetzky, Tobias Schlagenhauf, Stefan Feuerriegel. 11 Nov 2024. (CLL)
• A Mirror Descent Perspective of Smoothed Sign Descent
  Shuyang Wang, Diego Klabjan. 18 Oct 2024.
• Fast Training of Sinusoidal Neural Fields via Scaling Initialization
  Taesun Yeom, Sangyoon Lee, Jaeho Lee. 07 Oct 2024.
• From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
  Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe. 22 Sep 2024.
• Lecture Notes on Linear Neural Networks: A Tale of Optimization and Generalization in Deep Learning
  Nadav Cohen, Noam Razin. 25 Aug 2024.
• Implicit Bias of Mirror Flow on Separable Data
  Scott Pesme, Radu-Alexandru Dragomir, Nicolas Flammarion. 18 Jun 2024.
• Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning
  D. Kunin, Allan Raventós, Clémentine Dominé, Feng Chen, David Klindt, Andrew M. Saxe, Surya Ganguli. 10 Jun 2024. (MLT)
• Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
  Heejune Sheen, Siyu Chen, Tianhao Wang, Harrison H. Zhou. 13 Mar 2024. (MLT)
• Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults
  Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun. 25 Nov 2023.
• How connectivity structure shapes rich and lazy learning in neural circuits
  Yuhan Helena Liu, A. Baratin, Jonathan H. Cornford, Stefan Mihalas, E. Shea-Brown, Guillaume Lajoie. 12 Oct 2023.
• A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent
  Mingze Wang, Lei Wu. 01 Oct 2023.
• Connecting NTK and NNGP: A Unified Theoretical Framework for Wide Neural Network Learning Dynamics
  Yehonatan Avidan, Qianyi Li, H. Sompolinsky. 08 Sep 2023.
• The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning
  Nikhil Ghosh, Spencer Frei, Wooseok Ha, Ting Yu. 06 Aug 2023. (MLT)
• Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks
  J. S. Wind, Vegard Antun, A. Hansen. 13 Jul 2023.
• Abide by the Law and Follow the Flow: Conservation Laws for Gradient Flows
  Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré. 30 Jun 2023.
• The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks
  Mor Shpigel Nacson, Rotem Mulayoff, Greg Ongie, T. Michaeli, Daniel Soudry. 30 Jun 2023.
• Trained Transformers Learn Linear Models In-Context
  Ruiqi Zhang, Spencer Frei, Peter L. Bartlett. 16 Jun 2023.
• Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs
  D. Chistikov, Matthias Englert, R. Lazic. 10 Jun 2023. (MLT)
• Combining Explicit and Implicit Regularization for Efficient Learning in Deep Networks
  Dan Zhao. 01 Jun 2023.
• Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
  Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Y. Carmon. 22 May 2023.
• Saddle-to-Saddle Dynamics in Diagonal Linear Networks
  Scott Pesme, Nicolas Flammarion. 02 Apr 2023.
• (S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability
  Mathieu Even, Scott Pesme, Suriya Gunasekar, Nicolas Flammarion. 17 Feb 2023.
• Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression
  Mo Zhou, Rong Ge. 01 Feb 2023.
• Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
  Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma. 25 Oct 2022. (AI4CE)
• Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets
  Edo Cohen-Karlik, Itamar Menuhin-Gruman, Raja Giryes, Nadav Cohen, Amir Globerson. 25 Oct 2022.
• From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent
  Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan. 13 Oct 2022.
• The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima
  Peter L. Bartlett, Philip M. Long, Olivier Bousquet. 04 Oct 2022.
• Deep Linear Networks can Benignly Overfit when Shallow Ones Do
  Niladri S. Chatterji, Philip M. Long. 19 Sep 2022.
• Incremental Learning in Diagonal Linear Networks
  Raphael Berthier. 31 Aug 2022. (CLL, AI4CE)
• On the Implicit Bias in Deep-Learning Algorithms
  Gal Vardi. 26 Aug 2022. (FedML, AI4CE)
• Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
  Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora. 08 Jul 2022.
• The alignment property of SGD noise and how it helps select flat minima: A stability analysis
  Lei Wu, Mingze Wang, Weijie Su. 06 Jul 2022. (MLT)
• Reconstructing Training Data from Trained Neural Networks
  Niv Haim, Gal Vardi, Gilad Yehudai, Ohad Shamir, Michal Irani. 15 Jun 2022.
• Your Contrastive Learning Is Secretly Doing Stochastic Neighbor Embedding
  Tianyang Hu, Zhili Liu, Fengwei Zhou, Wenjia Wang, Weiran Huang. 30 May 2022. (SSL)
• Smooth over-parameterized solvers for non-smooth structured optimization
  C. Poon, Gabriel Peyré. 03 May 2022.
• Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent
  Yiling Luo, X. Huo, Y. Mei. 29 Apr 2022.
• Support Vectors and Gradient Dynamics of Single-Neuron ReLU Networks
  Sangmin Lee, Byeongsu Sim, Jong Chul Ye. 11 Feb 2022. (MLT)
• Implicit Regularization Towards Rank Minimization in ReLU Networks
  Nadav Timor, Gal Vardi, Ohad Shamir. 30 Jan 2022.
• Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks
  Noam Razin, Asaf Maman, Nadav Cohen. 27 Jan 2022.
• More is Less: Inducing Sparsity via Overparameterization
  H. Chou, J. Maly, Holger Rauhut. 21 Dec 2021.
• What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
  Zhiyuan Li, Tianhao Wang, Sanjeev Arora. 13 Oct 2021. (MLT)
• Foolish Crowds Support Benign Overfitting
  Niladri S. Chatterji, Philip M. Long. 06 Oct 2021.
• On Margin Maximization in Linear and ReLU Networks
  Gal Vardi, Ohad Shamir, Nathan Srebro. 06 Oct 2021.
• Continuous vs. Discrete Optimization of Deep Neural Networks
  Omer Elkabetz, Nadav Cohen. 14 Jul 2021.
• A Theoretical Analysis of Fine-tuning with Linear Teachers
  Gal Shachaf, Alon Brutzkus, Amir Globerson. 04 Jul 2021.
• Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity
  Scott Pesme, Loucas Pillaud-Vivien, Nicolas Flammarion. 17 Jun 2021.