2012.09839
Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning
International Conference on Learning Representations (ICLR), 2020
17 December 2020
Zhiyuan Li
Yuping Luo
Kaifeng Lyu
Papers citing
"Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning"
50 / 112 papers shown
Diagonalizing the Softmax: Hadamard Initialization for Tractable Cross-Entropy Dynamics
Connall Garrod
Jonathan P. Keating
Christos Thrampoulidis
183
0
0
03 Dec 2025
The Markovian Thinker: Architecture-Agnostic Linear Scaling of Reasoning
Milad Aghajohari
Kamran Chitsaz
Amirhossein Kazemnejad
Sarath Chandar
Alessandro Sordoni
Aaron Courville
Siva Reddy
OffRL
ReLM
LRM
273
0
0
08 Oct 2025
On the Benefits of Weight Normalization for Overparameterized Matrix Sensing
Yudong Wei
Liang Zhang
Bingcong Li
Niao He
137
1
0
01 Oct 2025
Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region
Shuang Liang
Guido Montúfar
267
1
0
29 Sep 2025
Diagonal Linear Networks and the Lasso Regularization Path
Raphaël Berthier
140
1
0
23 Sep 2025
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
Yixiao Huang
Hanlin Zhu
Tianyu Guo
Jiantao Jiao
Somayeh Sojoudi
Michael I. Jordan
Stuart Russell
Song Mei
LRM
687
6
0
12 Jun 2025
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
D. Kunin
Giovanni Luca Marchetti
F. Chen
Dhruva Karkada
James B. Simon
M. DeWeese
Surya Ganguli
Nina Miolane
456
6
0
06 Jun 2025
Heavy-Ball Momentum Method in Continuous Time and Discretization Error Analysis
Bochen Lyu
Xiaojing Zhang
Fangyi Zheng
He Wang
Zheng Wang
Zhanxing Zhu
260
0
0
03 Jun 2025
The Rich and the Simple: On the Implicit Bias of Adam and SGD
Bhavya Vasudeva
Jung Whan Lee
Willie Neiswanger
Mahdi Soltanolkotabi
273
5
0
29 May 2025
LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
Nurbek Tastan
Stefanos Laskaridis
Martin Takáč
Karthik Nandakumar
Samuel Horváth
AI4CE
253
6
0
27 May 2025
Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape
Ioannis Bantzis
James B. Simon
Arthur Jacot
ODL
369
2
0
27 May 2025
Mirror, Mirror of the Flow: How Does Regularization Shape Implicit Bias?
Tom Jacobs
Chao Zhou
R. Burkholz
OffRL
AI4CE
342
4
0
17 Apr 2025
Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks
Chenyang Zhang
Peifeng Gao
Difan Zou
Yuan Cao
OOD
MLT
426
0
0
11 Apr 2025
An Overview of Low-Rank Structures in the Training and Adaptation of Large Models
Laura Balzano
Tianjiao Ding
B. Haeffele
Soo Min Kwon
Qing Qu
Peng Wang
Liang Luo
Can Yaras
OffRL
AI4CE
270
4
0
25 Mar 2025
Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
Yoonsoo Nam
Seok Hyeong Lee
Clementine Domine
Yea Chan Park
Charles London
Wonyl Choi
Niclas Goring
Seungjai Lee
AI4CE
567
2
0
28 Feb 2025
Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture
Yikun Hou
Suvrit Sra
A. Yurtsever
356
0
0
27 Jan 2025
Weight decay induces low-rank attention layers
Neural Information Processing Systems (NeurIPS), 2024
Seijin Kobayashi
Yassir Akram
J. Oswald
281
29
0
31 Oct 2024
The Persistence of Neural Collapse Despite Low-Rank Bias
Connall Garrod
Jonathan P. Keating
316
10
0
30 Oct 2024
Bilinear Sequence Regression: A Model for Learning from Long Sequences of High-dimensional Tokens
Physical Review X (PRX), 2024
Vittorio Erba
Emanuele Troiani
Luca Biggio
Antoine Maillard
Lenka Zdeborová
489
2
0
24 Oct 2024
Swing-by Dynamics in Concept Learning and Compositional Generalization
International Conference on Learning Representations (ICLR), 2024
Yongyi Yang
Core Francisco Park
Ekdeep Singh Lubana
Maya Okawa
Wei Hu
Hidenori Tanaka
CoGe
DiffM
358
0
0
10 Oct 2024
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
International Conference on Learning Representations (ICLR), 2024
George Wang
Jesse Hoogland
Stan van Wingerden
Zach Furman
Daniel Murfet
OffRL
236
24
0
03 Oct 2024
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
International Conference on Learning Representations (ICLR), 2024
Clémentine Dominé
Nicolas Anguita
A. Proca
Lukas Braun
D. Kunin
P. Mediano
Andrew M. Saxe
403
31
0
22 Sep 2024
Improving Adaptivity via Over-Parameterization in Sequence Models
Neural Information Processing Systems (NeurIPS), 2024
Yicheng Li
Qian Lin
289
1
0
02 Sep 2024
Lecture Notes on Linear Neural Networks: A Tale of Optimization and Generalization in Deep Learning
Nadav Cohen
Noam Razin
283
2
0
25 Aug 2024
Approaching Deep Learning through the Spectral Dynamics of Weights
David Yunis
Kumar Kshitij Patel
Samuel Wheeler
Pedro H. P. Savarese
Gal Vardi
Karen Livescu
Michael Maire
Matthew R. Walter
335
15
0
21 Aug 2024
A Generalization Bound for Nearly-Linear Networks
Eugene Golikov
265
0
0
09 Jul 2024
How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning
Arthur Jacot
Seok Hoan Choi
Yuxiao Wen
AI4CE
346
6
0
08 Jul 2024
How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks
Etai Littwin
Omid Saremi
Madhu Advani
Vimal Thilak
Preetum Nakkiran
Chen Huang
Joshua Susskind
314
16
0
03 Jul 2024
Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning
Neural Information Processing Systems (NeurIPS), 2024
D. Kunin
Allan Raventós
Clémentine Dominé
Feng Chen
David Klindt
Andrew M. Saxe
Surya Ganguli
MLT
354
29
0
10 Jun 2024
Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation
Can Yaras
Peng Wang
Laura Balzano
Qing Qu
AI4CE
301
26
0
06 Jun 2024
Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes
Zhenfeng Tu
Santiago Aranguri
Arthur Jacot
235
16
0
27 May 2024
Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets
Arthur Jacot
Alexandre Kaiser
309
1
0
27 May 2024
Disentangle Sample Size and Initialization Effect on Perfect Generalization for Single-Neuron Target
Jiajie Zhao
Zhiwei Bai
Yaoyu Zhang
242
1
0
22 May 2024
Deep linear networks for regression are implicitly regularized towards flat minima
Pierre Marion
Lénaic Chizat
ODL
323
16
0
22 May 2024
Connectivity Shapes Implicit Regularization in Matrix Factorization Models for Matrix Completion
Zhiwei Bai
Jiajie Zhao
Yaoyu Zhang
AI4CE
375
2
0
22 May 2024
Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
Heejune Sheen
Siyu Chen
Tianhao Wang
Harrison H. Zhou
MLT
252
15
0
13 Mar 2024
Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations
Akshay Kumar
Jarvis Haupt
ODL
377
6
0
12 Mar 2024
Transformers Learn Low Sensitivity Functions: Investigations and Implications
International Conference on Learning Representations (ICLR), 2024
Bhavya Vasudeva
Deqing Fu
Tianyi Zhou
Elliott Kau
Youqi Huang
Willie Neiswanger
478
2
0
11 Mar 2024
The Expected Loss of Preconditioned Langevin Dynamics Reveals the Hessian Rank
Amitay Bar
Rotem Mulayoff
T. Michaeli
Ronen Talmon
200
1
0
21 Feb 2024
Average gradient outer product as a mechanism for deep neural collapse
Daniel Beaglehole
Peter Súkeník
Marco Mondelli
Misha Belkin
AI4CE
414
19
0
21 Feb 2024
Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning
International Conference on Machine Learning (ICML), 2024
Yuxiao Wen
Arthur Jacot
402
9
0
12 Feb 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
454
30
0
08 Feb 2024
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
International Conference on Learning Representations (ICLR), 2023
Kaifeng Lyu
Jikai Jin
Zhiyuan Li
Simon S. Du
Jason D. Lee
Wei Hu
AI4CE
359
59
0
30 Nov 2023
Applying statistical learning theory to deep learning
Journal of Statistical Mechanics: Theory and Experiment (J. Stat. Mech.), 2023
Cédric Gerbelot
Avetik G. Karagulyan
Stefani Karp
Kavya Ravichandran
Menachem Stern
Nathan Srebro
FedML
266
4
0
26 Nov 2023
Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics
Soo Min Kwon
Zekai Zhang
Dogyoon Song
Laura Balzano
Qing Qu
317
4
0
08 Nov 2023
A Quadratic Synchronization Rule for Distributed Deep Learning
International Conference on Learning Representations (ICLR), 2023
Xinran Gu
Kaifeng Lyu
Sanjeev Arora
Jingzhao Zhang
Longbo Huang
305
4
0
22 Oct 2023
Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition
Zhongtian Chen
Edmund Lau
Jake Mendel
Susan Wei
Daniel Murfet
154
22
0
10 Oct 2023
How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization
International Conference on Learning Representations (ICLR), 2023
Nuoya Xiong
Lijun Ding
Simon S. Du
494
21
0
03 Oct 2023
Implicit regularization of deep residual networks towards neural ODEs
International Conference on Learning Representations (ICLR), 2023
Pierre Marion
Yu-Han Wu
Michael E. Sander
Gérard Biau
470
21
0
03 Sep 2023
Six Lectures on Linearized Neural Networks
Journal of Statistical Mechanics: Theory and Experiment (J. Stat. Mech.), 2023
Theodor Misiakiewicz
Andrea Montanari
368
18
0
25 Aug 2023