Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning

International Conference on Learning Representations (ICLR), 2020

17 December 2020

Papers citing "Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning"

50 / 112 papers shown

Diagonalizing the Softmax: Hadamard Initialization for Tractable Cross-Entropy Dynamics

Connall Garrod

Jonathan P. Keating

Christos Thrampoulidis

205

03 Dec 2025

The Markovian Thinker: Architecture-Agnostic Linear Scaling of Reasoning

Milad Aghajohari

Kamran Chitsaz

Amirhossein Kazemnejad

305

08 Oct 2025

On the Benefits of Weight Normalization for Overparameterized Matrix Sensing

151

01 Oct 2025

Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region

Shuang Liang

Guido Montúfar

288

29 Sep 2025

Diagonal Linear Networks and the Lasso Regularization Path

Raphaël Berthier

173

23 Sep 2025

Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

732

12 Jun 2025

Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks

D. Kunin

Giovanni Luca Marchetti

537

06 Jun 2025

Heavy-Ball Momentum Method in Continuous Time and Discretization Error Analysis

296

03 Jun 2025

The Rich and the Simple: On the Implicit Bias of Adam and SGD

323

29 May 2025

LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning

372

27 May 2025

Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape

391

27 May 2025

Mirror, Mirror of the Flow: How Does Regularization Shape Implicit Bias?

406

17 Apr 2025

Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks

453

11 Apr 2025

An Overview of Low-Rank Structures in the Training and Adaptation of Large Models

309

25 Mar 2025

Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)

602

28 Feb 2025

Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture

Yikun Hou

Suvrit Sra

A. Yurtsever

378

27 Jan 2025

Weight decay induces low-rank attention layers

Neural Information Processing Systems (NeurIPS), 2024

Seijin Kobayashi

Yassir Akram

J. Oswald

303

31 Oct 2024

The Persistence of Neural Collapse Despite Low-Rank Bias

Connall Garrod

Jonathan P. Keating

346

30 Oct 2024

Bilinear Sequence Regression: A Model for Learning from Long Sequences of High-dimensional Tokens

Physical Review X (PRX), 2024

522

24 Oct 2024

Swing-by Dynamics in Concept Learning and Compositional Generalization

International Conference on Learning Representations (ICLR), 2024

413

10 Oct 2024

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient

International Conference on Learning Representations (ICLR), 2024

269

03 Oct 2024

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks

International Conference on Learning Representations (ICLR), 2024

484

22 Sep 2024

Improving Adaptivity via Over-Parameterization in Sequence Models

Neural Information Processing Systems (NeurIPS), 2024

Yicheng Li

Qian Lin

311

02 Sep 2024

Lecture Notes on Linear Neural Networks: A Tale of Optimization and Generalization in Deep Learning

Nadav Cohen

Noam Razin

295

25 Aug 2024

Approaching Deep Learning through the Spectral Dynamics of Weights

385

21 Aug 2024

A Generalization Bound for Nearly-Linear Networks

Eugene Golikov

309

09 Jul 2024

How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning

394

08 Jul 2024

How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks

Joshua Susskind

347

03 Jul 2024

Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning

Neural Information Processing Systems (NeurIPS), 2024

Feng Chen

375

10 Jun 2024

Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation

Can Yaras

Peng Wang

Laura Balzano

Qing Qu

AI4CE

332

06 Jun 2024

Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes

Zhenfeng Tu

Santiago Aranguri

Arthur Jacot

252

27 May 2024

Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets

Arthur Jacot

Alexandre Kaiser

SSL

350

27 May 2024

Disentangle Sample Size and Initialization Effect on Perfect Generalization for Single-Neuron Target

Jiajie Zhao

Zhiwei Bai

Yaoyu Zhang

292

22 May 2024

Deep linear networks for regression are implicitly regularized towards flat minima

Pierre Marion

Lénaic Chizat

ODL

343

22 May 2024

Connectivity Shapes Implicit Regularization in Matrix Factorization Models for Matrix Completion

435

22 May 2024

Implicit Regularization of Gradient Flow on One-Layer Softmax Attention

Heejune Sheen

Siyu Chen

Tianhao Wang

Harrison H. Zhou

MLT

272

13 Mar 2024

Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations

Akshay Kumar

Jarvis Haupt

ODL

424

12 Mar 2024

Transformers Learn Low Sensitivity Functions: Investigations and Implications

International Conference on Learning Representations (ICLR), 2024

509

11 Mar 2024

The Expected Loss of Preconditioned Langevin Dynamics Reveals the Hessian Rank

225

21 Feb 2024

Average gradient outer product as a mechanism for deep neural collapse

478

21 Feb 2024

Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning

International Conference on Machine Learning (ICML), 2024

Yuxiao Wen

Arthur Jacot

467

12 Feb 2024

Implicit Bias and Fast Convergence Rates for Self-attention

Bhavya Vasudeva

Puneesh Deora

Christos Thrampoulidis

523

08 Feb 2024

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking

International Conference on Learning Representations (ICLR), 2023

398

30 Nov 2023

Applying statistical learning theory to deep learning

Journal of Statistical Mechanics: Theory and Experiment (J. Stat. Mech.), 2023

295

26 Nov 2023

Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics

Soo Min Kwon

Zekai Zhang

Dogyoon Song

Laura Balzano

Qing Qu

339

08 Nov 2023

A Quadratic Synchronization Rule for Distributed Deep Learning

International Conference on Learning Representations (ICLR), 2023

346

22 Oct 2023

Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition

Jake Mendel

179

10 Oct 2023

How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization

International Conference on Learning Representations (ICLR), 2023

Nuoya Xiong

Lijun Ding

Simon S. Du

542

03 Oct 2023

Implicit regularization of deep residual networks towards neural ODEs

International Conference on Learning Representations (ICLR), 2023

499

03 Sep 2023

Six Lectures on Linearized Neural Networks

Journal of Statistical Mechanics: Theory and Experiment (J. Stat. Mech.), 2023

Theodor Misiakiewicz

Andrea Montanari

385

25 Aug 2023