From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
International Conference on Learning Representations (ICLR), 2024
arXiv 2409.14623 · 22 September 2024
Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe

Papers citing "From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks"

50 / 66 papers shown

Diagonalizing the Softmax: Hadamard Initialization for Tractable Cross-Entropy Dynamics
Connall Garrod, Jonathan P. Keating, Christos Thrampoulidis
88 · 0 · 0 · 03 Dec 2025

Data Curation Through the Lens of Spectral Dynamics: Static Limits, Dynamic Acceleration, and Practical Oracles
Yizhou Zhang, Lun Du
112 · 0 · 0 · 02 Dec 2025

A Generalized Spectral Framework to Explain Neural Scaling and Compression Dynamics
Yizhou Zhang
116 · 3 · 0 · 11 Nov 2025

You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations
Amit Levi, Raz Lapid, Rom Himelstein, Yaniv Nemcovsky, Ravid Shwartz Ziv, A. Mendelson
Tags: MQ
117 · 1 · 0 · 09 Nov 2025

Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking
Yuandong Tian
188 · 0 · 0 · 25 Sep 2025

Intrinsic training dynamics of deep neural networks
Sibylle Marcotte, Gabriel Peyré, Rémi Gribonval
Tags: AI4CE
96 · 1 · 0 · 10 Aug 2025

Feature learning is decoupled from generalization in high capacity neural networks
Niclas Goring, Charles London, Abdurrahman Hadi Erturk, Chris Mingard, Yoonsoo Nam, Ard A. Louis
Tags: OOD, MLT
252 · 1 · 0 · 25 Jul 2025

Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
D. Kunin, Giovanni Luca Marchetti, F. Chen, Dhruva Karkada, James B. Simon, M. DeWeese, Surya Ganguli, Nina Miolane
401 · 4 · 0 · 06 Jun 2025

Sign-In to the Lottery: Reparameterizing Sparse Training From Scratch
Advait Gadhikar, Tom Jacobs, Chao Zhou, R. Burkholz
356 · 1 · 0 · 17 Apr 2025

On the Cone Effect in the Learning Dynamics
Zhanpeng Zhou, Yongyi Yang, Jie Ren, Mahito Sugiyama, Junchi Yan
388 · 1 · 0 · 20 Mar 2025

A Theory of Initialisation's Impact on Specialisation
International Conference on Learning Representations (ICLR), 2025
Devon Jarvis, Sebastian Lee, Clémentine Dominé, Andrew M. Saxe, Stefano Sarao Mannelli
Tags: CLL
281 · 2 · 0 · 04 Mar 2025

Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
Yoonsoo Nam, Seok Hyeong Lee, Clémentine Dominé, Yea Chan Park, Charles London, Wonyl Choi, Niclas Goring, Seungjai Lee
Tags: AI4CE
534 · 1 · 0 · 28 Feb 2025

Bridging Critical Gaps in Convergent Learning: How Representational Alignment Evolves Across Layers, Training, and Distribution Shifts
Chaitanya Kapoor, Sudhanshu Srivastava, Meenakshi Khosla
366 · 1 · 0 · 26 Feb 2025

Deep Linear Network Training Dynamics from Random Initialization: Data, Width, Depth, and Hyperparameter Transfer
Blake Bordelon, Cengiz Pehlevan
Tags: AI4CE
671 · 5 · 0 · 04 Feb 2025

Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning
Neural Information Processing Systems (NeurIPS), 2024
D. Kunin, Allan Raventós, Clémentine Dominé, Feng Chen, David Klindt, Andrew M. Saxe, Surya Ganguli
Tags: MLT
323 · 25 · 0 · 10 Jun 2024

Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes
Zhenfeng Tu, Santiago Aranguri, Arthur Jacot
207 · 14 · 0 · 27 May 2024

Asymptotics of feature learning in two-layer networks after one gradient-step
Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro
Tags: MLT
286 · 23 · 0 · 07 Feb 2024

How connectivity structure shapes rich and lazy learning in neural circuits
International Conference on Learning Representations (ICLR), 2023
Yuhan Helena Liu, A. Baratin, Jonathan H. Cornford, Stefan Mihalas, E. Shea-Brown, Guillaume Lajoie
392 · 22 · 0 · 12 Oct 2023

Neural Feature Learning in Function Space
Journal of Machine Learning Research (JMLR), 2023
Xiangxiang Xu, Lizhong Zheng
270 · 16 · 0 · 18 Sep 2023

Abide by the Law and Follow the Flow: Conservation Laws for Gradient Flows
Neural Information Processing Systems (NeurIPS), 2023
Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
317 · 27 · 0 · 30 Jun 2023

Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
International Conference on Machine Learning (ICML), 2023
Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, M. Belkin
412 · 24 · 0 · 07 Jun 2023

The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks
International Conference on Learning Representations (ICLR), 2022
D. Kunin, Atsushi Yamamura, Chao Ma, Surya Ganguli
157 · 22 · 0 · 07 Oct 2022

Relative representations enable zero-shot latent space communication
International Conference on Learning Representations (ICLR), 2022
Luca Moschella, Valentino Maiorca, Marco Fumero, Antonio Norelli, Francesco Locatello, Emanuele Rodolà
314 · 152 · 0 · 30 Sep 2022

Maslow's Hammer for Catastrophic Forgetting: Node Re-Use vs Node Activation
Sebastian Lee, Stefano Sarao Mannelli, Claudia Clopath, Sebastian Goldt, Andrew M. Saxe
Tags: CLL
316 · 14 · 0 · 18 May 2022

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Neural Information Processing Systems (NeurIPS), 2022
Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang
Tags: MLT
236 · 116 · 0 · 03 May 2022

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao
333 · 220 · 0 · 07 Mar 2022

Exact Solutions of a Deep Linear Network
Neural Information Processing Systems (NeurIPS), 2022
Liu Ziyin, Botao Li, Xiangmin Meng
Tags: ODL
555 · 23 · 0 · 10 Feb 2022

Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity
Arthur Jacot, François Ged, Berfin Şimşek, Clément Hongler, Franck Gabriel
316 · 65 · 0 · 30 Jun 2021

LoRA: Low-Rank Adaptation of Large Language Models
International Conference on Learning Representations (ICLR), 2021
J. E. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
Tags: OffRL, AI4TS, AI4CE, ALM, AIMat
1.5K · 15,183 · 0 · 17 Jun 2021

Probing transfer learning with a model of synthetic correlated datasets
Federica Gerace, Luca Saglietti, Stefano Sarao Mannelli, Andrew M. Saxe, Lenka Zdeborová
Tags: OOD
173 · 37 · 0 · 09 Jun 2021

On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent
International Conference on Machine Learning (ICML), 2021
Shahar Azulay, E. Moroshko, Mor Shpigel Nacson, Blake E. Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry
Tags: AI4CE
251 · 78 · 0 · 19 Feb 2021

Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning
International Conference on Learning Representations (ICLR), 2020
Zhiyuan Li, Yuping Luo, Kaifeng Lyu
213 · 143 · 0 · 17 Dec 2020

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
D. Kunin, Javier Sagastuy-Breña, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka
341 · 89 · 0 · 08 Dec 2020

Feature Learning in Infinite-Width Neural Networks
Greg Yang, J. E. Hu
Tags: MLT
384 · 180 · 0 · 30 Nov 2020

Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
Neural Information Processing Systems (NeurIPS), 2020
Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli
293 · 219 · 0 · 28 Oct 2020

Phase diagram for two-layer ReLU neural networks at infinite-width limit
Journal of Machine Learning Research (JMLR), 2020
Tao Luo, Zhi-Qin John Xu, Zheng Ma, Yaoyu Zhang
202 · 71 · 0 · 15 Jul 2020

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
Tags: ODL
487 · 260 · 0 · 04 Mar 2020

Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss
Annual Conference on Computational Learning Theory (COLT), 2020
Lénaïc Chizat, Francis R. Bach
Tags: MLT
594 · 364 · 0 · 11 Feb 2020

The Implicit Bias of Depth: How Incremental Learning Drives Generalization
International Conference on Learning Representations (ICLR), 2019
Daniel Gissin, Shai Shalev-Shwartz, Amit Daniely
Tags: AI4CE
250 · 85 · 0 · 26 Sep 2019

Kernel and Rich Regimes in Overparametrized Models
Annual Conference on Computational Learning Theory (COLT), 2019
Blake E. Woodworth, Suriya Gunasekar, Pedro H. P. Savarese, E. Moroshko, Itay Golan, Jason D. Lee, Daniel Soudry, Nathan Srebro
335 · 391 · 0 · 13 Jun 2019

Implicit Regularization in Deep Matrix Factorization
Neural Information Processing Systems (NeurIPS), 2019
Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo
Tags: AI4CE
356 · 559 · 0 · 31 May 2019

Similarity of Neural Network Representations Revisited
International Conference on Machine Learning (ICML), 2019
Simon Kornblith, Mohammad Norouzi, Honglak Lee, Geoffrey E. Hinton
1.2K · 1,738 · 0 · 01 May 2019

Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks
Neural Information Processing Systems (NeurIPS), 2019
Gauthier Gidel, Francis R. Bach, Damien Scieur
Tags: AI4CE
185 · 168 · 0 · 30 Apr 2019

On Exact Computation with an Infinitely Wide Neural Net
Sanjeev Arora, S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang
605 · 989 · 0 · 26 Apr 2019

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
Jaehoon Lee, Lechao Xiao, S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Narain Sohl-Dickstein, Jeffrey Pennington
582 · 1,210 · 0 · 18 Feb 2019

Width Provably Matters in Optimization for Deep Linear Neural Networks
S. Du, Wei Hu
358 · 101 · 0 · 24 Jan 2019

On Lazy Training in Differentiable Programming
Lénaïc Chizat, Edouard Oyallon, Francis R. Bach
518 · 905 · 0 · 19 Dec 2018

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
Zeyuan Allen-Zhu, Yuanzhi Li, Yingyu Liang
Tags: MLT
759 · 813 · 0 · 12 Nov 2018

A Convergence Theory for Deep Learning via Over-Parameterization
International Conference on Machine Learning (ICML), 2018
Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song
Tags: AI4CE, ODL
1.4K · 1,551 · 0 · 09 Nov 2018

Gradient Descent Finds Global Minima of Deep Neural Networks
International Conference on Machine Learning (ICML), 2018
S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, Masayoshi Tomizuka
Tags: ODL
939 · 1,189 · 0 · 09 Nov 2018