From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
International Conference on Learning Representations (ICLR), 2024
arXiv 2409.14623 · 22 September 2024
Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe

Papers citing "From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks"

50 / 66 papers shown

Diagonalizing the Softmax: Hadamard Initialization for Tractable Cross-Entropy Dynamics
Connall Garrod, Jonathan P. Keating, Christos Thrampoulidis
88 · 0 · 0 · 03 Dec 2025

Data Curation Through the Lens of Spectral Dynamics: Static Limits, Dynamic Acceleration, and Practical Oracles
Yizhou Zhang, Lun Du
112 · 0 · 0 · 02 Dec 2025

A Generalized Spectral Framework to Explain Neural Scaling and Compression Dynamics
Yizhou Zhang
116 · 3 · 0 · 11 Nov 2025

You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations
Amit Levi, Raz Lapid, Rom Himelstein, Yaniv Nemcovsky, Ravid Shwartz Ziv, A. Mendelson
Tags: MQ
117 · 1 · 0 · 09 Nov 2025

Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking
Yuandong Tian
188 · 0 · 0 · 25 Sep 2025

Intrinsic training dynamics of deep neural networks
Sibylle Marcotte, Gabriel Peyré, Rémi Gribonval
Tags: AI4CE
96 · 1 · 0 · 10 Aug 2025

Feature learning is decoupled from generalization in high capacity neural networks
Niclas Goring, Charles London, Abdurrahman Hadi Erturk, Chris Mingard, Yoonsoo Nam, Ard A. Louis
Tags: OOD, MLT
252 · 1 · 0 · 25 Jul 2025

Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
D. Kunin, Giovanni Luca Marchetti, F. Chen, Dhruva Karkada, James B. Simon, M. DeWeese, Surya Ganguli, Nina Miolane
401 · 4 · 0 · 06 Jun 2025

Sign-In to the Lottery: Reparameterizing Sparse Training From Scratch
Advait Gadhikar, Tom Jacobs, Chao Zhou, R. Burkholz
356 · 1 · 0 · 17 Apr 2025

On the Cone Effect in the Learning Dynamics
Zhanpeng Zhou, Yongyi Yang, Jie Ren, Mahito Sugiyama, Junchi Yan
388 · 1 · 0 · 20 Mar 2025

A Theory of Initialisation's Impact on Specialisation
International Conference on Learning Representations (ICLR), 2025
Devon Jarvis, Sebastian Lee, Clémentine Dominé, Andrew M. Saxe, Stefano Sarao Mannelli
Tags: CLL
281 · 2 · 0 · 04 Mar 2025

Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
Yoonsoo Nam, Seok Hyeong Lee, Clémentine Dominé, Yea Chan Park, Charles London, Wonyl Choi, Niclas Goring, Seungjai Lee
Tags: AI4CE
534 · 1 · 0 · 28 Feb 2025

Bridging Critical Gaps in Convergent Learning: How Representational Alignment Evolves Across Layers, Training, and Distribution Shifts
Chaitanya Kapoor, Sudhanshu Srivastava, Meenakshi Khosla
366 · 1 · 0 · 26 Feb 2025

Deep Linear Network Training Dynamics from Random Initialization: Data, Width, Depth, and Hyperparameter Transfer
Blake Bordelon, Cengiz Pehlevan
Tags: AI4CE
671 · 5 · 0 · 04 Feb 2025

Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning
Neural Information Processing Systems (NeurIPS), 2024
D. Kunin, Allan Raventós, Clémentine Dominé, Feng Chen, David Klindt, Andrew M. Saxe, Surya Ganguli
Tags: MLT
323 · 25 · 0 · 10 Jun 2024

Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes
Zhenfeng Tu, Santiago Aranguri, Arthur Jacot
207 · 14 · 0 · 27 May 2024

Asymptotics of feature learning in two-layer networks after one gradient-step
Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro
Tags: MLT
286 · 23 · 0 · 07 Feb 2024

How connectivity structure shapes rich and lazy learning in neural circuits
International Conference on Learning Representations (ICLR), 2023
Yuhan Helena Liu, A. Baratin, Jonathan H. Cornford, Stefan Mihalas, E. Shea-Brown, Guillaume Lajoie
392 · 22 · 0 · 12 Oct 2023

Neural Feature Learning in Function Space
Journal of Machine Learning Research (JMLR), 2023
Xiangxiang Xu, Lizhong Zheng
270 · 16 · 0 · 18 Sep 2023

Abide by the Law and Follow the Flow: Conservation Laws for Gradient Flows
Neural Information Processing Systems (NeurIPS), 2023
Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
317 · 27 · 0 · 30 Jun 2023

Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
International Conference on Machine Learning (ICML), 2023
Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, M. Belkin
412 · 24 · 0 · 07 Jun 2023

The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks
International Conference on Learning Representations (ICLR), 2022
D. Kunin, Atsushi Yamamura, Chao Ma, Surya Ganguli
157 · 22 · 0 · 07 Oct 2022

Relative representations enable zero-shot latent space communication
International Conference on Learning Representations (ICLR), 2022
Luca Moschella, Valentino Maiorca, Marco Fumero, Antonio Norelli, Francesco Locatello, Emanuele Rodolà
314 · 152 · 0 · 30 Sep 2022

Maslow's Hammer for Catastrophic Forgetting: Node Re-Use vs Node Activation
Sebastian Lee, Stefano Sarao Mannelli, Claudia Clopath, Sebastian Goldt, Andrew M. Saxe
Tags: CLL
316 · 14 · 0 · 18 May 2022

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Neural Information Processing Systems (NeurIPS), 2022
Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang
Tags: MLT
236 · 116 · 0 · 03 May 2022

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao
333 · 220 · 0 · 07 Mar 2022

Exact Solutions of a Deep Linear Network
Neural Information Processing Systems (NeurIPS), 2022
Liu Ziyin, Botao Li, Xiangmin Meng
Tags: ODL
555 · 23 · 0 · 10 Feb 2022

Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity
Arthur Jacot, François Ged, Berfin Şimşek, Clément Hongler, Franck Gabriel
316 · 65 · 0 · 30 Jun 2021

LoRA: Low-Rank Adaptation of Large Language Models
International Conference on Learning Representations (ICLR), 2021
J. E. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
Tags: OffRL, AI4TS, AI4CE, ALM, AIMat
1.5K · 15,183 · 0 · 17 Jun 2021

Probing transfer learning with a model of synthetic correlated datasets
Federica Gerace, Luca Saglietti, Stefano Sarao Mannelli, Andrew M. Saxe, Lenka Zdeborová
Tags: OOD
173 · 37 · 0 · 09 Jun 2021

On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent
International Conference on Machine Learning (ICML), 2021
Shahar Azulay, E. Moroshko, Mor Shpigel Nacson, Blake E. Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry
Tags: AI4CE
251 · 78 · 0 · 19 Feb 2021

Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning
International Conference on Learning Representations (ICLR), 2020
Zhiyuan Li, Yuping Luo, Kaifeng Lyu
213 · 143 · 0 · 17 Dec 2020

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
D. Kunin, Javier Sagastuy-Breña, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka
341 · 89 · 0 · 08 Dec 2020

Feature Learning in Infinite-Width Neural Networks
Greg Yang, J. E. Hu
Tags: MLT
384 · 180 · 0 · 30 Nov 2020

Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
Neural Information Processing Systems (NeurIPS), 2020
Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli
293 · 219 · 0 · 28 Oct 2020

Phase diagram for two-layer ReLU neural networks at infinite-width limit
Journal of Machine Learning Research (JMLR), 2020
Tao Luo, Zhi-Qin John Xu, Zheng Ma, Yaoyu Zhang
202 · 71 · 0 · 15 Jul 2020

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
Tags: ODL
487 · 260 · 0 · 04 Mar 2020

Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss
Annual Conference on Computational Learning Theory (COLT), 2020
Lénaïc Chizat, Francis R. Bach
Tags: MLT
594 · 364 · 0 · 11 Feb 2020

The Implicit Bias of Depth: How Incremental Learning Drives Generalization
International Conference on Learning Representations (ICLR), 2019
Daniel Gissin, Shai Shalev-Shwartz, Amit Daniely
Tags: AI4CE
250 · 85 · 0 · 26 Sep 2019

Kernel and Rich Regimes in Overparametrized Models
Annual Conference on Computational Learning Theory (COLT), 2019
Blake E. Woodworth, Suriya Gunasekar, Pedro H. P. Savarese, E. Moroshko, Itay Golan, Jason D. Lee, Daniel Soudry, Nathan Srebro
335 · 391 · 0 · 13 Jun 2019

Implicit Regularization in Deep Matrix Factorization
Neural Information Processing Systems (NeurIPS), 2019
Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo
Tags: AI4CE
356 · 559 · 0 · 31 May 2019

Similarity of Neural Network Representations Revisited
International Conference on Machine Learning (ICML), 2019
Simon Kornblith, Mohammad Norouzi, Honglak Lee, Geoffrey E. Hinton
1.2K · 1,738 · 0 · 01 May 2019

Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks
Neural Information Processing Systems (NeurIPS), 2019
Gauthier Gidel, Francis R. Bach, Damien Scieur
Tags: AI4CE
185 · 168 · 0 · 30 Apr 2019

On Exact Computation with an Infinitely Wide Neural Net
Sanjeev Arora, S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang
605 · 989 · 0 · 26 Apr 2019

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
Jaehoon Lee, Lechao Xiao, S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Narain Sohl-Dickstein, Jeffrey Pennington
582 · 1,210 · 0 · 18 Feb 2019

Width Provably Matters in Optimization for Deep Linear Neural Networks
S. Du, Wei Hu
358 · 101 · 0 · 24 Jan 2019

On Lazy Training in Differentiable Programming
Lénaïc Chizat, Edouard Oyallon, Francis R. Bach
518 · 905 · 0 · 19 Dec 2018

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
Zeyuan Allen-Zhu, Yuanzhi Li, Yingyu Liang
Tags: MLT
759 · 813 · 0 · 12 Nov 2018

A Convergence Theory for Deep Learning via Over-Parameterization
International Conference on Machine Learning (ICML), 2018
Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song
Tags: AI4CE, ODL
1.4K · 1,551 · 0 · 09 Nov 2018

Gradient Descent Finds Global Minima of Deep Neural Networks
International Conference on Machine Learning (ICML), 2018
S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, Masayoshi Tomizuka
Tags: ODL
939 · 1,189 · 0 · 09 Nov 2018