Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity
arXiv:2106.15933, 30 June 2021
Arthur Jacot, François Ged, Berfin Şimşek, Clément Hongler, Franck Gabriel

Papers citing "Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity"

48 citing papers shown.

Diagonalizing the Softmax: Hadamard Initialization for Tractable Cross-Entropy Dynamics
Connall Garrod, Jonathan P. Keating, Christos Thrampoulidis
03 Dec 2025

Neural Collapse under Gradient Flow on Shallow ReLU Networks for Orthogonally Separable Data
Hancheng Min, Zhihui Zhu, Rene Vidal
24 Oct 2025

Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
D. Kunin, Giovanni Luca Marchetti, F. Chen, Dhruva Karkada, James B. Simon, M. DeWeese, Surya Ganguli, Nina Miolane
06 Jun 2025

The emergence of sparse attention: impact of data distribution and benefits of repetition
Nicolas Zucchet, Francesco d'Angelo, Andrew Kyle Lampinen, Stephanie C. Y. Chan
23 May 2025

On the Cone Effect in the Learning Dynamics
Zhanpeng Zhou, Yongyi Yang, Jie Ren, Mahito Sugiyama, Junchi Yan
20 Mar 2025

Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
Yoonsoo Nam, Seok Hyeong Lee, Clementine Domine, Yea Chan Park, Charles London, Wonyl Choi, Niclas Goring, Seungjai Lee
28 Feb 2025

Low-rank bias, weight decay, and model merging in neural networks
Ilja Kuzborskij, Yasin Abbasi-Yadkori
24 Feb 2025

The Persistence of Neural Collapse Despite Low-Rank Bias
Connall Garrod, Jonathan P. Keating
30 Oct 2024

The Optimization Landscape of SGD Across the Feature Learning Strength
International Conference on Learning Representations (ICLR), 2024
Alexander B. Atanasov, Alexandru Meterez, James B. Simon, Cengiz Pehlevan
06 Oct 2024

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
International Conference on Learning Representations (ICLR), 2024
Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe
22 Sep 2024

On the Minimal Degree Bias in Generalization on the Unseen for non-Boolean Functions
Denys Pushkin, Raphael Berthier, Emmanuel Abbe
10 Jun 2024

Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning
Neural Information Processing Systems (NeurIPS), 2024
D. Kunin, Allan Raventós, Clémentine Dominé, Feng Chen, David Klindt, Andrew M. Saxe, Surya Ganguli
10 Jun 2024

Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs
Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan
04 Jun 2024

Synchronization on circles and spheres with nonlinear interactions
Christopher Criscitiello, Quentin Rebjock, Andrew D. McRae, Nicolas Boumal
28 May 2024

Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes
Zhenfeng Tu, Santiago Aranguri, Arthur Jacot
27 May 2024

Can Implicit Bias Imply Adversarial Robustness?
Hancheng Min, Rene Vidal
24 May 2024

Deep linear networks for regression are implicitly regularized towards flat minima
Pierre Marion, Lénaic Chizat
22 May 2024

Connectivity Shapes Implicit Regularization in Matrix Factorization Models for Matrix Completion
Zhiwei Bai, Jiajie Zhao, Yaoyu Zhang
22 May 2024

Compressed Meta-Optical Encoder for Image Classification
A. Wirth-Singh, Jinlin Xiang, Minho Choi, Johannes E. Froch, Luocheng Huang, S. Colburn, Eli Shlizerman, Arka Majumdar
23 Apr 2024

Sliding down the stairs: how correlated latent variables accelerate learning with neural networks
Lorenzo Bardone, Sebastian Goldt
12 Apr 2024

Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations
Akshay Kumar, Jarvis Haupt
12 Mar 2024

Learning Associative Memories with Gradient Descent
Vivien A. Cabannes, Berfin Simsek, A. Bietti
28 Feb 2024

Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth
Kevin Kögler, Aleksandr Shevchenko, Hamed Hassani, Marco Mondelli
07 Feb 2024

Estimating the Local Learning Coefficient at Scale
Zach Furman, Edmund Lau
06 Feb 2024

Understanding Unimodal Bias in Multimodal Deep Linear Networks
International Conference on Machine Learning (ICML), 2023
Yedi Zhang, Peter E. Latham, Andrew Saxe
01 Dec 2023

Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics
Soo Min Kwon, Zekai Zhang, Dogyoon Song, Laura Balzano, Qing Qu
08 Nov 2023

Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition
Zhongtian Chen, Edmund Lau, Jake Mendel, Susan Wei, Daniel Murfet
10 Oct 2023

SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem
International Conference on Learning Representations (ICLR), 2023
Margalit Glasgow
26 Sep 2023

On the different regimes of Stochastic Gradient Descent
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2023
Antonio Sclocchi, Matthieu Wyart
19 Sep 2023

Neural Hilbert Ladders: Multi-Layer Neural Networks in Function Space
Journal of Machine Learning Research (JMLR), 2023
Zhengdao Chen
03 Jul 2023

InRank: Incremental Low-Rank Learning
Jiawei Zhao, Yifei Zhang, Beidi Chen, F. Schafer, Anima Anandkumar
20 Jun 2023

Transformers learn through gradual rank increase
Neural Information Processing Systems (NeurIPS), 2023
Enric Boix-Adserà, Etai Littwin, Emmanuel Abbe, Samy Bengio, J. Susskind
12 Jun 2023

Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs
Neural Information Processing Systems (NeurIPS), 2023
D. Chistikov, Matthias Englert, R. Lazic
10 Jun 2023

How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan
29 May 2023

Implicit bias of SGD in $L_{2}$-regularized linear DNNs: One-way jumps from high to low rank
International Conference on Learning Representations (ICLR), 2023
Zihan Wang, Arthur Jacot
25 May 2023

The star-shaped space of solutions of the spherical negative perceptron
B. Annesi, Clarissa Lauditi, Carlo Lucibello, Enrico M. Malatesta, Gabriele Perugini, Fabrizio Pittorino, Luca Saglietti
18 May 2023

Saddle-to-Saddle Dynamics in Diagonal Linear Networks
Neural Information Processing Systems (NeurIPS), 2023
Scott Pesme, Nicolas Flammarion
02 Apr 2023

On the Stepwise Nature of Self-Supervised Learning
International Conference on Machine Learning (ICML), 2023
James B. Simon, Maksis Knutins, Liu Ziyin, Daniel Geisz, Abraham J. Fetterman, Joshua Albrecht
27 Mar 2023

Type-II Saddles and Probabilistic Stability of Stochastic Gradient Descent
Liu Ziyin, Botao Li, Tomer Galanti, Masakuni Ueda
23 Mar 2023

Generalization on the Unseen, Logic Reasoning and Degree Curriculum
International Conference on Machine Learning (ICML), 2023
Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Kevin Rizk
30 Jan 2023

A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization
Jian-Peng Cao, Chao Qian, Yihui Huang, Dicheng Chen, Yuncheng Gao, Jiyang Dong, D. Guo, X. Qu
29 Dec 2022

Infinite-width limit of deep linear neural networks
Communications on Pure and Applied Mathematics (CPAM), 2022
Lénaïc Chizat, Maria Colombo, Xavier Fernández-Real, Alessio Figalli
29 Nov 2022

SGD with Large Step Sizes Learns Sparse Features
International Conference on Machine Learning (ICML), 2022
Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion
11 Oct 2022

Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions
International Conference on Learning Representations (ICLR), 2022
Arthur Jacot
29 Sep 2022

Incremental Learning in Diagonal Linear Networks
Journal of Machine Learning Research (JMLR), 2022
Raphael Berthier
31 Aug 2022

Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
Neural Information Processing Systems (NeurIPS), 2022
Etienne Boursier, Loucas Pillaud-Vivien, Nicolas Flammarion
02 Jun 2022

Feature Learning in $L_{2}$-regularized DNNs: Attraction/Repulsion and Sparsity
Neural Information Processing Systems (NeurIPS), 2022
Arthur Jacot, Eugene Golikov, Clément Hongler, Franck Gabriel
31 May 2022

ZerO Initialization: Initializing Neural Networks with only Zeros and Ones
Jiawei Zhao, Florian Schäfer, Anima Anandkumar
25 Oct 2021