1906.08632
Cited By
Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup
Neural Information Processing Systems (NeurIPS), 2019
18 June 2019
Sebastian Goldt
Madhu S. Advani
Andrew M. Saxe
Florent Krzakala
Lenka Zdeborová
MLT
Papers citing
"Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup"
50 / 108 papers shown
Title
Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data
Guillaume Braun
Bruno Loureiro
Ha Quang Minh
Masaaki Imaizumi
48
0
0
24 Nov 2025
High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
Aukosh Jagannath
Taj Jones-McCormick
Varnan Sarangian
60
0
0
06 Nov 2025
Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks
Parsa Rangriz
52
0
0
04 Nov 2025
A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond
Nikos Tsikouras
Yorgos Pantis
Ioannis Mitliagkas
Christos Tzamos
BDL
134
0
0
22 Oct 2025
High-Dimensional Learning Dynamics of Quantized Models with Straight-Through Estimator
Yuma Ichikawa
Shuhei Kashiwamura
Ayaka Sakata
MQ
164
0
0
12 Oct 2025
Curl Descent: Non-Gradient Learning Dynamics with Sign-Diverse Plasticity
Hugo Ninou
Jonathan Kadmon
N. Alex Cayco-Gajic
160
0
0
03 Oct 2025
Sobolev acceleration for neural networks
Jong Kwon Oh
Hanbaek Lyu
Hwijae Son
135
1
0
24 Sep 2025
Why is Your Language Model a Poor Implicit Reward Model?
Noam Razin
Yong Lin
Jiarui Yao
Sanjeev Arora
LRM
160
0
0
10 Jul 2025
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
D. Kunin
Giovanni Luca Marchetti
F. Chen
Dhruva Karkada
James B. Simon
M. DeWeese
Surya Ganguli
Nina Miolane
289
3
0
06 Jun 2025
Models of Heavy-Tailed Mechanistic Universality
Liam Hodgkinson
Zhichao Wang
Michael W. Mahoney
221
3
0
04 Jun 2025
Analytic theory of dropout regularization
Physical Review E (Phys. Rev. E), 2025
Francesco Mori
Francesca Mignacco
241
1
0
12 May 2025
Information-theoretic reduction of deep neural networks to linear models in the overparametrized proportional regime
Annual Conference Computational Learning Theory (COLT), 2025
Francesco Camilli
D. Tieplova
Eleonora Bergamin
Jean Barbier
877
3
0
06 May 2025
A Computational Model of Inclusive Pedagogy: From Understanding to Application
Francesco Balzan
Pedro P. Santos
Maurizio Gabbrielli
Mahault Albarracin
Manuel Lopes
252
0
0
02 May 2025
Learning richness modulates equality reasoning in neural networks
William L. Tong
Cengiz Pehlevan
259
0
0
12 Mar 2025
Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks
International Conference on Learning Representations (ICLR), 2025
Devon Jarvis
Richard Klein
Benjamin Rosman
Andrew M. Saxe
MLT
304
2
0
08 Mar 2025
A Theory of Initialisation's Impact on Specialisation
International Conference on Learning Representations (ICLR), 2025
Devon Jarvis
Sebastian Lee
Clémentine Dominé
Andrew M. Saxe
Stefano Sarao Mannelli
CLL
237
2
0
04 Mar 2025
Nonlinear dynamics of localization in neural receptive fields
Neural Information Processing Systems (NeurIPS), 2025
Leon Lufkin
Andrew M. Saxe
Erin Grant
218
2
0
28 Jan 2025
Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Berfin Simsek
Amire Bendjeddou
Daniel Hsu
337
4
0
13 Nov 2024
Self-supervised cross-modality learning for uncertainty-aware object detection and recognition in applications which lack pre-labelled training data
Irum Mehboob
Li Sun
Alireza Rastegarpanah
Rustam Stolkin
UQCV
185
0
0
05 Nov 2024
A theoretical perspective on mode collapse in variational inference
Roman Soletskyi
Marylou Gabrié
Bruno Loureiro
DRL
129
7
0
17 Oct 2024
Optimal Protocols for Continual Learning via Statistical Physics and Control Theory
International Conference on Learning Representations (ICLR), 2024
Francesco Mori
Stefano Sarao Mannelli
Francesca Mignacco
443
7
0
26 Sep 2024
Symmetry & Critical Points
Yossi Arjevani
177
3
0
26 Aug 2024
Dynamics of Meta-learning Representation in the Teacher-student Scenario
Physical Review E (Phys. Rev. E), 2024
Hui Wang
Cho Tung Yip
Bo Li
240
0
0
22 Aug 2024
Towards understanding epoch-wise double descent in two-layer linear neural networks
Amanda Olmin
Fredrik Lindsten
MLT
209
4
0
13 Jul 2024
Training Dynamics of Nonlinear Contrastive Learning Model in the High Dimensional Limit
Linghuan Meng
Chuang Wang
145
1
0
11 Jun 2024
From Spikes to Heavy Tails: Unveiling the Spectral Evolution of Neural Networks
Vignesh Kothapalli
Tianyu Pang
Shenyang Deng
Zongmin Liu
Yaoqing Yang
293
4
0
07 Jun 2024
Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs
Luca Arnaboldi
Yatin Dandi
Florent Krzakala
Bruno Loureiro
Luca Pesce
Ludovic Stephan
221
2
0
04 Jun 2024
The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms
Elizabeth Collins-Woodfin
Inbar Seroussi
Begona García Malaxechebarría
Andrew W. Mackenzie
Elliot Paquette
Courtney Paquette
131
2
0
30 May 2024
Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training
Anchit Jain
Rozhin Nobahari
A. Baratin
Stefano Sarao Mannelli
243
5
0
28 May 2024
Understanding the Learning Dynamics of Alignment with Human Feedback
Shawn Im
Yixuan Li
ALM
355
18
0
27 Mar 2024
Low-Rank Learning by Design: the Role of Network Architecture and Activation Linearity in Gradient Rank Collapse
Bradley T. Baker
Ba Pearlmutter
Robyn L. Miller
Vince D. Calhoun
Sergey Plis
AI4CE
187
3
0
09 Feb 2024
Enhancing Neural Training via a Correlated Dynamics Model
Jonathan Brokman
Roy Betser
Rotem Turjeman
Tom Berkov
I. Cohen
Guy Gilboa
143
5
0
20 Dec 2023
Weight fluctuations in (deep) linear neural networks and a derivation of the inverse-variance flatness relation
Physical Review Research (Phys. Rev. Res.), 2023
Markus Gross
A. Raulf
Christoph Räth
400
0
0
23 Nov 2023
Should Under-parameterized Student Networks Copy or Average Teacher Weights?
Neural Information Processing Systems (NeurIPS), 2023
Berfin Simsek
Amire Bendjeddou
W. Gerstner
Johanni Brea
276
10
0
03 Nov 2023
Meta-Learning Strategies through Value Maximization in Neural Networks
Rodrigo Carrasco-Davis
Javier Masís
Andrew M. Saxe
153
2
0
30 Oct 2023
Learning Dynamics in Linear VAE: Posterior Collapse Threshold, Superfluous Latent Space Pitfalls, and Speedup with KL Annealing
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Yuma Ichikawa
Koji Hukushima
146
10
0
24 Oct 2023
How a student becomes a teacher: learning and forgetting through Spectral methods
Neural Information Processing Systems (NeurIPS), 2023
Lorenzo Giambagli
L. Buffoni
Lorenzo Chicchi
Duccio Fanelli
137
7
0
19 Oct 2023
On the different regimes of Stochastic Gradient Descent
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2023
Antonio Sclocchi
Matthieu Wyart
303
28
0
19 Sep 2023
Max-affine regression via first-order methods
SIAM Journal on Mathematics of Data Science (SIMODS), 2023
Seonho Kim
Kiryung Lee
111
3
0
15 Aug 2023
Fundamental limits of overparametrized shallow neural networks for supervised learning
Francesco Camilli
D. Tieplova
Jean Barbier
169
11
0
11 Jul 2023
The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions
Physical Review X (PRX), 2023
Nishil Patel
Sebastian Lee
Stefano Sarao Mannelli
Sebastian Goldt
Andrew Saxe
OffRL
355
6
0
17 Jun 2023
Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks
Neural Information Processing Systems (NeurIPS), 2023
F. Chen
D. Kunin
Atsushi Yamamura
Surya Ganguli
364
37
0
07 Jun 2023
Escaping mediocrity: how two-layer networks learn hard generalized linear models with SGD
Luca Arnaboldi
Florent Krzakala
Bruno Loureiro
Ludovic Stephan
MLT
224
9
0
29 May 2023
Phase transitions in the mini-batch size for sparse and dense two-layer neural networks
Raffaele Marino
F. Ricci-Tersenghi
242
16
0
10 May 2023
Expand-and-Cluster: Parameter Recovery of Neural Networks
International Conference on Machine Learning (ICML), 2023
Flavio Martinelli
Berfin Simsek
W. Gerstner
Johanni Brea
416
13
0
25 Apr 2023
Leveraging the two timescale regime to demonstrate convergence of neural networks
Neural Information Processing Systems (NeurIPS), 2023
Pierre Marion
Raphael Berthier
215
9
0
19 Apr 2023
Mapping of attention mechanisms to a generalized Potts model
Physical Review Research (Phys. Rev. Res.), 2023
Riccardo Rende
Federica Gerace
Alessandro Laio
Sebastian Goldt
313
30
0
14 Apr 2023
Online Learning for the Random Feature Model in the Student-Teacher Framework
Roman Worschech
B. Rosenow
265
0
0
24 Mar 2023
Identifying Equivalent Training Dynamics
Neural Information Processing Systems (NeurIPS), 2023
William T. Redman
J. M. Bello-Rivas
M. Fonoberova
Ryan Mohr
Ioannis G. Kevrekidis
Igor Mezić
252
8
0
17 Feb 2023
From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks
Annual Conference Computational Learning Theory (COLT), 2023
Luca Arnaboldi
Ludovic Stephan
Florent Krzakala
Bruno Loureiro
MLT
153
39
0
12 Feb 2023