Online stochastic gradient descent on non-convex losses from high-dimensional inference
Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath
Journal of Machine Learning Research (JMLR), 2020. arXiv:2003.10409, 23 March 2020.

Papers citing "Online stochastic gradient descent on non-convex losses from high-dimensional inference"
Showing 50 of 77 citing papers.
Implicitly Normalized Online PCA: A Regularized Algorithm with Exact High-Dimensional Dynamics
Samet Demir, Zafer Dogan. 01 Dec 2025.

Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks
Parsa Rangriz. 04 Nov 2025.

From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD
Konstantinos Christopher Tsiolis, Alireza Mousavi-Hosseini, Murat A. Erdogdu. 23 Oct 2025.

Statistical Inference for Linear Functionals of Online Least-squares SGD when $t \gtrsim d^{1+\delta}$
Bhavya Agrawalla, Krishnakumar Balasubramanian, Promit Ghosal. 22 Oct 2025.

A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond
Nikos Tsikouras, Yorgos Pantis, Ioannis Mitliagkas, Christos Tzamos. 22 Oct 2025.
Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning
Junsoo Oh, Wei Huang, Taiji Suzuki. 14 Oct 2025.

Learning Multi-Index Models with Hyper-Kernel Ridge Regression
Shuo Huang, Hippolyte Labarrière, Ernesto De Vito, T. Poggio, Lorenzo Rosasco. 02 Oct 2025.

Test time training enhances in-context learning of nonlinear functions
Kento Kuwataka, Taiji Suzuki. 30 Sep 2025.

Single-Head Attention in High Dimensions: A Theory of Generalization, Weights Spectra, and Scaling Laws
Fabrizio Boncoraglio, Vittorio Erba, Emanuele Troiani, Florent Krzakala, Lenka Zdeborová. 29 Sep 2025.

Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
Jonas Hübotter, Patrik Wolf, Alexander Shevchenko, Dennis Jüni, Andreas Krause, Gil Kur. 29 Sep 2025.
Statistical Advantage of Softmax Attention: Insights from Single-Location Regression
O. Duranthon, P. Marion, C. Boyer, B. Loureiro, L. Zdeborová. 26 Sep 2025.

In-Context Learning can Perform Continual Learning Like Humans
Liuwang Kang, Fan Wang, Shaoshan Liu, Hung-Chyun Chou, Chuan Lin, Ning Ding. 26 Sep 2025.

The Features at Convergence Theorem: a first-principles alternative to the Neural Feature Ansatz for how networks learn representations
Enric Boix-Adserà, Neil Rohit Mallinar, James B. Simon, M. Belkin. 08 Jul 2025.

Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel
Yilan Chen, Zhichao Wang, Wei Huang, Andi Han, Taiji Suzuki, Arya Mazumdar. 12 Jun 2025.
The Generative Leap: Sharp Sample Complexity for Efficiently Learning Gaussian Multi-Index Models
Alex Damian, Jason D. Lee, Joan Bruna. 05 Jun 2025.

Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks
Luca Arnaboldi, Bruno Loureiro, Ludovic Stephan, Florent Krzakala, Lenka Zdeborová. 03 Jun 2025.

On the Mechanisms of Weak-to-Strong Generalization: A Theoretical Perspective
Behrad Moniri, Hamed Hassani. 23 May 2025.

Online Learning of Neural Networks
Amit Daniely, Idan Mehalel, Elchanan Mossel. 14 May 2025.

Statistically guided deep learning
Michael Kohler, A. Krzyżak. 11 Apr 2025.
Survey on Algorithms for multi-index models
Statistical Science (Stat. Sci.), 2025
Joan Bruna, Daniel Hsu. 07 Apr 2025.

Learning a Single Index Model from Anisotropic Data with vanilla Stochastic Gradient Descent
International Conference on Artificial Intelligence and Statistics (AISTATS), 2025
Guillaume Braun, Minh Ha Quang, Masaaki Imaizumi. 31 Mar 2025.

Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions
Fabiola Ricci, Lorenzo Bardone, Sebastian Goldt. 31 Mar 2025.

Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models
Taj Jones-McCormick, Aukosh Jagannath, S. Sen. 24 Feb 2025.

A distributional simplicity bias in the learning dynamics of transformers
Neural Information Processing Systems (NeurIPS), 2024
Riccardo Rende, Federica Gerace, Alessandro Laio, Sebastian Goldt. 17 Feb 2025.
Low-dimensional Functions are Efficiently Learnable under Randomly Biased Distributions
Annual Conference on Computational Learning Theory (COLT), 2025
Elisabetta Cornacchia, Dan Mikulincer, Elchanan Mossel. 10 Feb 2025.

Spectral Estimators for Multi-Index Models: Precise Asymptotics and Optimal Weak Recovery
Annual Conference on Computational Learning Theory (COLT), 2025
Filip Kovačević, Yihan Zhang, Marco Mondelli. 03 Feb 2025.

Gradient dynamics for low-rank fine-tuning beyond kernels
Arif Kerem Dayi, Sitan Chen. 23 Nov 2024.

Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Berfin Simsek, Amire Bendjeddou, Daniel Hsu. 13 Nov 2024.

Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models
Neural Information Processing Systems (NeurIPS), 2024
Puqian Wang, Nikos Zarifis, Ilias Diakonikolas, Jelena Diakonikolas. 08 Nov 2024.

Pretrained transformer efficiently learns low-dimensional target functions in-context
Neural Information Processing Systems (NeurIPS), 2024
Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu. 04 Nov 2024.
A Random Matrix Theory Perspective on the Spectrum of Learned Features and Asymptotic Generalization Capabilities
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Yatin Dandi, Luca Pesce, Hugo Cui, Florent Krzakala, Yue M. Lu, Bruno Loureiro. 24 Oct 2024.

Robust Feature Learning for Multi-Index Models in High Dimensions
International Conference on Learning Representations (ICLR), 2024
Alireza Mousavi-Hosseini, Adel Javanmard, Murat A. Erdogdu. 21 Oct 2024.

Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics
International Conference on Learning Representations (ICLR), 2024
Alireza Mousavi-Hosseini, Denny Wu, Murat A. Erdogdu. 14 Aug 2024.

On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries
Nirmit Joshi, Theodor Misiakiewicz, Nathan Srebro. 08 Jul 2024.
From Spikes to Heavy Tails: Unveiling the Spectral Evolution of Neural Networks
Vignesh Kothapalli, Tianyu Pang, Shenyang Deng, Zongmin Liu, Yaoqing Yang. 07 Jun 2024.

Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs
Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan. 04 Jun 2024.

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
Jason D. Lee, Kazusato Oko, Taiji Suzuki, Denny Wu. 03 Jun 2024.

Learning from Streaming Data when Users Choose
Jinyan Su, Sarah Dean. 03 Jun 2024.

Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions
Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Luca Pesce, Ludovic Stephan. 24 May 2024.
Sliding down the stairs: how correlated latent variables accelerate learning with neural networks
Lorenzo Bardone, Sebastian Goldt. 12 Apr 2024.

The Role of the Time-Dependent Hessian in High-Dimensional Optimization
Tony Bonnaire, Giulio Biroli, C. Cammarota. 04 Mar 2024.

Provably learning a multi-head attention layer
Sitan Chen, Yuanzhi Li. 06 Feb 2024.

The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents
International Conference on Machine Learning (ICML), 2024
Yatin Dandi, Emanuele Troiani, Luca Arnaboldi, Luca Pesce, Lenka Zdeborová, Florent Krzakala. 05 Feb 2024.

The Local Landscape of Phase Retrieval Under Limited Samples
IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2023
Kaizhao Liu, Zihao Wang, Lei Wu. 26 Nov 2023.
Learning Hierarchical Polynomials with Three-Layer Neural Networks
Zihao Wang, Eshaan Nichani, Jason D. Lee. 23 Nov 2023.

Should Under-parameterized Student Networks Copy or Average Teacher Weights?
Neural Information Processing Systems (NeurIPS), 2023
Berfin Simsek, Amire Bendjeddou, W. Gerstner, Johanni Brea. 03 Nov 2023.

Grokking as the Transition from Lazy to Rich Training Dynamics
International Conference on Learning Representations (ICLR), 2023
Tanishq Kumar, Blake Bordelon, Samuel Gershman, Cengiz Pehlevan. 09 Oct 2023.

Grokking as a First Order Phase Transition in Two Layer Networks
International Conference on Learning Representations (ICLR), 2023
Noa Rubin, Inbar Seroussi, Zohar Ringel. 05 Oct 2023.

Symmetric Single Index Learning
International Conference on Learning Representations (ICLR), 2023
Aaron Zweig, Joan Bruna. 03 Oct 2023.

Beyond Labeling Oracles: What does it mean to steal ML models?
Avital Shafran, Ilia Shumailov, Murat A. Erdogdu, Nicolas Papernot. 03 Oct 2023.
Page 1 of 2