Online stochastic gradient descent on non-convex losses from high-dimensional inference

Journal of Machine Learning Research (JMLR), 2020
23 March 2020
Gerard Ben Arous
Reza Gheissari
Aukosh Jagannath

Papers citing "Online stochastic gradient descent on non-convex losses from high-dimensional inference"

Showing 50 of 77 citing papers.

Implicitly Normalized Online PCA: A Regularized Algorithm with Exact High-Dimensional Dynamics
Samet Demir
Zafer Dogan
01 Dec 2025

Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks
Parsa Rangriz
04 Nov 2025

From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD
Konstantinos Christopher Tsiolis
Alireza Mousavi-Hosseini
Murat A. Erdogdu
23 Oct 2025

Statistical Inference for Linear Functionals of Online Least-squares SGD when $t \gtrsim d^{1+\delta}$
Bhavya Agrawalla
Krishnakumar Balasubramanian
Promit Ghosal
22 Oct 2025

A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond
Nikos Tsikouras
Yorgos Pantis
Ioannis Mitliagkas
Christos Tzamos
22 Oct 2025

Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning
Junsoo Oh
Wei Huang
Taiji Suzuki
14 Oct 2025

Learning Multi-Index Models with Hyper-Kernel Ridge Regression
Shuo Huang
Hippolyte Labarrière
Ernesto De Vito
T. Poggio
Lorenzo Rosasco
02 Oct 2025

Test time training enhances in-context learning of nonlinear functions
Kento Kuwataka
Taiji Suzuki
30 Sep 2025

Single-Head Attention in High Dimensions: A Theory of Generalization, Weights Spectra, and Scaling Laws
Fabrizio Boncoraglio
Vittorio Erba
Emanuele Troiani
Florent Krzakala
Lenka Zdeborová
29 Sep 2025

Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
Jonas Hübotter
Patrik Wolf
Alexander Shevchenko
Dennis Jüni
Andreas Krause
Gil Kur
29 Sep 2025

Statistical Advantage of Softmax Attention: Insights from Single-Location Regression
O. Duranthon
P. Marion
C. Boyer
B. Loureiro
L. Zdeborová
26 Sep 2025

In-Context Learning can Perform Continual Learning Like Humans
Liuwang Kang
Fan Wang
Shaoshan Liu
Hung-Chyun Chou
Chuan Lin
Ning Ding
26 Sep 2025

The Features at Convergence Theorem: a first-principles alternative to the Neural Feature Ansatz for how networks learn representations
Enric Boix-Adserà
Neil Rohit Mallinar
James B. Simon
M. Belkin
08 Jul 2025

Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel
Yilan Chen
Zhichao Wang
Wei Huang
Andi Han
Taiji Suzuki
Arya Mazumdar
12 Jun 2025

The Generative Leap: Sharp Sample Complexity for Efficiently Learning Gaussian Multi-Index Models
Alex Damian
Jason D. Lee
Joan Bruna
05 Jun 2025

Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks
Luca Arnaboldi
Bruno Loureiro
Ludovic Stephan
Florent Krzakala
Lenka Zdeborová
03 Jun 2025

On the Mechanisms of Weak-to-Strong Generalization: A Theoretical Perspective
Behrad Moniri
Hamed Hassani
23 May 2025

Online Learning of Neural Networks
Amit Daniely
Idan Mehalel
Elchanan Mossel
14 May 2025

Statistically guided deep learning
Michael Kohler
A. Krzyżak
11 Apr 2025

Survey on Algorithms for multi-index models
Statistical Science (Stat. Sci.), 2025
Joan Bruna
Daniel Hsu
07 Apr 2025

Learning a Single Index Model from Anisotropic Data with vanilla Stochastic Gradient Descent
International Conference on Artificial Intelligence and Statistics (AISTATS), 2025
Guillaume Braun
Minh Ha Quang
Masaaki Imaizumi
31 Mar 2025

Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions
Fabiola Ricci
Lorenzo Bardone
Sebastian Goldt
31 Mar 2025

Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models
Taj Jones-McCormick
Aukosh Jagannath
S. Sen
24 Feb 2025

A distributional simplicity bias in the learning dynamics of transformers
Neural Information Processing Systems (NeurIPS), 2024
Riccardo Rende
Federica Gerace
Alessandro Laio
Sebastian Goldt
17 Feb 2025

Low-dimensional Functions are Efficiently Learnable under Randomly Biased Distributions
Annual Conference on Computational Learning Theory (COLT), 2025
Elisabetta Cornacchia
Dan Mikulincer
Elchanan Mossel
10 Feb 2025

Spectral Estimators for Multi-Index Models: Precise Asymptotics and Optimal Weak Recovery
Annual Conference on Computational Learning Theory (COLT), 2025
Filip Kovačević
Yihan Zhang
Marco Mondelli
03 Feb 2025

Gradient dynamics for low-rank fine-tuning beyond kernels
Arif Kerem Dayi
Sitan Chen
23 Nov 2024

Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Berfin Simsek
Amire Bendjeddou
Daniel Hsu
13 Nov 2024

Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models
Neural Information Processing Systems (NeurIPS), 2024
Puqian Wang
Nikos Zarifis
Ilias Diakonikolas
Jelena Diakonikolas
08 Nov 2024

Pretrained transformer efficiently learns low-dimensional target functions in-context
Neural Information Processing Systems (NeurIPS), 2024
Kazusato Oko
Yujin Song
Taiji Suzuki
Denny Wu
04 Nov 2024

A Random Matrix Theory Perspective on the Spectrum of Learned Features and Asymptotic Generalization Capabilities
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Yatin Dandi
Luca Pesce
Hugo Cui
Florent Krzakala
Yue M. Lu
Bruno Loureiro
24 Oct 2024

Robust Feature Learning for Multi-Index Models in High Dimensions
International Conference on Learning Representations (ICLR), 2024
Alireza Mousavi-Hosseini
Adel Javanmard
Murat A. Erdogdu
21 Oct 2024

Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics
International Conference on Learning Representations (ICLR), 2024
Alireza Mousavi-Hosseini
Denny Wu
Murat A. Erdogdu
14 Aug 2024

On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries
Nirmit Joshi
Theodor Misiakiewicz
Nathan Srebro
08 Jul 2024

From Spikes to Heavy Tails: Unveiling the Spectral Evolution of Neural Networks
Vignesh Kothapalli
Tianyu Pang
Shenyang Deng
Zongmin Liu
Yaoqing Yang
07 Jun 2024

Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs
Luca Arnaboldi
Yatin Dandi
Florent Krzakala
Bruno Loureiro
Luca Pesce
Ludovic Stephan
04 Jun 2024

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
Jason D. Lee
Kazusato Oko
Taiji Suzuki
Denny Wu
03 Jun 2024

Learning from Streaming Data when Users Choose
Jinyan Su
Sarah Dean
03 Jun 2024

Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions
Luca Arnaboldi
Yatin Dandi
Florent Krzakala
Luca Pesce
Ludovic Stephan
24 May 2024

Sliding down the stairs: how correlated latent variables accelerate learning with neural networks
Lorenzo Bardone
Sebastian Goldt
12 Apr 2024

The Role of the Time-Dependent Hessian in High-Dimensional Optimization
Tony Bonnaire
Giulio Biroli
C. Cammarota
04 Mar 2024

Provably learning a multi-head attention layer
Sitan Chen
Yuanzhi Li
06 Feb 2024

The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents
International Conference on Machine Learning (ICML), 2024
Yatin Dandi
Emanuele Troiani
Luca Arnaboldi
Luca Pesce
Lenka Zdeborová
Florent Krzakala
05 Feb 2024

The Local Landscape of Phase Retrieval Under Limited Samples
IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2023
Kaizhao Liu
Zihao Wang
Lei Wu
26 Nov 2023

Learning Hierarchical Polynomials with Three-Layer Neural Networks
Zihao Wang
Eshaan Nichani
Jason D. Lee
23 Nov 2023

Should Under-parameterized Student Networks Copy or Average Teacher Weights?
Neural Information Processing Systems (NeurIPS), 2023
Berfin Simsek
Amire Bendjeddou
W. Gerstner
Johanni Brea
03 Nov 2023

Grokking as the Transition from Lazy to Rich Training Dynamics
International Conference on Learning Representations (ICLR), 2023
Tanishq Kumar
Blake Bordelon
Samuel Gershman
Cengiz Pehlevan
09 Oct 2023

Grokking as a First Order Phase Transition in Two Layer Networks
International Conference on Learning Representations (ICLR), 2023
Noa Rubin
Inbar Seroussi
Zohar Ringel
05 Oct 2023

Symmetric Single Index Learning
International Conference on Learning Representations (ICLR), 2023
Aaron Zweig
Joan Bruna
03 Oct 2023

Beyond Labeling Oracles: What does it mean to steal ML models?
Avital Shafran
Ilia Shumailov
Murat A. Erdogdu
Nicolas Papernot
03 Oct 2023