Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron
Weihang Xu, Simon S. Du
arXiv:2302.10034, 20 February 2023
Papers citing "Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron" (14 papers)
Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence. Berfin Simsek, Amire Bendjeddou, Daniel Hsu. 13 Nov 2024.

The Role of Depth, Width, and Tree Size in Expressiveness of Deep Forest. Shen-Huan Lyu, Jin-Hui Wu, Qin-Cheng Zheng, Baoliu Ye. 06 Jul 2024.

Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models. Weihang Xu, Maryam Fazel, Simon S. Du. 29 Jun 2024.

Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data. Nikita Tsoy, Nikola Konstantinov. 27 May 2024.

Disentangle Sample Size and Initialization Effect on Perfect Generalization for Single-Neuron Target. Jiajie Zhao, Zhiwei Bai, Yaoyu Zhang. 22 May 2024.

Future Directions in the Theory of Graph Machine Learning. Christopher Morris, Fabrizio Frasca, Nadav Dym, Haggai Maron, İsmail İlkan Ceylan, Ron Levie, Derek Lim, Michael M. Bronstein, Martin Grohe, Stefanie Jegelka. 03 Feb 2024.

Should Under-parameterized Student Networks Copy or Average Teacher Weights? Berfin Simsek, Amire Bendjeddou, W. Gerstner, Johanni Brea. 03 Nov 2023.

On the Optimization and Generalization of Multi-head Attention. Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis. 19 Oct 2023.

How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization. Nuoya Xiong, Lijun Ding, Simon S. Du. 03 Oct 2023.

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention. Yuandong Tian, Yiping Wang, Zhenyu (Allen) Zhang, Beidi Chen, Simon S. Du. 01 Oct 2023.

Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs. D. Chistikov, Matthias Englert, R. Lazic. 10 Jun 2023.

Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer. Yuandong Tian, Yiping Wang, Beidi Chen, Simon S. Du. 25 May 2023.

First-order ANIL learns linear representations despite misspecified latent dimension. Oğuz Kaan Yüksel, Etienne Boursier, Nicolas Flammarion. 02 Mar 2023.

A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Network. Mo Zhou, Rong Ge, Chi Jin. 04 Feb 2021.