Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron
Weihang Xu, Simon S. Du
arXiv:2302.10034, 20 February 2023
Papers citing "Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron" (14 papers)
Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence. Berfin Simsek, Amire Bendjeddou, Daniel Hsu. 13 Nov 2024.

The Role of Depth, Width, and Tree Size in Expressiveness of Deep Forest. Shen-Huan Lyu, Jin-Hui Wu, Qin-Cheng Zheng, Baoliu Ye. 06 Jul 2024.

Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models. Weihang Xu, Maryam Fazel, Simon S. Du. 29 Jun 2024.

Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data. Nikita Tsoy, Nikola Konstantinov. 27 May 2024.

Disentangle Sample Size and Initialization Effect on Perfect Generalization for Single-Neuron Target. Jiajie Zhao, Zhiwei Bai, Yaoyu Zhang. 22 May 2024.

Future Directions in the Theory of Graph Machine Learning. Christopher Morris, Fabrizio Frasca, Nadav Dym, Haggai Maron, İsmail İlkan Ceylan, Ron Levie, Derek Lim, Michael M. Bronstein, Martin Grohe, Stefanie Jegelka. 03 Feb 2024.

Should Under-parameterized Student Networks Copy or Average Teacher Weights? Berfin Simsek, Amire Bendjeddou, W. Gerstner, Johanni Brea. 03 Nov 2023.

On the Optimization and Generalization of Multi-head Attention. Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis. 19 Oct 2023.

How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization. Nuoya Xiong, Lijun Ding, Simon S. Du. 03 Oct 2023.

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention. Yuandong Tian, Yiping Wang, Zhenyu (Allen) Zhang, Beidi Chen, Simon S. Du. 01 Oct 2023.

Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs. D. Chistikov, Matthias Englert, R. Lazic. 10 Jun 2023.

Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer. Yuandong Tian, Yiping Wang, Beidi Chen, Simon S. Du. 25 May 2023.

First-order ANIL learns linear representations despite misspecified latent dimension. Oğuz Kaan Yüksel, Etienne Boursier, Nicolas Flammarion. 02 Mar 2023.

A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Network. Mo Zhou, Rong Ge, Chi Jin. 04 Feb 2021.