Gradient Descent Provably Optimizes Over-parameterized Neural Networks [MLT, ODL]
arXiv: 1810.02054 · 4 October 2018
S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh

Papers citing "Gradient Descent Provably Optimizes Over-parameterized Neural Networks" (50 of 244 papers shown)

Information-theoretic reduction of deep neural networks to linear models in the overparametrized proportional regime · Francesco Camilli, D. Tieplova, Eleonora Bergamin, Jean Barbier · 06 May 2025
Deep learning with missing data · Tianyi Ma, Tengyao Wang, R. Samworth · 21 Apr 2025
On the Cone Effect in the Learning Dynamics · Zhanpeng Zhou, Yongyi Yang, Jie Ren, Mahito Sugiyama, Junchi Yan · 20 Mar 2025
Explainable Neural Networks with Guarantees: A Sparse Estimation Approach [FAtt] · Antoine Ledent, Peng Liu · 20 Feb 2025
MLPs at the EOC: Dynamics of Feature Learning [MLT] · Dávid Terjék · 18 Feb 2025
Mean-Field Analysis for Learning Subspace-Sparse Polynomials with Gaussian Input [MLT] · Ziang Chen, Rong Ge · 10 Jan 2025
Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data [OOD] · Binghui Li, Yuanzhi Li · 11 Oct 2024
On the Impacts of the Random Initialization in the Neural Tangent Kernel Theory [AAML] · Guhan Chen, Yicheng Li, Qian Lin · 08 Oct 2024
Extended convexity and smoothness and their applications in deep learning · Binchuan Qi, Wei Gong, Li Li · 08 Oct 2024
SHAP values via sparse Fourier representation [FAtt] · Ali Gorji, Andisheh Amrollahi, A. Krause · 08 Oct 2024
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks · Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe · 22 Sep 2024
How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning [AI4CE] · Arthur Jacot, Seok Hoan Choi, Yuxiao Wen · 08 Jul 2024
Loss Gradient Gaussian Width based Generalization and Optimization Guarantees · A. Banerjee, Qiaobo Li, Yingxue Zhou · 11 Jun 2024
An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks · Zhifa Ke, Zaiwen Wen, Junyu Zhang · 07 May 2024
On the Rashomon ratio of infinite hypothesis sets · Evzenie Coupkova, Mireille Boutin · 27 Apr 2024
Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks [MLT] · Matteo Tucat, Anirbit Mukherjee, Procheta Sen, Mingfei Sun, Omar Rivasplata · 12 Apr 2024
NTK-Guided Few-Shot Class Incremental Learning [CLL] · Jingren Liu, Zhong Ji, Yanwei Pang, Yunlong Yu · 19 Mar 2024
Merging Text Transformer Models from Different Initializations [MoMe] · Neha Verma, Maha Elbayad · 01 Mar 2024
Non-convergence to global minimizers for Adam and stochastic gradient descent optimization and constructions of local minimizers in the training of artificial neural networks · Arnulf Jentzen, Adrian Riekert · 07 Feb 2024
Weak Correlations as the Underlying Principle for Linearization of Gradient-Based Learning Systems · Ori Shem-Ur, Yaron Oz · 08 Jan 2024
Lifted RDT based capacity analysis of the 1-hidden layer treelike sign perceptrons neural networks · M. Stojnic · 13 Dec 2023
Capacity of the treelike sign perceptrons neural networks with one hidden layer -- RDT based upper bounds · M. Stojnic · 13 Dec 2023
Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets [MLT] · A. D. Cunha, Francesco d’Amore, Emanuele Natale · 16 Nov 2023
A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks [MLT] · Behrad Moniri, Donghwan Lee, Hamed Hassani, Edgar Dobriban · 11 Oct 2023
How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization · Nuoya Xiong, Lijun Ding, Simon S. Du · 03 Oct 2023
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck · Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang · 07 Sep 2023
How to Protect Copyright Data in Optimization of Large Language Models? · T. Chu, Zhao-quan Song, Chiwun Yang · 23 Aug 2023
Understanding Deep Neural Networks via Linear Separability of Hidden Layers · Chao Zhang, Xinyuan Chen, Wensheng Li, Lixue Liu, Wei Wu, Dacheng Tao · 26 Jul 2023
Efficient SGD Neural Network Training via Sublinear Activated Neuron Identification · Lianke Qin, Zhao-quan Song, Yuanyuan Yang · 13 Jul 2023
Quantitative CLTs in Deep Neural Networks [BDL] · Stefano Favaro, Boris Hanin, Domenico Marinucci, I. Nourdin, G. Peccati · 12 Jul 2023
The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions [OffRL] · Nishil Patel, Sebastian Lee, Stefano Sarao Mannelli, Sebastian Goldt, Andrew Saxe · 17 Jun 2023
Generalization Guarantees of Gradient Descent for Multi-Layer Neural Networks [MLT] · Puyu Wang, Yunwen Lei, Di Wang, Yiming Ying, Ding-Xuan Zhou · 26 May 2023
Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks [MLT] · Eshaan Nichani, Alexandru Damian, Jason D. Lee · 11 May 2023
On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains · Yicheng Li, Zixiong Yu, Y. Cotronis, Qian Lin · 04 May 2023
Wide neural networks: From non-gaussian random fields at initialization to the NTK geometry of training [AI4CE] · Luís Carvalho, João L. Costa, José Mourão, Gonçalo Oliveira · 06 Apr 2023
Phase Diagram of Initial Condensation for Two-layer Neural Networks [MLT, AI4CE] · Zheng Chen, Yuqing Li, Tao Luo, Zhaoguang Zhou, Z. Xu · 12 Mar 2023
Gauss-Newton Temporal Difference Learning with Nonlinear Function Approximation · Zhifa Ke, Junyu Zhang, Zaiwen Wen · 25 Feb 2023
Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron · Weihang Xu, S. Du · 20 Feb 2023
A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity [ViT, MLT] · Hongkang Li, M. Wang, Sijia Liu, Pin-Yu Chen · 12 Feb 2023
Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning · François Caron, Fadhel Ayed, Paul Jung, Hoileong Lee, Juho Lee, Hongseok Yang · 02 Feb 2023
Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression · Mo Zhou, Rong Ge · 01 Feb 2023
CyclicFL: A Cyclic Model Pre-Training Approach to Efficient Federated Learning [FedML] · Peng Zhang, Yingbo Zhou, Ming Hu, Xin Fu, Xian Wei, Mingsong Chen · 28 Jan 2023
A Simple Algorithm For Scaling Up Kernel Methods · Tengyu Xu, Bryan T. Kelly, Semyon Malamud · 26 Jan 2023
ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients · Guihong Li, Yuedong Yang, Kartikeya Bhardwaj, R. Marculescu · 26 Jan 2023
Convergence beyond the over-parameterized regime using Rayleigh quotients · David A. R. Robin, Kevin Scaman, Marc Lelarge · 19 Jan 2023
An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models · Yufeng Zhang, Boyi Liu, Qi Cai, Lingxiao Wang, Zhaoran Wang · 30 Dec 2022
Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks · Ilja Kuzborskij, Csaba Szepesvári · 28 Dec 2022
COLT: Cyclic Overlapping Lottery Tickets for Faster Pruning of Convolutional Neural Networks · Md. Ismail Hossain, Mohammed Rakib, M. M. L. Elahi, Nabeel Mohammed, Shafin Rahman · 24 Dec 2022
Learning threshold neurons via the "edge of stability" [MLT] · Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang · 14 Dec 2022
Leveraging Unlabeled Data to Track Memorization [NoLa, TDI] · Mahsa Forouzesh, Hanie Sedghi, Patrick Thiran · 08 Dec 2022