arXiv:1810.02281
A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks
4 October 2018
Sanjeev Arora
Nadav Cohen
Noah Golowich
Wei Hu
Papers citing "A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks"
50 / 209 papers shown
Analysis of the rate of convergence of an over-parametrized convolutional neural network image classifier learned by gradient descent
Journal of Statistical Planning and Inference (JSPI), 2024
Michael Kohler
A. Krzyżak
Benjamin Walter
195
1
0
13 May 2024
Masks, Signs, And Learning Rate Rewinding
Advait Gadhikar
R. Burkholz
204
14
0
29 Feb 2024
Characterizing the Training Dynamics of Private Fine-tuning with Langevin diffusion
Shuqi Ke
Charlie Hou
Giulia Fanti
Sewoong Oh
187
5
0
29 Feb 2024
Sobolev Training for Operator Learning
Namkyeong Cho
Junseung Ryu
Hyung Ju Hwang
95
1
0
14 Feb 2024
Estimating the Local Learning Coefficient at Scale
Zach Furman
Edmund Lau
162
4
0
06 Feb 2024
Linear Recursive Feature Machines provably recover low-rank matrices
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2024
Adityanarayanan Radhakrishnan
Misha Belkin
Dmitriy Drusvyatskiy
244
12
0
09 Jan 2024
Understanding Unimodal Bias in Multimodal Deep Linear Networks
International Conference on Machine Learning (ICML), 2023
Yedi Zhang
Peter E. Latham
Andrew Saxe
220
11
0
01 Dec 2023
Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks
IEEE Signal Processing Letters (IEEE SPL), 2023
Zhen Qin
Xuwei Tan
Zhihui Zhu
275
1
0
24 Nov 2023
Analysis of the expected $L_2$ error of an over-parametrized deep neural network estimate learned by gradient descent without regularization
Selina Drews
Michael Kohler
153
4
0
24 Nov 2023
Fast-ELECTRA for Efficient Pre-training
International Conference on Learning Representations (ICLR), 2023
Chengyu Dong
Liyuan Liu
Hao Cheng
Jingbo Shang
Jianfeng Gao
Xiaodong Liu
188
2
0
11 Oct 2023
Are GATs Out of Balance?
Neural Information Processing Systems (NeurIPS), 2023
Nimrah Mustafa
Aleksandar Bojchevski
R. Burkholz
294
8
0
11 Oct 2023
Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition
Zhongtian Chen
Edmund Lau
Jake Mendel
Susan Wei
Daniel Murfet
118
21
0
10 Oct 2023
Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion
International Conference on Learning Representations (ICLR), 2023
Alexandru Meterez
Amir Joudaki
Francesco Orabona
Alexander Immer
Gunnar Rätsch
Hadi Daneshmand
173
8
0
03 Oct 2023
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
International Conference on Learning Representations (ICLR), 2023
Yuandong Tian
Yiping Wang
Zhenyu Zhang
Beidi Chen
Simon Shaolei Du
266
45
0
01 Oct 2023
Early Neuron Alignment in Two-layer ReLU Networks with Small Initialization
International Conference on Learning Representations (ICLR), 2023
Hancheng Min
Enrique Mallada
René Vidal
MLT
218
26
0
24 Jul 2023
Abide by the Law and Follow the Flow: Conservation Laws for Gradient Flows
Neural Information Processing Systems (NeurIPS), 2023
Sibylle Marcotte
Rémi Gribonval
Gabriel Peyré
289
27
0
30 Jun 2023
Unraveling Projection Heads in Contrastive Learning: Insights from Expansion and Shrinkage
Yu Gui
Cong Ma
Yiqiao Zhong
165
8
0
06 Jun 2023
The Law of Parsimony in Gradient Descent for Learning Deep Linear Networks
Can Yaras
Peng Wang
Wei Hu
Zhihui Zhu
Laura Balzano
Qing Qu
253
19
0
01 Jun 2023
Pruning at Initialization -- A Sketching Perspective
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Noga Bar
Raja Giryes
226
1
0
27 May 2023
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
Neural Information Processing Systems (NeurIPS), 2023
Yuandong Tian
Yiping Wang
Beidi Chen
S. Du
MLT
368
96
0
25 May 2023
Implicit bias of SGD in $L_{2}$-regularized linear DNNs: One-way jumps from high to low rank
International Conference on Learning Representations (ICLR), 2023
Zihan Wang
Arthur Jacot
216
23
0
25 May 2023
Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
International Conference on Machine Learning (ICML), 2023
Itai Kreisler
Mor Shpigel Nacson
Daniel Soudry
Y. Carmon
183
16
0
22 May 2023
Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss
International Conference on Machine Learning (ICML), 2023
Pierre Bréchet
Katerina Papagiannouli
Jing An
Guido Montúfar
338
7
0
06 Mar 2023
Bayesian Interpolation with Deep Linear Networks
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2022
Boris Hanin
Alexander Zlokapa
333
28
0
29 Dec 2022
Effects of Data Geometry in Early Deep Learning
Neural Information Processing Systems (NeurIPS), 2022
Saket Tiwari
George Konidaris
297
7
0
29 Dec 2022
A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization
JIAN-PENG Cao
Chao Qian
Yihui Huang
Dicheng Chen
Yuncheng Gao
Jiyang Dong
D. Guo
X. Qu
301
1
0
29 Dec 2022
Asymptotic Analysis of Deep Residual Networks
R. Cont
Alain Rossier
Renyuan Xu
137
4
0
15 Dec 2022
Infinite-width limit of deep linear neural networks
Communications on Pure and Applied Mathematics (CPAM), 2022
Lénaïc Chizat
Maria Colombo
Xavier Fernández-Real
Alessio Figalli
162
21
0
29 Nov 2022
Finite Sample Identification of Wide Shallow Neural Networks with Biases
M. Fornasier
T. Klock
Marco Mondelli
Michael Rauchensteiner
157
7
0
08 Nov 2022
Symmetries, flat minima, and the conserved quantities of gradient flow
International Conference on Learning Representations (ICLR), 2022
Bo Zhao
I. Ganev
Robin Walters
Rose Yu
Nima Dehmamy
308
26
0
31 Oct 2022
Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets
International Conference on Learning Representations (ICLR), 2022
Edo Cohen-Karlik
Itamar Menuhin-Gruman
Raja Giryes
Nadav Cohen
Amir Globerson
299
7
0
25 Oct 2022
Deep Linear Networks for Matrix Completion -- An Infinite Depth Limit
SIAM Journal on Applied Dynamical Systems (SIADS), 2022
Nadav Cohen
Govind Menon
Zsolt Veraszto
ODL
155
11
0
22 Oct 2022
TiDAL: Learning Training Dynamics for Active Learning
IEEE International Conference on Computer Vision (ICCV), 2022
Seong Min Kye
Kwanghee Choi
Hyeongmin Byun
Buru Chang
335
20
0
13 Oct 2022
On skip connections and normalisation layers in deep optimisation
Neural Information Processing Systems (NeurIPS), 2022
L. MacDonald
Jack Valmadre
Hemanth Saratchandran
Simon Lucey
ODL
337
4
0
10 Oct 2022
Behind the Scenes of Gradient Descent: A Trajectory Analysis via Basis Function Decomposition
International Conference on Learning Representations (ICLR), 2022
Jianhao Ma
Li-Zhen Guo
Salar Fattahi
308
4
0
01 Oct 2022
Implicit Full Waveform Inversion with Deep Neural Representation
Jian Sun
K. Innanen
AI4CE
155
55
0
08 Sep 2022
Intersection of Parallels as an Early Stopping Criterion
International Conference on Information and Knowledge Management (CIKM), 2022
Ali Vardasbi
Maarten de Rijke
Mostafa Dehghani
MoMe
123
7
0
19 Aug 2022
The Neural Race Reduction: Dynamics of Abstraction in Gated Networks
International Conference on Machine Learning (ICML), 2022
Andrew M. Saxe
Shagun Sodhani
Sam Lewallen
AI4CE
166
43
0
21 Jul 2022
A note on Linear Bottleneck networks and their Transition to Multilinearity
Libin Zhu
Parthe Pandit
M. Belkin
MLT
151
0
0
30 Jun 2022
Analysis of Branch Specialization and its Application in Image Decomposition
Jonathan Brokman
Guy Gilboa
98
2
0
12 Jun 2022
Explicit Regularization in Overparametrized Models via Noise Injection
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Antonio Orvieto
Anant Raj
Hans Kersting
Francis R. Bach
124
33
0
09 Jun 2022
Neural Collapse: A Review on Modelling Principles and Generalization
Vignesh Kothapalli
331
101
0
08 Jun 2022
Understanding the Role of Nonlinearity in Training Dynamics of Contrastive Learning
International Conference on Learning Representations (ICLR), 2022
Yuandong Tian
MLT
256
17
0
02 Jun 2022
Blind Estimation of a Doubly Selective OFDM Channel: A Deep Learning Algorithm and Theory
T. Getu
N. Golmie
D. Griffith
156
2
0
30 May 2022
Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks
Neural Information Processing Systems (NeurIPS), 2022
Blake Bordelon
Cengiz Pehlevan
MLT
263
108
0
19 May 2022
A Convergence Analysis of Nesterov's Accelerated Gradient Method in Training Deep Linear Neural Networks
Information Sciences (Inf. Sci.), 2022
Xin Liu
Wei Tao
Zhisong Pan
80
11
0
18 Apr 2022
Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks
Social Science Research Network (SSRN), 2022
R. Cont
Alain Rossier
Renyuan Xu
MLT
285
6
0
14 Apr 2022
Concept Evolution in Deep Learning Training: A Unified Interpretation Framework and Discoveries
International Conference on Information and Knowledge Management (CIKM), 2022
Haekyu Park
Seongmin Lee
Benjamin Hoover
Austin P. Wright
Omar Shaikh
Rahul Duggal
Nilaksh Das
Kevin Wenliang Li
Judy Hoffman
Duen Horng Chau
240
3
0
30 Mar 2022
Convergence of gradient descent for deep neural networks
S. Chatterjee
ODL
228
27
0
30 Mar 2022
On the Implicit Bias of Gradient Descent for Temporal Extrapolation
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Edo Cohen-Karlik
Avichai Ben David
Nadav Cohen
Amir Globerson
140
4
0
09 Feb 2022