A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks

4 October 2018
Sanjeev Arora
Nadav Cohen
Noah Golowich
Wei Hu
arXiv:1810.02281 · abs · PDF · HTML

Papers citing "A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks"

50 / 209 papers shown
Analysis of the rate of convergence of an over-parametrized convolutional neural network image classifier learned by gradient descent
Journal of Statistical Planning and Inference (JSPI), 2024
Michael Kohler
A. Krzyżak
Benjamin Walter
195
1
0
13 May 2024
Masks, Signs, And Learning Rate Rewinding
Advait Gadhikar
R. Burkholz
204
14
0
29 Feb 2024
Characterizing the Training Dynamics of Private Fine-tuning with Langevin diffusion
Shuqi Ke
Charlie Hou
Giulia Fanti
Sewoong Oh
187
5
0
29 Feb 2024
Sobolev Training for Operator Learning
Namkyeong Cho
Junseung Ryu
Hyung Ju Hwang
95
1
0
14 Feb 2024
Estimating the Local Learning Coefficient at Scale
Zach Furman
Edmund Lau
162
4
0
06 Feb 2024
Linear Recursive Feature Machines provably recover low-rank matrices
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2024
Adityanarayanan Radhakrishnan
Misha Belkin
Dmitriy Drusvyatskiy
244
12
0
09 Jan 2024
Understanding Unimodal Bias in Multimodal Deep Linear Networks
International Conference on Machine Learning (ICML), 2023
Yedi Zhang
Peter E. Latham
Andrew Saxe
220
11
0
01 Dec 2023
Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks
IEEE Signal Processing Letters (IEEE SPL), 2023
Zhen Qin
Xuwei Tan
Zhihui Zhu
275
1
0
24 Nov 2023
Analysis of the expected $L_2$ error of an over-parametrized deep neural network estimate learned by gradient descent without regularization
Selina Drews
Michael Kohler
153
4
0
24 Nov 2023
Fast-ELECTRA for Efficient Pre-training
International Conference on Learning Representations (ICLR), 2023
Chengyu Dong
Liyuan Liu
Hao Cheng
Jingbo Shang
Jianfeng Gao
Xiaodong Liu
188
2
0
11 Oct 2023
Are GATs Out of Balance?
Neural Information Processing Systems (NeurIPS), 2023
Nimrah Mustafa
Aleksandar Bojchevski
R. Burkholz
294
8
0
11 Oct 2023
Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition
Zhongtian Chen
Edmund Lau
Jake Mendel
Susan Wei
Daniel Murfet
118
21
0
10 Oct 2023
Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion
International Conference on Learning Representations (ICLR), 2023
Alexandru Meterez
Amir Joudaki
Francesco Orabona
Alexander Immer
Gunnar Rätsch
Hadi Daneshmand
173
8
0
03 Oct 2023
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
International Conference on Learning Representations (ICLR), 2023
Yuandong Tian
Yiping Wang
Zhenyu Zhang
Beidi Chen
Simon Shaolei Du
266
45
0
01 Oct 2023
Early Neuron Alignment in Two-layer ReLU Networks with Small Initialization
International Conference on Learning Representations (ICLR), 2023
Hancheng Min
Enrique Mallada
René Vidal
MLT
218
26
0
24 Jul 2023
Abide by the Law and Follow the Flow: Conservation Laws for Gradient Flows
Neural Information Processing Systems (NeurIPS), 2023
Sibylle Marcotte
Rémi Gribonval
Gabriel Peyré
289
27
0
30 Jun 2023
Unraveling Projection Heads in Contrastive Learning: Insights from Expansion and Shrinkage
Yu Gui
Cong Ma
Yiqiao Zhong
165
8
0
06 Jun 2023
The Law of Parsimony in Gradient Descent for Learning Deep Linear Networks
Can Yaras
Peng Wang
Wei Hu
Zhihui Zhu
Laura Balzano
Qing Qu
253
19
0
01 Jun 2023
Pruning at Initialization -- A Sketching Perspective
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Noga Bar
Raja Giryes
226
1
0
27 May 2023
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
Neural Information Processing Systems (NeurIPS), 2023
Yuandong Tian
Yiping Wang
Beidi Chen
S. Du
MLT
368
96
0
25 May 2023
Implicit bias of SGD in $L_{2}$-regularized linear DNNs: One-way jumps from high to low rank
International Conference on Learning Representations (ICLR), 2023
Zihan Wang
Arthur Jacot
216
23
0
25 May 2023
Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond
International Conference on Machine Learning (ICML), 2023
Itai Kreisler
Mor Shpigel Nacson
Daniel Soudry
Y. Carmon
183
16
0
22 May 2023
Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss
International Conference on Machine Learning (ICML), 2023
Pierre Bréchet
Katerina Papagiannouli
Jing An
Guido Montúfar
338
7
0
06 Mar 2023
Bayesian Interpolation with Deep Linear Networks
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2022
Boris Hanin
Alexander Zlokapa
333
28
0
29 Dec 2022
Effects of Data Geometry in Early Deep Learning
Neural Information Processing Systems (NeurIPS), 2022
Saket Tiwari
George Konidaris
297
7
0
29 Dec 2022
A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix Factorization
Jian-Peng Cao
Chao Qian
Yihui Huang
Dicheng Chen
Yuncheng Gao
Jiyang Dong
D. Guo
X. Qu
301
1
0
29 Dec 2022
Asymptotic Analysis of Deep Residual Networks
R. Cont
Alain Rossier
Renyuan Xu
137
4
0
15 Dec 2022
Infinite-width limit of deep linear neural networks
Communications on Pure and Applied Mathematics (CPAM), 2022
Lénaïc Chizat
Maria Colombo
Xavier Fernández-Real
Alessio Figalli
162
21
0
29 Nov 2022
Finite Sample Identification of Wide Shallow Neural Networks with Biases
M. Fornasier
T. Klock
Marco Mondelli
Michael Rauchensteiner
157
7
0
08 Nov 2022
Symmetries, flat minima, and the conserved quantities of gradient flow
International Conference on Learning Representations (ICLR), 2022
Bo Zhao
I. Ganev
Robin Walters
Rose Yu
Nima Dehmamy
308
26
0
31 Oct 2022
Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets
International Conference on Learning Representations (ICLR), 2022
Edo Cohen-Karlik
Itamar Menuhin-Gruman
Raja Giryes
Nadav Cohen
Amir Globerson
299
7
0
25 Oct 2022
Deep Linear Networks for Matrix Completion -- An Infinite Depth Limit
SIAM Journal on Applied Dynamical Systems (SIADS), 2022
Nadav Cohen
Govind Menon
Zsolt Veraszto
ODL
155
11
0
22 Oct 2022
TiDAL: Learning Training Dynamics for Active Learning
IEEE International Conference on Computer Vision (ICCV), 2022
Seong Min Kye
Kwanghee Choi
Hyeongmin Byun
Buru Chang
335
20
0
13 Oct 2022
On skip connections and normalisation layers in deep optimisation
Neural Information Processing Systems (NeurIPS), 2022
L. MacDonald
Jack Valmadre
Hemanth Saratchandran
Simon Lucey
ODL
337
4
0
10 Oct 2022
Behind the Scenes of Gradient Descent: A Trajectory Analysis via Basis Function Decomposition
International Conference on Learning Representations (ICLR), 2022
Jianhao Ma
Li-Zhen Guo
Salar Fattahi
308
4
0
01 Oct 2022
Implicit Full Waveform Inversion with Deep Neural Representation
Jian Sun
K. Innanen
AI4CE
155
55
0
08 Sep 2022
Intersection of Parallels as an Early Stopping Criterion
International Conference on Information and Knowledge Management (CIKM), 2022
Ali Vardasbi
Maarten de Rijke
Mostafa Dehghani
MoMe
123
7
0
19 Aug 2022
The Neural Race Reduction: Dynamics of Abstraction in Gated Networks
International Conference on Machine Learning (ICML), 2022
Andrew M. Saxe
Shagun Sodhani
Sam Lewallen
AI4CE
166
43
0
21 Jul 2022
A note on Linear Bottleneck networks and their Transition to Multilinearity
Libin Zhu
Parthe Pandit
M. Belkin
MLT
151
0
0
30 Jun 2022
Analysis of Branch Specialization and its Application in Image Decomposition
Jonathan Brokman
Guy Gilboa
98
2
0
12 Jun 2022
Explicit Regularization in Overparametrized Models via Noise Injection
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Antonio Orvieto
Anant Raj
Hans Kersting
Francis R. Bach
124
33
0
09 Jun 2022
Neural Collapse: A Review on Modelling Principles and Generalization
Vignesh Kothapalli
331
101
0
08 Jun 2022
Understanding the Role of Nonlinearity in Training Dynamics of Contrastive Learning
International Conference on Learning Representations (ICLR), 2022
Yuandong Tian
MLT
256
17
0
02 Jun 2022
Blind Estimation of a Doubly Selective OFDM Channel: A Deep Learning Algorithm and Theory
T. Getu
N. Golmie
D. Griffith
156
2
0
30 May 2022
Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks
Neural Information Processing Systems (NeurIPS), 2022
Blake Bordelon
Cengiz Pehlevan
MLT
263
108
0
19 May 2022
A Convergence Analysis of Nesterov's Accelerated Gradient Method in Training Deep Linear Neural Networks
Information Sciences (Inf. Sci.), 2022
Xin Liu
Wei Tao
Zhisong Pan
80
11
0
18 Apr 2022
Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks
Social Science Research Network (SSRN), 2022
R. Cont
Alain Rossier
Renyuan Xu
MLT
285
6
0
14 Apr 2022
Concept Evolution in Deep Learning Training: A Unified Interpretation Framework and Discoveries
International Conference on Information and Knowledge Management (CIKM), 2022
Haekyu Park
Seongmin Lee
Benjamin Hoover
Austin P. Wright
Omar Shaikh
Rahul Duggal
Nilaksh Das
Kevin Wenliang Li
Judy Hoffman
Duen Horng Chau
240
3
0
30 Mar 2022
Convergence of gradient descent for deep neural networks
S. Chatterjee
ODL
228
27
0
30 Mar 2022
On the Implicit Bias of Gradient Descent for Temporal Extrapolation
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Edo Cohen-Karlik
Avichai Ben David
Nadav Cohen
Amir Globerson
140
4
0
09 Feb 2022