Convergence of gradient descent for learning linear neural networks

4 August 2021

Gabin Maxime Nguegnang

Ulrich Terstiege

Papers citing "Convergence of gradient descent for learning linear neural networks"

14 / 14 papers shown

Title
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks | Pierfrancesco Beneventano, Blake Woodworth | MLT 34 1 0 | 15 Jan 2025
Convergence of continuous-time stochastic gradient descent with applications to linear deep neural networks | Gabor Lugosi, Eulalia Nualart | 13 0 0 | 11 Sep 2024
Lecture Notes on Linear Neural Networks: A Tale of Optimization and Generalization in Deep Learning | Nadav Cohen, Noam Razin | 31 0 0 | 25 Aug 2024
How do Transformers perform In-Context Autoregressive Learning? | Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré | 32 7 0 | 08 Feb 2024
Geometry of Linear Neural Networks: Equivariance and Invariance under Permutation Groups | Kathlén Kohn, Anna-Laura Sattelberger, V. Shahverdi | 25 3 0 | 24 Sep 2023
Asymmetric matrix sensing by gradient descent with small random initialization | J. S. Wind | 38 1 0 | 04 Sep 2023
Robust Implicit Regularization via Weight Normalization | H. Chou, Holger Rauhut, Rachel A. Ward | 28 7 0 | 09 May 2023
Function Space and Critical Points of Linear Convolutional Networks | Kathlén Kohn, Guido Montúfar, V. Shahverdi, Matthew Trager | 16 11 0 | 12 Apr 2023
Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss | Pierre Bréchet, Katerina Papagiannouli, Jing An, Guido Montúfar | 23 3 0 | 06 Mar 2023
Side Effects of Learning from Low-dimensional Data Embedded in a Euclidean Space | Juncai He, R. Tsai, Rachel A. Ward | 36 8 0 | 01 Mar 2022
Continuous vs. Discrete Optimization of Deep Neural Networks | Omer Elkabetz, Nadav Cohen | 62 44 0 | 14 Jul 2021
Global optimality conditions for deep neural networks | Chulhee Yun, S. Sra, Ali Jadbabaie | 121 117 0 | 08 Jul 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima | N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang | ODL 281 2,888 0 | 15 Sep 2016
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition | Hamed Karimi, J. Nutini, Mark W. Schmidt | 133 1,198 0 | 16 Aug 2016