Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks

23 September 2018

Papers citing "Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks"


Title
Magnitude and Angle Dynamics in Training Single ReLU Neurons Sangmin Lee Byeongsu Sim Jong Chul Ye MLT 94 6 0 27 Sep 2022
Explicitising The Implicit Intrepretability of Deep Neural Networks Via Duality Chandrashekar Lakshminarayanan Ashutosh Kumar Singh A. Rajkumar AI4CE 13 1 0 01 Mar 2022
Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions Martin Hutzenthaler Arnulf Jentzen Katharina Pohl Adrian Riekert Luca Scarpa MLT 34 6 0 13 Dec 2021
The loss landscape of deep linear neural networks: a second-order analysis E. M. Achour François Malgouyres Sébastien Gerchinovitz ODL 22 9 0 28 Jul 2021
Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent Spencer Frei Quanquan Gu 23 25 0 25 Jun 2021
A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions Arnulf Jentzen Adrian Riekert MLT 32 13 0 01 Apr 2021
Deep matrix factorizations Pierre De Handschutter Nicolas Gillis Xavier Siebert BDL 28 40 0 01 Oct 2020
Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy E. Moroshko Suriya Gunasekar Blake E. Woodworth J. Lee Nathan Srebro Daniel Soudry 27 85 0 13 Jul 2020
Non-convergence of stochastic gradient descent in the training of deep neural networks Patrick Cheridito Arnulf Jentzen Florian Rossmannek 14 37 0 12 Jun 2020
Global Convergence of Gradient Descent for Deep Linear Residual Networks Lei Wu Qingcan Wang Chao Ma ODL AI4CE 20 22 0 02 Nov 2019
Width Provably Matters in Optimization for Deep Linear Neural Networks S. Du Wei Hu 16 93 0 24 Jan 2019
A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks Sanjeev Arora Nadav Cohen Noah Golowich Wei Hu 13 280 0 04 Oct 2018
Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks Peter L. Bartlett D. Helmbold Philip M. Long 23 116 0 16 Feb 2018
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition Hamed Karimi J. Nutini Mark W. Schmidt 136 1,198 0 16 Aug 2016