Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks

23 September 2018

Abstract

In this note, we study the dynamics of gradient descent on objective functions of the form $f(\prod_{i=1}^{k} w_i)$ (with respect to scalar parameters $w_1,\ldots,w_k$), which arise in the context of training depth-$k$ linear neural networks. We prove that for standard random initializations, and under mild assumptions on $f$, the number of iterations required for convergence scales exponentially with the depth $k$. This highlights a potential obstacle in understanding the convergence of gradient-based methods for deep linear neural networks, where $k$ is large.
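
To make the setting concrete, the following is a minimal Python sketch of gradient descent on an objective of the form $f(\prod_{i=1}^{k} w_i)$. It is an illustration only, not the construction used in the note: the loss $f(x) = \frac{1}{2}(x-1)^2$, the step size, the tolerance, the iteration cap, and all function names are assumptions chosen for the example. The only fact taken from the setup is the chain rule, $\partial f / \partial w_j = f'(\prod_i w_i) \cdot \prod_{i \neq j} w_i$.

```python
import math


def loss_grad(x, target=1.0):
    """Derivative of the assumed loss f(x) = 0.5 * (x - target)**2."""
    return x - target


def gradient(w, target=1.0):
    """Gradient of f(prod(w)) with respect to each scalar weight w[j].

    By the chain rule, the j-th entry is f'(prod(w)) times the product
    of all weights except w[j]. The leave-one-out product is computed
    directly (not as prod / w[j]) so that zero weights are handled.
    """
    fp = loss_grad(math.prod(w), target)
    return [fp * math.prod(w[:j] + w[j + 1:]) for j in range(len(w))]


def iterations_to_converge(w, step=0.01, tol=0.1, max_iters=100_000, target=1.0):
    """Run plain gradient descent and count the steps until
    |prod(w) - target| <= tol. Returns None if the (hypothetical)
    iteration cap max_iters is reached first.
    """
    w = list(w)
    for t in range(max_iters):
        if abs(math.prod(w) - target) <= tol:
            return t
        g = gradient(w, target)
        w = [wi - step * gi for wi, gi in zip(w, g)]
    return None
```

To explore the dependence on depth empirically, one could draw each $w_i$ independently with `random.gauss(0, 1)` for several values of $k$ and compare the counts returned by `iterations_to_converge`. Such an experiment only illustrates the statement above under the assumed loss and hyperparameters; it does not replace the proof.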
