Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks

Abstract

In this note, we study the dynamics of gradient descent on objective functions of the form $f(\prod_{i=1}^{k} w_i)$ (with respect to scalar parameters $w_1,\ldots,w_k$), which arise in the context of training depth-$k$ linear neural networks. We prove that for standard random initializations, and under mild assumptions on $f$, the number of iterations required for convergence scales exponentially with the depth $k$. This highlights a potential obstacle in understanding the convergence of gradient-based methods for deep linear neural networks, where $k$ is large.
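
The setting above can be illustrated with a minimal numerical sketch. It is not the paper's analysis or experimental setup: the choice $f(x) = \frac{1}{2}(x-1)^2$, the deterministic initialization of every weight to 0.5 (a stand-in for a random initialization), the step size, the tolerance, and the helper name `gd_iterations` are all assumptions made for illustration. The sketch runs plain gradient descent on $f(\prod_{i=1}^{k} w_i)$ and counts the steps until the product is close to the minimizer.

```python
import math


def gd_iterations(k, w0=0.5, lr=0.05, tol=1e-2, max_iters=1_000_000):
    """Count gradient-descent steps until |prod(w) - 1| < tol.

    Objective: f(prod_i w_i) with f(x) = 0.5 * (x - 1)**2 (illustrative choice).
    All k weights start at w0 (assumed deterministic stand-in for a random init).
    Returns None if max_iters is reached first.
    """
    w = [w0] * k
    for t in range(max_iters):
        p = math.prod(w)
        if abs(p - 1.0) < tol:
            return t
        f_prime = p - 1.0  # f'(x) for f(x) = 0.5 * (x - 1)^2
        # Chain rule: d/dw_j f(prod_i w_i) = f'(p) * prod_{i != j} w_i
        grads = [f_prime * math.prod(w[:j] + w[j + 1:]) for j in range(k)]
        w = [wj - lr * gj for wj, gj in zip(w, grads)]
    return None


if __name__ == "__main__":
    # Compare step counts across a few depths.
    for k in (4, 8, 16):
        print(k, gd_iterations(k))
```

Each partial derivative contains a product of $k-1$ weights, so when the weights start below 1 in magnitude the initial gradient shrinks geometrically with $k$. Under the assumptions above, this is the mechanism that makes the step count grow rapidly with depth.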
