SGD Learns the Conjugate Kernel Class of the Network
Neural Information Processing Systems (NeurIPS), 2017
Abstract
We show that the standard stochastic gradient descent (SGD) algorithm is guaranteed to learn, in polynomial time, a function that is competitive with the best function in the conjugate kernel space of the network, as defined in Daniely, Frostig and Singer (2016). The result holds for log-depth networks from a rich family of architectures. To the best of our knowledge, this is the first polynomial-time guarantee for the standard neural network learning algorithm for networks of depth .
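The "standard SGD algorithm" referred to above is plain per-example stochastic gradient descent on all layers of the network. As a minimal illustration (not the paper's construction), the following sketch runs SGD with squared loss on a two-layer ReLU network; the data, width, learning rate, and label function are all hypothetical choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: n samples in d dimensions with +/-1 labels.
n, d, width = 200, 5, 64
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0])  # labels determined by the first coordinate

# Two-layer ReLU network; first layer randomly initialized,
# second layer initialized to zero.
W = rng.standard_normal((width, d)) / np.sqrt(d)
v = np.zeros(width)

lr = 0.01  # step size (hypothetical choice)
for t in range(2000):
    i = rng.integers(n)          # sample one example
    h = np.maximum(W @ X[i], 0.0)  # hidden-layer activations
    g = h @ v - y[i]               # squared-loss residual

    # Gradients of 0.5 * (pred - y)^2 w.r.t. both layers,
    # computed before either parameter is updated.
    grad_v = g * h
    grad_W = g * np.outer(v * (h > 0), X[i])

    v -= lr * grad_v
    W -= lr * grad_W

# Empirical squared error after training.
preds = np.maximum(X @ W.T, 0.0) @ v
mse = float(np.mean((preds - y) ** 2))
```

Since `v` starts at zero, the initial predictions are all zero and the initial mean squared error equals 1; after the SGD passes above, the error drops well below that.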
