SGD Learns the Conjugate Kernel Class of the Network
Neural Information Processing Systems (NeurIPS), 2017
Abstract
We show that the standard stochastic gradient descent (SGD) algorithm is guaranteed to learn, in polynomial time, a function that is competitive with the best function in the conjugate kernel space of the network, as defined in Daniely, Frostig and Singer (2016). The result holds for log-depth networks from a rich family of architectures. To the best of our knowledge, this is the first polynomial-time guarantee for the standard neural network learning algorithm for networks of depth .
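The "standard SGD algorithm" referred to above is plain per-example stochastic gradient descent on all layers of the network. As a minimal illustration (not the paper's construction), the following sketch runs SGD with squared loss on a two-layer ReLU network; the data, width, learning rate, and label function are all hypothetical choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: n samples in d dimensions with +/-1 labels.
n, d, width = 200, 5, 64
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0])  # labels determined by the first coordinate

# Two-layer ReLU network; first layer randomly initialized,
# second layer initialized to zero.
W = rng.standard_normal((width, d)) / np.sqrt(d)
v = np.zeros(width)

lr = 0.01  # step size (hypothetical choice)
for t in range(2000):
    i = rng.integers(n)          # sample one example
    h = np.maximum(W @ X[i], 0.0)  # hidden-layer activations
    g = h @ v - y[i]               # squared-loss residual

    # Gradients of 0.5 * (pred - y)^2 w.r.t. both layers,
    # computed before either parameter is updated.
    grad_v = g * h
    grad_W = g * np.outer(v * (h > 0), X[i])

    v -= lr * grad_v
    W -= lr * grad_W

# Empirical squared error after training.
preds = np.maximum(X @ W.T, 0.0) @ v
mse = float(np.mean((preds - y) ** 2))
```

Since `v` starts at zero, the initial predictions are all zero and the initial mean squared error equals 1; after the SGD passes above, the error drops well below that.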
