
Learning Halfspaces and Neural Networks with Random Initialization

Abstract

We study non-convex empirical risk minimization for learning halfspaces and neural networks. For loss functions that are $L$-Lipschitz continuous, we present algorithms to learn halfspaces and multi-layer neural networks that achieve arbitrarily small excess risk $\epsilon > 0$. The time complexity is polynomial in the input dimension $d$ and the sample size $n$, but exponential in the quantity $(L/\epsilon^2)\log(L/\epsilon)$. These algorithms run multiple rounds of random initialization followed by arbitrary optimization steps. We further show that if the data is separable by some neural network with constant margin $\gamma > 0$, then there is a polynomial-time algorithm for learning a neural network that separates the training data with margin $\Omega(\gamma)$. As a consequence, the algorithm achieves arbitrary generalization error $\epsilon > 0$ with ${\rm poly}(d, 1/\epsilon)$ sample and time complexity. We establish the same learnability result when the labels are randomly flipped with probability $\eta < 1/2$.
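To make the "multiple rounds of random initialization followed by arbitrary optimization steps" scheme concrete, here is a minimal sketch for the halfspace case. It assumes a sigmoid surrogate as the Lipschitz loss and plain gradient descent as the inner optimizer; the function names and hyperparameters are illustrative, not the paper's actual procedure.

```python
import numpy as np

def sigmoid_loss(margins):
    # A 1-Lipschitz surrogate loss on the margin y * <w, x> (assumed choice).
    return 1.0 / (1.0 + np.exp(margins))

def learn_halfspace(X, y, rounds=20, steps=200, lr=0.1, seed=None):
    """Repeat: draw a random unit-norm initializer, run a few gradient
    steps on the empirical risk, and keep the best iterate found."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    best_w, best_risk = None, np.inf
    for _ in range(rounds):
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)          # random initialization on the unit sphere
        for _ in range(steps):
            s = sigmoid_loss(y * (X @ w))
            # Gradient of the mean sigmoid loss with respect to w.
            grad = -(X.T @ (y * s * (1.0 - s))) / n
            w -= lr * grad
        risk = sigmoid_loss(y * (X @ w)).mean()
        if risk < best_risk:
            best_w, best_risk = w, risk
    return best_w, best_risk

# Toy usage: linearly separable data in d = 5 dimensions.
rng = np.random.default_rng(0)
w_star = rng.standard_normal(5)
X = rng.standard_normal((200, 5))
y = np.sign(X @ w_star)
w_hat, risk = learn_halfspace(X, y, seed=1)
print(f"empirical risk of best initialization: {risk:.3f}")
```

The outer loop over random restarts is what the abstract's complexity bound refers to: the number of rounds needed grows exponentially in $(L/\epsilon^2)\log(L/\epsilon)$, while each round costs only ${\rm poly}(d, n)$ time.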
