
A Study of Neural Training with Non-Gradient and Noise Assisted Gradient Methods

Abstract

In this work we demonstrate provable guarantees on the training of depth-2 neural networks in regimes beyond those previously explored. (1) We begin by exhibiting a non-gradient iterative algorithm, "Neuro-Tron", which gives a first-of-its-kind polynomial-time approximate solution to a neural regression problem (here in the \ell_\infty-norm) at finite net widths and with non-realizable data. (2) Next, we give a simple stochastic algorithm that can train a ReLU gate in the realizable setting under significantly milder conditions on the data distribution than previous results required. Leveraging some additional distributional assumptions, we also show near-optimal guarantees for training a ReLU gate when an adversary is allowed to corrupt the true labels. (3) Lastly, we analyze the behaviour of noise-assisted gradient descent on a ReLU gate in the realizable setting. While making no further distributional assumptions, we locate a ball centered at the origin such that all the iterates remain inside it with high probability.
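To make the flavour of result (1) concrete, below is a minimal sketch of a Tron-style (non-gradient) update: instead of following the loss gradient, the parameter moves along a residual-weighted average of the data, mapped through a fixed matrix. The width-k architecture with sensing matrices A_i, the placeholder choices A_i = M = I, and all hyperparameters are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def net(w, X, A):
    """Depth-2 net f_w(x) = (1/k) * sum_i relu(<A_i w, x>) with fixed A_i."""
    pre = np.einsum("kde,e->kd", A, w)    # row i is A_i w, shape (k, d)
    return relu(X @ pre.T).mean(axis=1)   # shape (batch,)

def tron_step(w, X, y, A, M, eta):
    """Non-gradient update: residual-weighted data mean, mapped through M."""
    residual = y - net(w, X, A)           # shape (batch,)
    return w + eta * M @ (X.T @ residual) / len(y)

d, k, b = 10, 5, 64
A = np.stack([np.eye(d)] * k)             # placeholder sensing matrices (assumption)
M = np.eye(d)                             # placeholder "Tron" matrix (assumption)
w_star = rng.normal(size=d)               # ground-truth filter
w = np.zeros(d)

for _ in range(2000):
    X = rng.normal(size=(b, d))
    y = net(w_star, X, A)                 # label noise here would model non-realizability
    w = tron_step(w, X, y, A, M, eta=0.5)

print("parameter error:", np.linalg.norm(w - w_star))
```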
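For result (3), the sketch below runs stochastic gradient descent on a single ReLU gate with realizable labels while injecting noise into each gradient, and tracks the largest iterate norm, echoing the claim that all iterates remain inside a ball around the origin with high probability. The isotropic Gaussian noise model, the data distribution, and the step size are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10
w_star = rng.normal(size=d)               # ground-truth filter (realizable labels)
w = rng.normal(size=d)                    # arbitrary initialisation

eta, noise_scale = 0.05, 0.01             # assumed step size and noise level
max_norm = 0.0
for _ in range(5000):
    x = rng.normal(size=d)
    y = max(x @ w_star, 0.0)              # label from the true ReLU gate
    pred = max(x @ w, 0.0)
    # Gradient of 0.5 * (pred - y)^2 w.r.t. w (zero when the gate is inactive).
    grad = (pred - y) * x if x @ w > 0 else np.zeros(d)
    xi = noise_scale * rng.normal(size=d) # injected gradient noise
    w -= eta * (grad + xi)
    max_norm = max(max_norm, np.linalg.norm(w))

print("largest iterate norm:", max_norm, "| parameter error:", np.linalg.norm(w - w_star))
```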
