
Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations

Abstract

We show that the (stochastic) gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations. Concretely, we show that given $\tilde{O}(dr^2)$ random linear measurements of a rank-$r$ positive semidefinite matrix $X^\star$, we can recover $X^\star$ by parameterizing it as $UU^\top$ with $U \in \mathbb{R}^{d\times d}$ and minimizing the squared loss, even if $r \ll d$. We prove that starting from a small initialization, gradient descent approximately recovers $X^\star$ in $\tilde{O}(\sqrt{r})$ iterations. These results resolve the conjecture of Gunasekar et al. '17 under the restricted isometry property. The technique can be applied to analyzing neural networks with quadratic activations, with some technical modifications.
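To make the setup concrete, here is a minimal numerical sketch of the over-parameterized matrix sensing procedure the abstract describes: Gaussian measurement matrices (which satisfy the restricted isometry property with high probability), the squared loss over the factorization $UU^\top$ with a full $d \times d$ factor $U$, and plain gradient descent from a small random initialization. All constants below (dimensions, measurement count, step size, iteration budget) are illustrative assumptions, not the paper's choices.

```python
import numpy as np

# Minimal sketch of the over-parameterized matrix sensing experiment.
# All constants here are illustrative assumptions, not the paper's.
d, r, m = 30, 2, 2000            # ambient dimension, true rank, # measurements
rng = np.random.default_rng(0)

# Ground-truth rank-r PSD matrix X* = B B^T.
B = rng.standard_normal((d, r))
X_star = B @ B.T

# Random Gaussian linear measurements b_i = <A_i, X*>; i.i.d. Gaussian
# matrices satisfy the restricted isometry property with high probability.
A = rng.standard_normal((m, d, d))
b = np.einsum('kij,ij->k', A, X_star)

def grad(U):
    """Gradient of f(U) = (1/2m) * sum_i (<A_i, U U^T> - b_i)^2 w.r.t. U."""
    resid = np.einsum('kij,ij->k', A, U @ U.T) - b   # residuals <A_i, UU^T> - b_i
    M = np.einsum('k,kij->ij', resid, A) / m         # (1/m) sum_i resid_i * A_i
    return (M + M.T) @ U                             # chain rule through U U^T

# Over-parameterized factor: U is d x d rather than d x r, but the
# initialization is *small* -- the implicit regularization effect analyzed
# in the paper hinges on starting near zero.
U = 1e-3 * rng.standard_normal((d, d))
eta = 5e-3
for _ in range(1000):
    U -= eta * grad(U)

err = np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star)
print(f"relative recovery error: {err:.3e}")
```

Note the design point the abstract emphasizes: replacing the $d \times d$ factor with a $d \times r$ one would remove the over-parameterization, but the algorithm above never uses knowledge of $r$; the small initialization is what biases gradient descent toward the low-rank solution.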
