Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations

Abstract

We show that the gradient descent algorithm provides an implicit regularization effect when learning over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations. Concretely, we show that given $\tilde{O}(dr^{2})$ random linear measurements of a rank-$r$ positive semidefinite matrix $X^{\star}$, we can recover $X^{\star}$ by parameterizing it as $UU^\top$ with $U \in \mathbb{R}^{d\times d}$ and minimizing the squared loss, even if $r \ll d$. We prove that starting from a small initialization, gradient descent approximately recovers $X^{\star}$ in $\tilde{O}(\sqrt{r})$ iterations. These results resolve the conjecture of Gunasekar et al. '17 under the restricted isometry property. The technique can also be applied, with some technical modifications, to analyzing one-hidden-layer neural networks with quadratic activations.
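
As a rough illustration of the setup described above, the sketch below runs gradient descent on the over-parameterized factorization $UU^\top$ from a small random initialization, using random Gaussian measurements of a rank-$r$ PSD matrix. This is not the paper's code; the dimensions, measurement count, step size, and iteration budget are illustrative assumptions chosen for a quick demo.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's experiments): recover a rank-r PSD
# matrix X* from random linear measurements by gradient descent on U |-> UU^T
# with U in R^{d x d}, starting from a small initialization.

rng = np.random.default_rng(0)

d, r = 50, 3                      # ambient dimension and true rank (illustrative)
m = 5 * d * r ** 2                # number of measurements, on the order of d r^2

# Ground-truth rank-r PSD matrix X* (normalized for convenience).
B = rng.standard_normal((d, r))
X_star = B @ B.T
X_star /= np.linalg.norm(X_star)

# Random Gaussian sensing matrices (symmetrized) and measurements y_i = <A_i, X*>.
A = rng.standard_normal((m, d, d))
A = (A + A.transpose(0, 2, 1)) / 2
y = np.einsum('kij,ij->k', A, X_star)

# Over-parameterized factorization X = UU^T with small random initialization.
alpha, eta = 1e-3, 0.2            # initialization scale and step size (hand-picked)
U = alpha * rng.standard_normal((d, d))

# Gradient descent on f(U) = (1/2m) * sum_i (<A_i, UU^T> - y_i)^2.
for t in range(300):
    residual = np.einsum('kij,ij->k', A, U @ U.T) - y       # <A_i, UU^T> - y_i
    grad = (2.0 / m) * np.einsum('k,kij->ij', residual, A) @ U
    U -= eta * grad

print('relative error:', np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star))
```

Consistent with the approximate-recovery statement, one would stop after a moderate number of iterations rather than driving the empirical loss all the way to zero; the reported error is therefore approximate rather than exact.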
