This paper rigorously shows how over-parameterization changes the convergence behaviors of gradient descent (GD) for the matrix sensing problem, where the goal is to recover an unknown low-rank ground-truth matrix from near-isotropic linear measurements. First, we consider the symmetric setting with the symmetric parameterization, where $M^* \in \mathbb{R}^{n \times n}$ is a positive semi-definite unknown matrix of rank $r$, and one uses a symmetric parameterization $XX^\top$ to learn $M^*$. Here $X \in \mathbb{R}^{n \times k}$ with $k \ge r$ is the factor matrix. We give a novel $\Omega(1/T^2)$ lower bound of randomly initialized GD for the over-parameterized case ($k > r$), where $T$ is the number of iterations. This is in stark contrast to the exact-parameterization scenario ($k = r$), where the convergence rate is $\exp(-\Omega(T))$. Next, we study the asymmetric setting, where $M^* \in \mathbb{R}^{n_1 \times n_2}$ is the unknown matrix of rank $r$, and one uses an asymmetric parameterization $FG^\top$ to learn $M^*$, where $F \in \mathbb{R}^{n_1 \times k}$ and $G \in \mathbb{R}^{n_2 \times k}$. Building on prior work, we give a global exact convergence result of randomly initialized GD for the exact-parameterization case ($k = r$) with an $\exp(-\Omega(T))$ rate. Furthermore, we give the first global exact convergence result for the over-parameterization case ($k > r$) with an $\exp(-\Omega(\alpha^2 T))$ rate, where $\alpha$ is the initialization scale. This linear convergence result in the over-parameterization case is especially significant, because one can apply the asymmetric parameterization to the symmetric setting to speed up from $\Omega(1/T^2)$ to linear convergence. On the other hand, we propose a novel method that only modifies one step of GD and obtains a convergence rate independent of $\alpha$, recovering the rate of the exact-parameterization case.
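To make the setup concrete, the following is a minimal NumPy sketch (not from the paper) of the two GD dynamics compared above, both applied to a symmetric low-rank ground truth: the symmetric parameterization $XX^\top$ and the asymmetric parameterization $FG^\top$ with a small initialization scale $\alpha$. The dimensions, number of measurements, step size, iteration budget, and initialization scales are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the paper):
# ambient dimension n, true rank r, over-parameterized rank k > r, number of measurements m.
n, r, k, m = 10, 2, 4, 800

# Rank-r positive semi-definite ground truth M* = U U^T, scaled to O(1) operator norm.
U = rng.normal(size=(n, r)) / np.sqrt(n)
M_star = U @ U.T

# Near-isotropic linear measurements y_i = <A_i, M*> with i.i.d. Gaussian A_i.
A = rng.normal(size=(m, n, n))
y = np.einsum('qij,ij->q', A, M_star)

def grad_operator(M):
    """(1/m) * sum_i (<A_i, M> - y_i) A_i, the back-projected measurement residual."""
    res = np.einsum('qij,ij->q', A, M) - y
    return np.einsum('q,qij->ij', res, A) / m

eta, T = 0.1, 3000  # step size and iteration count (assumed)

# Symmetric parameterization: GD on (1/2m) sum_i (<A_i, X X^T> - y_i)^2 with X in R^{n x k}.
X = 0.1 * rng.normal(size=(n, k))
for _ in range(T):
    R = grad_operator(X @ X.T)
    X = X - eta * (R + R.T) @ X  # exact gradient uses the symmetrized residual
err_sym = np.linalg.norm(X @ X.T - M_star, 'fro')

# Asymmetric parameterization F G^T applied to the same symmetric ground truth,
# with small initialization scale alpha (the paper's alpha).
alpha = 0.1
F = alpha * rng.normal(size=(n, k))
G = alpha * rng.normal(size=(n, k))
for _ in range(T):
    R = grad_operator(F @ G.T)
    F, G = F - eta * R @ G, G - eta * R.T @ F
err_asym = np.linalg.norm(F @ G.T - M_star, 'fro')

print(f"symmetric  ||XX^T - M*||_F = {err_sym:.3e}")
print(f"asymmetric ||FG^T - M*||_F = {err_asym:.3e}")
```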