
Optimal Spectral Recovery of a Planted Vector in a Subspace

Abstract

Recovering a planted vector $v$ in an $n$-dimensional random subspace of $\mathbb{R}^N$ is a generic task related to many problems in machine learning and statistics, such as dictionary learning, subspace recovery, and principal component analysis. In this work, we study computationally efficient estimation and detection of a planted vector $v$ whose $\ell_4$ norm differs from that of a Gaussian vector with the same $\ell_2$ norm. For instance, in the special case of an $N\rho$-sparse vector $v$ with Rademacher nonzero entries, our results include the following: (1) We give an improved analysis of (a slight variant of) the spectral method proposed by Hopkins, Schramm, Shi, and Steurer, showing that it approximately recovers $v$ with high probability in the regime $n\rho \ll \sqrt{N}$. In contrast, previous work required either $\rho \ll 1/\sqrt{n}$ or $n\sqrt{\rho} \lesssim \sqrt{N}$ for polynomial-time recovery. Our result subsumes both of these conditions (up to logarithmic factors) and also treats the dense case $\rho = 1$, which was not previously considered. (2) Akin to $\ell_\infty$ bounds for eigenvector perturbation, we establish an entrywise error bound for the spectral estimator via a leave-one-out analysis, from which it follows that thresholding recovers $v$ exactly. (3) We study the associated detection problem and show that in the regime $n\rho \gg \sqrt{N}$, any spectral method from a large class (and more generally, any low-degree polynomial of the input) fails to detect the planted vector. This establishes optimality of our upper bounds and offers evidence that no polynomial-time algorithm can succeed when $n\rho \gg \sqrt{N}$.
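To make the setup concrete, here is a minimal sketch of an HSSS-style spectral estimator in the sparse-Rademacher special case, followed by the thresholding step. This is an illustration, not the authors' exact variant: the centered matrix $M = \sum_i (\|a_i\|^2 - n/N)\, a_i a_i^\top$ is one common form of this construction, and the parameters ($N$, $n$, $\rho$) and the threshold $0.5/\sqrt{k}$ are heuristic choices, not taken from the paper.

```python
# Sketch of an HSSS-style spectral method for a planted sparse vector.
# Not the paper's exact variant; parameters and threshold are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, n, rho = 4000, 40, 0.05  # ambient dim, subspace dim, sparsity level
# Regime check: n*rho = 2 << sqrt(N) ~ 63, where recovery should succeed.

# Plant a (N*rho)-sparse unit vector v with Rademacher nonzero entries
# inside a random n-dimensional subspace of R^N.
k = int(rho * N)
v = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
v[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
B = np.column_stack([v] + [rng.standard_normal(N) for _ in range(n - 1)])

# The algorithm only observes an arbitrary orthonormal basis A of the subspace:
# scramble the basis with a random rotation, then orthonormalize.
A, _ = np.linalg.qr(B @ rng.standard_normal((n, n)))

# Spectral step: with a_i the rows of A, form
#   M = sum_i (||a_i||^2 - n/N) * a_i a_i^T
# (rows of an orthonormal basis have mean squared norm n/N),
# take its leading eigenvector u, and estimate v by A @ u.
row_norms = np.sum(A**2, axis=1)
M = (A * (row_norms - n / N)[:, None]).T @ A
eigvals, eigvecs = np.linalg.eigh(M)
u = eigvecs[:, np.argmax(np.abs(eigvals))]
v_hat = A @ u
v_hat *= np.sign(v_hat @ v)  # resolve the global sign ambiguity for comparison

print("correlation with planted v:", abs(v_hat @ v))

# The entrywise error bound is what licenses exact recovery by thresholding;
# nonzero entries of v have magnitude 1/sqrt(k), so threshold halfway down.
support_hat = np.flatnonzero(np.abs(v_hat) > 0.5 / np.sqrt(k))
print("support recovered exactly:", set(support_hat) == set(support))
```

In this regime the top eigenvector of $M$ correlates strongly with the preimage of $v$ in the basis coordinates, so the correlation printed above should be close to 1; conversely, pushing $n\rho$ above $\sqrt{N}$ should make the spectral signal disappear, matching the detection lower bound in result (3).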
