
Optimal Spectral Recovery of a Planted Vector in a Subspace

Abstract

Recovering a planted vector $v$ in an $n$-dimensional random subspace of $\mathbb{R}^N$ is a generic task related to many problems in machine learning and statistics, such as dictionary learning, subspace recovery, and principal component analysis. In this work, we study computationally efficient estimation and detection of a planted vector $v$ whose $\ell_4$ norm differs from that of a Gaussian vector with the same $\ell_2$ norm. For instance, in the special case of an $N\rho$-sparse vector $v$ with Rademacher nonzero entries, our results include the following: (1) We give an improved analysis of (a slight variant of) the spectral method proposed by Hopkins, Schramm, Shi, and Steurer, showing that it approximately recovers $v$ with high probability in the regime $n\rho \ll \sqrt{N}$. In contrast, previous work required either $\rho \ll 1/\sqrt{n}$ or $n\sqrt{\rho} \lesssim \sqrt{N}$ for polynomial-time recovery. Our result subsumes both of these conditions (up to logarithmic factors) and also treats the dense case $\rho = 1$, which was not previously considered. (2) Akin to $\ell_\infty$ bounds for eigenvector perturbation, we establish an entrywise error bound for the spectral estimator via a leave-one-out analysis, from which it follows that thresholding recovers $v$ exactly. (3) We study the associated detection problem and show that in the regime $n\rho \gg \sqrt{N}$, any spectral method from a large class (and more generally, any low-degree polynomial of the input) fails to detect the planted vector. This establishes optimality of our upper bounds and offers evidence that no polynomial-time algorithm can succeed when $n\rho \gg \sqrt{N}$.
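To make the setup concrete, here is a minimal sketch of an HSSS-style spectral estimator in the sparse-Rademacher special case, followed by the thresholding step. This is an illustration, not the authors' exact variant: the centered matrix $M = \sum_i (\|a_i\|^2 - n/N)\, a_i a_i^\top$ is one common form of this construction, and the parameters ($N$, $n$, $\rho$) and the threshold $0.5/\sqrt{k}$ are heuristic choices, not taken from the paper.

```python
# Sketch of an HSSS-style spectral method for a planted sparse vector.
# Not the paper's exact variant; parameters and threshold are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, n, rho = 4000, 40, 0.05  # ambient dim, subspace dim, sparsity level
# Regime check: n*rho = 2 << sqrt(N) ~ 63, where recovery should succeed.

# Plant a (N*rho)-sparse unit vector v with Rademacher nonzero entries
# inside a random n-dimensional subspace of R^N.
k = int(rho * N)
v = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
v[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
B = np.column_stack([v] + [rng.standard_normal(N) for _ in range(n - 1)])

# The algorithm only observes an arbitrary orthonormal basis A of the subspace:
# scramble the basis with a random rotation, then orthonormalize.
A, _ = np.linalg.qr(B @ rng.standard_normal((n, n)))

# Spectral step: with a_i the rows of A, form
#   M = sum_i (||a_i||^2 - n/N) * a_i a_i^T
# (rows of an orthonormal basis have mean squared norm n/N),
# take its leading eigenvector u, and estimate v by A @ u.
row_norms = np.sum(A**2, axis=1)
M = (A * (row_norms - n / N)[:, None]).T @ A
eigvals, eigvecs = np.linalg.eigh(M)
u = eigvecs[:, np.argmax(np.abs(eigvals))]
v_hat = A @ u
v_hat *= np.sign(v_hat @ v)  # resolve the global sign ambiguity for comparison

print("correlation with planted v:", abs(v_hat @ v))

# The entrywise error bound is what licenses exact recovery by thresholding;
# nonzero entries of v have magnitude 1/sqrt(k), so threshold halfway down.
support_hat = np.flatnonzero(np.abs(v_hat) > 0.5 / np.sqrt(k))
print("support recovered exactly:", set(support_hat) == set(support))
```

In this regime the top eigenvector of $M$ correlates strongly with the preimage of $v$ in the basis coordinates, so the correlation printed above should be close to 1; conversely, pushing $n\rho$ above $\sqrt{N}$ should make the spectral signal disappear, matching the detection lower bound in result (3).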
