Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination

Main:18 Pages
Bibliography:5 Pages
Appendix:13 Pages
Abstract

We study the task of noiseless linear regression under Gaussian covariates in the presence of additive oblivious contamination. Specifically, we are given i.i.d. samples from a distribution $(x, y)$ on $\mathbb{R}^d \times \mathbb{R}$ with $x \sim \mathcal{N}(0,\mathbf{I}_d)$ and $y = x^\top \beta + z$, where $z$ is drawn independently of $x$ from an unknown distribution $E$. Moreover, $z$ satisfies $\mathbb{P}_E[z = 0] = \alpha > 0$. The goal is to accurately recover the regressor $\beta$ to small $\ell_2$-error. Ignoring computational considerations, this problem is known to be solvable using $O(d/\alpha)$ samples. On the other hand, the best known polynomial-time algorithms require $\Omega(d/\alpha^2)$ samples. Here we provide formal evidence that the quadratic dependence in $1/\alpha$ is inherent for efficient algorithms. Specifically, we show that any efficient Statistical Query algorithm for this task requires VSTAT complexity at least $\tilde{\Omega}(d^{1/2}/\alpha^2)$.
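
To make the data model concrete, the following is a minimal sketch of a sampler for it, not code from the paper. The contamination distribution $E$ is unknown in the problem statement, so the mixture used here (zero with probability $\alpha$, otherwise a wide Gaussian) is purely an illustrative assumption, as are the function name `sample_contaminated_regression` and the parameters `noise_scale` and `seed`.

```python
import random


def sample_contaminated_regression(n, beta, alpha, noise_scale=10.0, seed=0):
    """Draw n samples (x, y, z) with x ~ N(0, I_d) and y = <x, beta> + z.

    Assumption (not from the paper): z is 0 with probability alpha and
    otherwise drawn from N(0, noise_scale^2). The real E is arbitrary,
    subject only to P[z = 0] = alpha and independence from x.
    """
    rng = random.Random(seed)
    d = len(beta)
    samples = []
    for _ in range(n):
        # Standard Gaussian covariates, one coordinate at a time.
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Oblivious contamination: z is drawn without looking at x.
        z = 0.0 if rng.random() < alpha else rng.gauss(0.0, noise_scale)
        y = sum(xi * bi for xi, bi in zip(x, beta)) + z
        samples.append((x, y, z))
    return samples


beta = [1.0, -2.0, 0.5]
samples = sample_contaminated_regression(n=2000, beta=beta, alpha=0.2)
clean_fraction = sum(1 for _, _, z in samples if z == 0.0) / len(samples)
print(f"fraction of uncontaminated samples: {clean_fraction:.3f}")
```

Only the samples with $z = 0$ satisfy $y = x^\top \beta$ exactly, and a learner does not observe $z$, so it cannot tell which samples these are. The sketch returns $z$ only for inspection.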
