Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination

12 October 2025

Ilias Diakonikolas

Main:18 Pages

Bibliography:5 Pages

Appendix:13 Pages

Abstract

We study the task of noiseless linear regression under Gaussian covariates in the presence of additive oblivious contamination. Specifically, we are given i.i.d.\ samples from a distribution $(x, y)$ on $\mathbb{R}^d \times \mathbb{R}$ with $x \sim \mathcal{N}(0,\mathbf{I}_d)$ and $y = x^\top \beta + z$ , where $z$ is drawn independently of $x$ from an unknown distribution $E$ . Moreover, $z$ satisfies $\mathbb{P}_E[z = 0] = \alpha>0$ . The goal is to accurately recover the regressor $\beta$ to small $\ell_2$ -error. Ignoring computational considerations, this problem is known to be solvable using $O(d/\alpha)$ samples. On the other hand, the best known polynomial-time algorithms require $\Omega(d/\alpha^2)$ samples. Here we provide formal evidence that the quadratic dependence in $1/\alpha$ is inherent for efficient algorithms. Specifically, we show that any efficient Statistical Query algorithm for this task requires VSTAT complexity at least $\tilde{\Omega}(d^{1/2}/\alpha^2)$ .

View on arXiv

Comments on this paper