List-Decodable Linear Regression

Neural Information Processing Systems (NeurIPS), 2019
Abstract

We give the first polynomial-time algorithm for robust regression in the list-decodable setting, where an adversary can corrupt a greater than $1/2$ fraction of examples. For any $\alpha < 1$, our algorithm takes as input a sample $\{(x_i, y_i)\}_{i \leq n}$ of $n$ linear equations where $\alpha n$ of the equations satisfy $y_i = \langle x_i, \ell^* \rangle + \zeta$ for some small noise $\zeta$ and $(1-\alpha)n$ of the equations are \emph{arbitrarily} chosen. It outputs a list $L$ of size $O(1/\alpha)$, a fixed constant, that contains an $\ell$ that is close to $\ell^*$. Our algorithm succeeds whenever the inliers are chosen from a \emph{certifiably} anti-concentrated distribution $D$. As a special case, this yields a $(d/\alpha)^{O(1/\alpha^8)}$-time algorithm to find an $O(1/\alpha)$-size list when the inlier distribution is the standard Gaussian. The anti-concentration assumption on the inliers is information-theoretically necessary. Our algorithm works for more general distributions under the additional assumption that $\ell^*$ is Boolean-valued. To solve the problem, we introduce a new framework for list-decodable learning that strengthens the sum-of-squares `identifiability to algorithms' paradigm. In an independent work, Raghavendra and Yau [RY19] obtained a similar result for list-decodable regression, also using the sum-of-squares method.
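To make the input model concrete, here is a minimal sketch (not the paper's algorithm) of how one could generate an instance of this setting: an $\alpha$ fraction of the $n$ equations are genuine inliers $y_i = \langle x_i, \ell^* \rangle + \zeta$ with Gaussian $x_i$ and small noise, while the remaining $(1-\alpha)n$ equations are chosen consistently with a different "decoy" direction, so any procedure returning a single regressor must fail. All names and parameters (`make_instance`, `decoy`, the noise scale) are illustrative assumptions, not from the paper.

```python
import random

def make_instance(n=1000, d=10, alpha=0.3, noise=0.01, seed=0):
    """Generate a list-decodable regression instance: alpha*n inlier
    equations for a hidden unit-scale direction ell*, and the rest
    consistent with an adversarially chosen decoy direction."""
    rng = random.Random(seed)
    ell_star = [rng.gauss(0, 1) for _ in range(d)]
    decoy = [rng.gauss(0, 1) for _ in range(d)]  # stand-in adversary
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    k = int(alpha * n)  # number of inlier equations
    y = []
    for i in range(n):
        target = ell_star if i < k else decoy
        val = sum(x * t for x, t in zip(X[i], target))
        if i < k:
            val += noise * rng.gauss(0, 1)  # small inlier noise zeta
        y.append(val)
    return X, y, ell_star
```

On such an instance, the best one can hope for is exactly what the theorem promises: a short list of candidate regressors, one of which is close to $\ell^*$.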
