List-Decodable Linear Regression

Neural Information Processing Systems (NeurIPS), 2019
Abstract

We give the first polynomial-time algorithm for robust regression in the list-decodable setting, where an adversary can corrupt a greater than $1/2$ fraction of examples. For any $\alpha < 1$, our algorithm takes as input a sample $\{(x_i, y_i)\}_{i \leq n}$ of $n$ linear equations where $\alpha n$ of the equations satisfy $y_i = \langle x_i, \ell^* \rangle + \zeta$ for some small noise $\zeta$ and $(1-\alpha)n$ of the equations are \emph{arbitrarily} chosen. It outputs a list $L$ of size $O(1/\alpha)$, a fixed constant, that contains an $\ell$ that is close to $\ell^*$. Our algorithm succeeds whenever the inliers are chosen from a \emph{certifiably} anti-concentrated distribution $D$. As a special case, this yields a $(d/\alpha)^{O(1/\alpha^8)}$-time algorithm to find an $O(1/\alpha)$-size list when the inlier distribution is the standard Gaussian. The anti-concentration assumption on the inliers is information-theoretically necessary. Our algorithm works for more general distributions under the additional assumption that $\ell^*$ is Boolean-valued. To solve the problem, we introduce a new framework for list-decodable learning that strengthens the sum-of-squares `identifiability to algorithms' paradigm. In an independent work, Raghavendra and Yau [RY19] obtained a similar result for list-decodable regression, also using the sum-of-squares method.
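To make the input model concrete, here is a minimal sketch (not the paper's algorithm) of how one could generate an instance of this setting: an $\alpha$ fraction of the $n$ equations are genuine inliers $y_i = \langle x_i, \ell^* \rangle + \zeta$ with Gaussian $x_i$ and small noise, while the remaining $(1-\alpha)n$ equations are chosen consistently with a different "decoy" direction, so any procedure returning a single regressor must fail. All names and parameters (`make_instance`, `decoy`, the noise scale) are illustrative assumptions, not from the paper.

```python
import random

def make_instance(n=1000, d=10, alpha=0.3, noise=0.01, seed=0):
    """Generate a list-decodable regression instance: alpha*n inlier
    equations for a hidden unit-scale direction ell*, and the rest
    consistent with an adversarially chosen decoy direction."""
    rng = random.Random(seed)
    ell_star = [rng.gauss(0, 1) for _ in range(d)]
    decoy = [rng.gauss(0, 1) for _ in range(d)]  # stand-in adversary
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    k = int(alpha * n)  # number of inlier equations
    y = []
    for i in range(n):
        target = ell_star if i < k else decoy
        val = sum(x * t for x, t in zip(X[i], target))
        if i < k:
            val += noise * rng.gauss(0, 1)  # small inlier noise zeta
        y.append(val)
    return X, y, ell_star
```

On such an instance, the best one can hope for is exactly what the theorem promises: a short list of candidate regressors, one of which is close to $\ell^*$.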
