
Outlier Robust Multivariate Polynomial Regression

European Symposium on Algorithms (ESA), 2024

Main: 29 pages · 3 figures · 1 table · Bibliography: 2 pages · Appendix: 6 pages
Abstract

We study the problem of robust multivariate polynomial regression: let $p\colon\mathbb{R}^n\to\mathbb{R}$ be an unknown $n$-variate polynomial of degree at most $d$ in each variable. We are given as input a set of random samples $(\mathbf{x}_i,y_i)\in[-1,1]^n\times\mathbb{R}$ that are noisy versions of $(\mathbf{x}_i,p(\mathbf{x}_i))$. More precisely, each $\mathbf{x}_i$ is sampled independently from some distribution $\chi$ on $[-1,1]^n$, and for each $i$ independently, $y_i$ is arbitrary (i.e., an outlier) with probability at most $\rho<1/2$, and otherwise satisfies $|y_i-p(\mathbf{x}_i)|\leq\sigma$. The goal is to output a polynomial $\hat{p}$, of degree at most $d$ in each variable, within an $\ell_\infty$-distance of at most $O(\sigma)$ from $p$. Kane, Karmalkar, and Price [FOCS'17] solved this problem for $n=1$. We generalize their results to the $n$-variate setting, showing an algorithm that achieves a sample complexity of $O_n(d^n\log d)$, where the hidden constant depends on $n$, if $\chi$ is the $n$-dimensional Chebyshev distribution. The sample complexity is $O_n(d^{2n}\log d)$ if the samples are drawn from the uniform distribution instead. The approximation error is guaranteed to be at most $O(\sigma)$, and the run-time depends on $\log(1/\sigma)$. In the setting where each $\mathbf{x}_i$ and $y_i$ are known up to $N$ bits of precision, the run-time's dependence on $N$ is linear. We also show that our sample complexities are optimal in terms of $d^n$. Furthermore, we show that it is possible to have the run-time be independent of $1/\sigma$, at the cost of a higher sample complexity.
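The sampling model above can be made concrete with a small sketch. The snippet below (not the paper's algorithm, just an illustration of the noise model) draws points from the $n$-dimensional Chebyshev distribution, evaluates an arbitrarily chosen polynomial of degree at most $d$ in each variable, adds bounded inlier noise of magnitude at most $\sigma$, and corrupts each response with probability $\rho$; the polynomial, $\rho$, $\sigma$, and sample count are all hypothetical choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 2, 3              # dimension and per-variable degree (illustrative)
m = 200                  # number of samples
rho, sigma = 0.2, 0.01   # outlier probability and inlier noise bound

# Each coordinate is drawn from the 1-D Chebyshev (arcsine) distribution on
# [-1, 1]: if u is uniform on [0, 1], then cos(pi * u) has that distribution.
X = np.cos(np.pi * rng.uniform(size=(m, n)))  # points in [-1, 1]^n

def p(X):
    # A fixed "unknown" polynomial of degree <= 3 in each variable,
    # chosen arbitrarily for this sketch.
    return X[:, 0] ** 3 - 2.0 * X[:, 0] * X[:, 1] ** 2 + 0.5

# Inliers satisfy |y_i - p(x_i)| <= sigma ...
y = p(X) + rng.uniform(-sigma, sigma, size=m)

# ... while each y_i is replaced by an arbitrary value with probability rho.
outliers = rng.uniform(size=m) < rho
y[outliers] = rng.uniform(-10.0, 10.0, size=outliers.sum())
```

A robust regression algorithm in this model must recover $\hat{p}$ with $\|\hat{p}-p\|_\infty = O(\sigma)$ from `(X, y)` alone, without knowing which samples the mask `outliers` marked.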
