
Outlier Robust Multivariate Polynomial Regression

European Symposium on Algorithms (ESA), 2024

Main: 29 pages · 3 figures · 1 table · Bibliography: 2 pages · Appendix: 6 pages
Abstract

We study the problem of robust multivariate polynomial regression: let $p\colon\mathbb{R}^n\to\mathbb{R}$ be an unknown $n$-variate polynomial of degree at most $d$ in each variable. We are given as input a set of random samples $(\mathbf{x}_i,y_i)\in[-1,1]^n\times\mathbb{R}$ that are noisy versions of $(\mathbf{x}_i,p(\mathbf{x}_i))$. More precisely, each $\mathbf{x}_i$ is sampled independently from some distribution $\chi$ on $[-1,1]^n$, and for each $i$ independently, $y_i$ is arbitrary (i.e., an outlier) with probability at most $\rho<1/2$, and otherwise satisfies $|y_i-p(\mathbf{x}_i)|\leq\sigma$. The goal is to output a polynomial $\hat{p}$, of degree at most $d$ in each variable, within an $\ell_\infty$-distance of at most $O(\sigma)$ from $p$. Kane, Karmalkar, and Price [FOCS'17] solved this problem for $n=1$. We generalize their results to the $n$-variate setting, showing an algorithm that achieves a sample complexity of $O_n(d^n\log d)$, where the hidden constant depends on $n$, if $\chi$ is the $n$-dimensional Chebyshev distribution. The sample complexity is $O_n(d^{2n}\log d)$ if the samples are drawn from the uniform distribution instead. The approximation error is guaranteed to be at most $O(\sigma)$, and the run-time depends on $\log(1/\sigma)$. In the setting where each $\mathbf{x}_i$ and $y_i$ are known up to $N$ bits of precision, the run-time's dependence on $N$ is linear. We also show that our sample complexities are optimal in terms of $d^n$. Furthermore, we show that it is possible to have the run-time be independent of $1/\sigma$, at the cost of a higher sample complexity.
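The sampling model above can be made concrete with a small sketch. The snippet below (not the paper's algorithm, just an illustration of the noise model) draws points from the $n$-dimensional Chebyshev distribution, evaluates an arbitrarily chosen polynomial of degree at most $d$ in each variable, adds bounded inlier noise of magnitude at most $\sigma$, and corrupts each response with probability $\rho$; the polynomial, $\rho$, $\sigma$, and sample count are all hypothetical choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 2, 3              # dimension and per-variable degree (illustrative)
m = 200                  # number of samples
rho, sigma = 0.2, 0.01   # outlier probability and inlier noise bound

# Each coordinate is drawn from the 1-D Chebyshev (arcsine) distribution on
# [-1, 1]: if u is uniform on [0, 1], then cos(pi * u) has that distribution.
X = np.cos(np.pi * rng.uniform(size=(m, n)))  # points in [-1, 1]^n

def p(X):
    # A fixed "unknown" polynomial of degree <= 3 in each variable,
    # chosen arbitrarily for this sketch.
    return X[:, 0] ** 3 - 2.0 * X[:, 0] * X[:, 1] ** 2 + 0.5

# Inliers satisfy |y_i - p(x_i)| <= sigma ...
y = p(X) + rng.uniform(-sigma, sigma, size=m)

# ... while each y_i is replaced by an arbitrary value with probability rho.
outliers = rng.uniform(size=m) < rho
y[outliers] = rng.uniform(-10.0, 10.0, size=outliers.sum())
```

A robust regression algorithm in this model must recover $\hat{p}$ with $\|\hat{p}-p\|_\infty = O(\sigma)$ from `(X, y)` alone, without knowing which samples the mask `outliers` marked.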
