
Sharpened Error Bound for Random Sampling Based $\ell_2$ Regression

Abstract

Given a data matrix $X \in R^{n\times d}$ and a response vector $y \in R^{n}$ with $n > d$, solving the least squares regression (LSR) problem exactly costs $O(nd^2)$ time and $O(nd)$ space, which is very expensive when $n$ and $d$ are both large. When $n \gg d$, one feasible approach to accelerating LSR is to randomly embed $y$ and all columns of $X$ into the subspace $R^c$ where $c \ll n$; the induced LSR problem has the same number of columns but far fewer rows, and it can be solved in $O(cd^2)$ time and $O(cd)$ space. Leverage-score sampling is an effective subspace embedding method and can be applied to accelerate LSR. It was previously shown that $c = O(d\epsilon^{-2}\log d)$ is sufficient for achieving $1+\epsilon$ accuracy. In this paper we sharpen this error bound, showing that $c = O(d\log d + d\epsilon^{-1})$ is enough for $1+\epsilon$ accuracy.
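To make the sampling scheme concrete, the following NumPy sketch illustrates the standard leverage-score sampling procedure for LSR described above: compute the leverage scores of $X$, sample $c$ rows with probabilities proportional to those scores, rescale the sampled rows, and solve the small induced problem. The function name and the use of an exact QR-based leverage computation are illustrative choices, not the paper's own implementation (in practice one would approximate the scores to avoid the $O(nd^2)$ QR cost).

```python
import numpy as np

def leverage_score_sampling_lsr(X, y, c, rng=None):
    """Approximately solve min_w ||X w - y||_2 by leverage-score row sampling.

    X: (n, d) data matrix with n >> d; y: (n,) response; c: sketch size (c << n).
    Returns an approximate least squares solution w of shape (d,).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Exact leverage scores: squared row norms of an orthonormal basis of col(X).
    # (Computed here via QR for clarity; this step itself costs O(n d^2).)
    Q, _ = np.linalg.qr(X)
    lev = np.einsum('ij,ij->i', Q, Q)   # leverage scores; they sum to rank(X)
    p = lev / lev.sum()                  # sampling probabilities
    # Sample c row indices i.i.d. with probability p, then rescale each sampled
    # row by 1/sqrt(c * p_i) so the sketched problem is an unbiased estimate.
    idx = rng.choice(n, size=c, replace=True, p=p)
    scale = 1.0 / np.sqrt(c * p[idx])
    Xs = X[idx] * scale[:, None]
    ys = y[idx] * scale
    # Solve the induced small LSR problem in O(c d^2) time.
    w, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return w
```

With a sketch size $c$ on the order of the bounds discussed in the abstract, the residual of the returned $w$ is within a $1+\epsilon$ factor of the optimal residual with high probability.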
