Total Least Squares Regression in Input Sparsity Time

In the total least squares problem, one is given an $n \times d$ matrix $A$ and an $n \times d'$ matrix $B$, and one seeks to "correct" both $A$ and $B$, obtaining matrices $\hat{A}$ and $\hat{B}$, so that there exists an $X$ satisfying the equation $\hat{A}X = \hat{B}$. Typically the problem is overconstrained, meaning that $n \gg \max(d, d')$. The cost of the solution $\hat{A}, \hat{B}$ is given by $\|A - \hat{A}\|_F^2 + \|B - \hat{B}\|_F^2$. We give an algorithm for finding a solution $X$ to the linear system $\hat{A}X = \hat{B}$ for which the cost is at most a multiplicative $(1+\epsilon)$ factor times the optimal cost, up to an additive error that may be an arbitrarily small function of $n$. Importantly, our running time is $\tilde{O}(\mathrm{nnz}(A) + \mathrm{nnz}(B)) + \mathrm{poly}(d/\epsilon) \cdot d'$, where for a matrix $C$, $\mathrm{nnz}(C)$ denotes its number of non-zero entries; in particular, the running time does not directly depend on the large parameter $n$. As total least squares regression is known to be solvable via low rank approximation, a natural approach is to invoke fast algorithms for approximate low rank approximation, obtain matrices $\hat{A}$ and $\hat{B}$ from this low rank approximation, and then solve for $X$ so that $\hat{A}X = \hat{B}$. However, existing algorithms do not apply directly, since in total least squares the rank of the low rank approximation needs to be $d$, and so the running time of known methods would be at least $n \cdot d^2$. In contrast, we achieve a much faster running time for finding $X$ by never explicitly forming the equation $\hat{A}X = \hat{B}$, but instead solving for an $X$ that satisfies an implicit such equation. Finally, we generalize our algorithm to the total least squares problem with regularization.
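The classical reduction referenced above, total least squares via low rank approximation, can be made concrete with a short sketch. The code below illustrates that standard SVD-based reduction, not the paper's input-sparsity algorithm; the function name `tls_via_svd`, the synthetic data, and the assumption that the trailing $d' \times d'$ block of right singular vectors is invertible are choices made here for the example.

```python
import numpy as np

def tls_via_svd(A, B):
    """Classical total least squares via the best rank-d approximation of [A, B].

    A is n x d, B is n x d'. Returns X together with the corrected pair
    (A_hat, B_hat) satisfying A_hat @ X = B_hat (up to floating point),
    assuming the trailing d' x d' block of right singular vectors is invertible.
    """
    d = A.shape[1]
    C = np.hstack([A, B])                        # stacked matrix [A, B], shape (n, d + d')
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    C_hat = (U[:, :d] * s[:d]) @ Vt[:d, :]       # best rank-d approximation of [A, B]
    A_hat, B_hat = C_hat[:, :d], C_hat[:, d:]    # corrected A and B
    # The trailing d' right singular vectors lie in the null space of C_hat,
    # so A_hat @ V12 + B_hat @ V22 = 0, giving X = -V12 @ inv(V22).
    V = Vt.T
    V12, V22 = V[:d, d:], V[d:, d:]
    X = -V12 @ np.linalg.inv(V22)
    return X, A_hat, B_hat

# Tiny usage example on synthetic data
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
B = A @ rng.standard_normal((5, 2)) + 0.01 * rng.standard_normal((100, 2))
X, A_hat, B_hat = tls_via_svd(A, B)
print(np.linalg.norm(A_hat @ X - B_hat))                             # ~0: A_hat X = B_hat holds
print(np.linalg.norm(A - A_hat)**2 + np.linalg.norm(B - B_hat)**2)   # total least squares cost
```

This explicit route pays for the SVD and for writing down the dense matrices $\hat{A}$ and $\hat{B}$, which is exactly the cost the abstract's algorithm avoids by solving for $X$ through an implicit representation in roughly $\mathrm{nnz}(A) + \mathrm{nnz}(B)$ time plus a term independent of $n$.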
View on arXiv