$k$-PCA for (non-squared) Euclidean Distances: Polynomial Time Approximation
Given an integer $k$ and a set $P$ of points in $\mathbb{R}^d$, the classic $k$-PCA (Principal Component Analysis) approximates the affine \emph{$k$-subspace mean} of $P$, which is the $k$-dimensional affine subspace that minimizes the sum of squared Euclidean distances ($\ell_{2,2}$-norm) to the points of $P$, i.e., the mean of these distances. The \emph{$k$-subspace median} is the subspace that minimizes the sum of (non-squared) Euclidean distances ($\ell_{2,1}$-mixed norm), i.e., their median. The median subspace is usually sparser and more robust to noise and outliers than the mean, but it is also much harder to approximate since, unlike the (non-mixed) norms, it is non-convex for . We provide the first polynomial-time deterministic algorithm whose running time and approximation factor are both not exponential in $k$. More precisely, the multiplicative approximation factor is , and the running time is polynomial in the size of the input. We expect that our technique will be useful for many other related problems, such as norm of distances for , e.g., , and handling outliers/sparsity. Open code and experimental results on real-world datasets are also provided.
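To make the difference between the two objectives concrete, the following minimal sketch evaluates both costs for two candidate lines (1-subspaces) in the plane. It is not the algorithm of the paper; it only computes the objectives. The function names `dist_to_subspace` and `subspace_cost`, the exponent parameter `z`, and the toy data are assumptions introduced here for illustration.

```python
import math

def dist_to_subspace(p, origin, basis):
    """Euclidean distance from p to the affine subspace origin + span(basis).

    Assumption: the vectors in `basis` are orthonormal.
    """
    v = [pi - oi for pi, oi in zip(p, origin)]
    r = v[:]
    # Remove the component of v along each orthonormal basis vector.
    for b in basis:
        c = sum(vi * bi for vi, bi in zip(v, b))
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return math.sqrt(sum(ri * ri for ri in r))

def subspace_cost(points, origin, basis, z):
    """Sum of dist^z over the points.

    z = 2 gives the squared (subspace mean, k-PCA) objective;
    z = 1 gives the non-squared (subspace median) objective.
    """
    return sum(dist_to_subspace(p, origin, basis) ** z for p in points)

# Toy data (hypothetical): nine inliers on the x-axis and one outlier at (0, 10).
points = [(float(x), 0.0) for x in range(-4, 5)] + [(0.0, 10.0)]
origin = (0.0, 0.0)
x_axis = [(1.0, 0.0)]  # line through the inliers
y_axis = [(0.0, 1.0)]  # line through the outlier

for name, basis in (("x-axis", x_axis), ("y-axis", y_axis)):
    print(name,
          "sum of distances:", subspace_cost(points, origin, basis, 1),
          "sum of squared distances:", subspace_cost(points, origin, basis, 2))
```

On this toy input the non-squared cost prefers the line through the nine inliers, while the squared cost prefers the line through the single outlier, which illustrates the robustness claim above on a small scale.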