arXiv:2006.16573

Subspace approximation with outliers

30 June 2020
Amit Deshpande
Rameshwar Pratap
Abstract

The subspace approximation problem with outliers, for given $n$ points in $d$ dimensions $x_{1}, \ldots, x_{n} \in R^{d}$, an integer $1 \leq k \leq d$, and an outlier parameter $0 \leq \alpha \leq 1$, is to find a $k$-dimensional linear subspace of $R^{d}$ that minimizes the sum of squared distances to its nearest $(1-\alpha)n$ points. More generally, the $\ell_{p}$ subspace approximation problem with outliers minimizes the sum of $p$-th powers of distances instead of the sum of squared distances. Even the case of robust PCA is non-trivial, and previous work requires additional assumptions on the input. Any multiplicative approximation algorithm for the subspace approximation problem with outliers must solve the robust subspace recovery problem, a special case in which the $(1-\alpha)n$ inliers in the optimal solution are promised to lie exactly on a $k$-dimensional linear subspace. However, robust subspace recovery is Small Set Expansion (SSE)-hard. We show how to extend dimension reduction techniques and bi-criteria approximations based on sampling to the problem of subspace approximation with outliers. To get around the SSE-hardness of robust subspace recovery, we assume that the squared distance error of the optimal $k$-dimensional subspace summed over the optimal $(1-\alpha)n$ inliers is at least $\delta$ times its squared-error summed over all $n$ points, for some $0 < \delta \leq 1 - \alpha$. With this assumption, we give an efficient algorithm to find a subset of $\mathrm{poly}(k/\epsilon) \log(1/\delta) \log\log(1/\delta)$ points whose span contains a $k$-dimensional subspace that gives a multiplicative $(1+\epsilon)$-approximation to the optimal solution. The running time of our algorithm is linear in $n$ and $d$.
Interestingly, our results hold even when the fraction of outliers $\alpha$ is large, as long as the obvious condition $0 < \delta \leq 1 - \alpha$ is satisfied.
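
To make the objective in the abstract concrete, the following is a minimal Python sketch that evaluates the $\ell_{p}$ cost with outliers for a given candidate subspace, together with the ratio that the $\delta$ assumption bounds from below. It is not the authors' algorithm: it only scores a subspace that is already supplied. The function names (`distances_to_subspace`, `cost_with_outliers`, `inlier_error_ratio`), the orthonormal-basis representation `V`, and the rounding of $(1-\alpha)n$ to an integer are assumptions made for illustration. In the abstract the ratio is stated for the optimal subspace, which this sketch does not compute.

```python
import numpy as np


def distances_to_subspace(X, V):
    """Distance from each row of X (n x d) to span(V).

    V is assumed to be a d x k matrix with orthonormal columns.
    """
    residual = X - (X @ V) @ V.T  # component of each point orthogonal to span(V)
    return np.linalg.norm(residual, axis=1)


def cost_with_outliers(X, V, alpha, p=2.0):
    """Sum of p-th powers of distances over the (1 - alpha) n nearest points."""
    n = X.shape[0]
    m = int(round((1.0 - alpha) * n))  # assumption: round (1 - alpha) n to an integer
    dist_p = distances_to_subspace(X, V) ** p
    return float(np.sort(dist_p)[:m].sum())  # keep only the m smallest terms


def inlier_error_ratio(X, V, alpha):
    """Squared error over the nearest (1 - alpha) n points divided by the
    squared error over all n points, for the candidate subspace span(V)."""
    total = cost_with_outliers(X, V, 0.0, p=2.0)
    if total == 0.0:
        return float("nan")  # every point lies on the subspace; ratio undefined
    return cost_with_outliers(X, V, alpha, p=2.0) / total


# Toy data in R^2 with k = 1; the candidate subspace is the x-axis.
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, -1.0], [0.0, 10.0]])
V = np.array([[1.0], [0.0]])

# Distances to the x-axis are 0, 1, 1, 10; with alpha = 0.25 the farthest point is dropped.
print(cost_with_outliers(X, V, alpha=0.25))
print(inlier_error_ratio(X, V, alpha=0.25))
```

A ratio close to 0 means the nearest $(1-\alpha)n$ points lie almost exactly on the subspace, which is the regime of robust subspace recovery that the $\delta$ assumption rules out.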
