We assume data sampled from a mixture of d-dimensional linear subspaces with spherically symmetric outliers. We study the recovery of the global l0 subspace (i.e., the one containing the largest number of points) by minimizing the lp-averaged distances of the data points from d-dimensional subspaces of R^D, where p>0. Unlike other lp minimization problems, this minimization is non-convex for all p>0 and thus requires different methods for its analysis. We show that if 0<p<=1, then the global l0 subspace can be recovered by lp minimization with overwhelming probability (which depends on the generating distribution and its parameters). Moreover, when homoscedastic noise is added around the underlying subspaces, the generalized l0 subspace (the one with the largest number of points "around it") can, with overwhelming probability, be nearly recovered by lp minimization with an error proportional to the noise level. On the other hand, if p>1 and there is more than one underlying subspace, then with overwhelming probability the global l0 subspace cannot be recovered and the generalized one cannot even be nearly recovered.
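For concreteness, a minimal sketch of the lp objective described above, under assumed notation not fixed by the abstract itself (data points x_1,...,x_N in R^D, orthogonal projection P_L onto a subspace L, and the Grassmannian G(D,d) of d-dimensional linear subspaces of R^D):

% lp energy of a d-dimensional subspace L over the data (illustrative notation)
\[
  e_{\ell_p}(L) \;=\; \sum_{i=1}^{N} \operatorname{dist}(x_i, L)^{p},
  \qquad \operatorname{dist}(x, L) \;=\; \bigl\| x - P_L x \bigr\|_2 ,
\]
% the estimated subspace is a global minimizer over the Grassmannian
\[
  \hat{L} \;\in\; \operatorname*{arg\,min}_{L \in \mathrm{G}(D,d)} \, e_{\ell_p}(L).
\]

Minimizing the lp-averaged distances, i.e., e_{l_p}(L)/N, yields the same minimizer; the non-convexity referred to above is that of e_{l_p} over the set of d-dimensional subspaces, for every p>0.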