Stochastic Low-Rank Subspace Clustering by Auxiliary Variable Modeling
Low-Rank Representation (LRR) has been a popular tool for identifying data generated from a union of subspaces, but it is also known to be computationally challenging. Since the size of the nuclear norm regularized matrix of LRR is proportional to n^2 (where n is the number of samples), it seriously hinders LRR for large-scale problems. In this paper, we develop a novel algorithm that scales up the LRR method accurately and memory efficiently. In particular, we propose an online implementation of LRR that reduces the memory cost from O(n^2) to O(pd), with p being the ambient dimension and d being some estimated rank (d < p << n). Our proposed algorithm consists of two key technical components: (i) we reformulate the nuclear norm into an equivalent matrix factorization form, and (ii) we introduce an auxiliary variable that serves as a basis dictionary for the underlying data. Combining these two techniques makes the problem amenable to stochastic optimization. We establish the theoretical guarantee that the sequence of solutions produced by our algorithm converges asymptotically to a stationary point of the expected loss function. Extensive experiments on synthetic and realistic datasets further substantiate that our algorithm is fast, robust, and memory efficient.
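The matrix factorization reformulation mentioned in component (i) presumably refers to the well-known variational identity ||X||_* = min over factorizations X = UV of (||U||_F^2 + ||V||_F^2) / 2, which replaces the nuclear norm of a large matrix with Frobenius norms of two small factors. A minimal NumPy sketch of this identity (the matrix sizes and rank here are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# A rank-3 matrix X with ambient dimension p = 20 and n = 30 samples.
p, n, d = 20, 30, 3
X = rng.standard_normal((p, d)) @ rng.standard_normal((d, n))

# Nuclear norm: sum of singular values.
nuc = np.linalg.svd(X, compute_uv=False).sum()

# With the SVD X = A diag(s) B^T, the factors U = A sqrt(diag(s)) and
# V = sqrt(diag(s)) B^T satisfy X = U V and attain the minimum of
# (||U||_F^2 + ||V||_F^2) / 2, which equals ||X||_*.
A, s, Bt = np.linalg.svd(X, full_matrices=False)
U = A * np.sqrt(s)
V = np.sqrt(s)[:, None] * Bt
fact = 0.5 * (np.linalg.norm(U, "fro") ** 2 + np.linalg.norm(V, "fro") ** 2)

print(np.allclose(U @ V, X))   # the factorization reproduces X
print(np.isclose(nuc, fact))   # factorized objective matches the nuclear norm
```

Because the factors only need O(pd) storage for U (and the V columns can be updated one sample at a time), this form is what makes a streaming, memory-efficient treatment of LRR plausible.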