
Recovery of Coherent Data via Low-Rank Dictionary Pursuit

Abstract

The recently established Robust Principal Component Analysis (RPCA) method provides a convenient way to restore low-rank matrices from grossly corrupted observations. While elegant in theory and powerful in practice, RPCA may not be the ultimate solution to the low-rank matrix recovery problem. Indeed, its performance may not be perfect even when the data are strictly low-rank. This is because conventional RPCA ignores the clustering structures of the data, which are ubiquitous in modern applications. As the number of clusters grows, the coherence of the data keeps increasing, and the recovery performance of RPCA degrades accordingly. We show that the challenges raised by coherent data (i.e., data with high coherence) can be alleviated by Low-Rank Representation (LRR), provided that the dictionary in LRR is configured appropriately. More precisely, we prove mathematically that if the dictionary itself is low-rank, then LRR is immune to the coherence parameter that increases with the underlying number of clusters. This provides an elementary principle for dealing with coherent data. Subsequently, we devise a practical algorithm for obtaining proper dictionaries in unsupervised environments. Our extensive experiments on randomly generated matrices verify our claims.
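To make the notion of coherence concrete, the following minimal sketch computes a coherence value for the row or column space of a matrix. It assumes the standard definition from the matrix-recovery literature, (n / r) times the largest squared row norm of an orthonormal basis of the rank-r subspace, which may differ in detail from the exact parameters analyzed in the paper. The function name `coherence` and its `side` and `tol` arguments are illustrative and not taken from the paper.

```python
import numpy as np


def coherence(M, side="row", tol=1e-10):
    """Coherence of the row or column space of M (assumed standard definition).

    Returns (n / r) * max_i ||B^T e_i||^2, where B is an orthonormal basis
    of the chosen subspace, n its ambient dimension, and r the numerical
    rank. The value lies between 1 (spread out) and n / r (spiky).
    """
    U, s, Vt = np.linalg.svd(np.asarray(M, dtype=float), full_matrices=False)
    if s.size == 0 or s[0] == 0.0:
        raise ValueError("coherence is undefined for a zero matrix")
    r = int(np.sum(s > tol * s[0]))           # numerical rank
    basis = Vt[:r].T if side == "row" else U[:, :r]
    n = basis.shape[0]                        # ambient dimension
    leverage = np.sum(basis ** 2, axis=1)     # squared row norms of the basis
    return (n / r) * leverage.max()


# Rank-1 matrix whose energy is spread evenly over all columns.
spread = np.ones((4, 6))

# Rank-1 matrix whose energy sits in a single column.
spiky = np.zeros((4, 6))
spiky[:, 0] = 1.0

print(coherence(spread), coherence(spiky))
```

For these two toy matrices the row-space coherence sits at the two extremes of its range: close to 1 for `spread` and close to n / r = 6 for `spiky`. Data with higher values are what the abstract calls coherent data.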
