Identifiability of Low-Rank Sparse Component Analysis

Abstract

Sparse component analysis (SCA) is the following problem: given an input matrix $M$ and an integer $r$, find a dictionary $D$ with $r$ columns and a sparse matrix $B$ with $r$ rows such that $M \approx DB$. A key issue in SCA is identifiability, that is, characterizing the conditions under which $D$ and $B$ are essentially unique (unique up to permutation and scaling of the columns of $D$ and the rows of $B$). Although SCA has been extensively investigated over the last two decades, only a few works have tackled this issue in the deterministic scenario, and no work provides reasonable bounds on the minimum number of data points (that is, columns of $M$) that lead to identifiability. In this work, we provide new results in the deterministic scenario when the data has a low-rank structure, that is, when $D$ has rank $r$, drastically improving on previous results. In particular, we show that if each column of $B$ contains at least $s$ zeros, then $\mathcal{O}(r^3/s^2)$ data points are sufficient to obtain an essentially unique decomposition, as long as these data points are well spread among the subspaces spanned by $r-1$ columns of $D$. This implies, for example, that for a fixed proportion of zeros (constant and independent of $r$, e.g., 10% of zero entries in $B$), one only requires $\mathcal{O}(r)$ data points to guarantee identifiability.
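The notion of "essentially unique" used above can be illustrated with a minimal sketch. The toy matrices, the permutation `perm`, and the scaling factors `scale` below are hypothetical values chosen only for illustration and do not come from the paper. The sketch shows that permuting and rescaling the columns of $D$, with the matching change to the rows of $B$, leaves the product $DB$ and the number of zeros in each column of $B$ unchanged, which is why identifiability can only be asked for up to these two ambiguities.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def zeros_per_column(B):
    """Count the zero entries in each column of B."""
    return [sum(1 for x in col if x == 0) for col in zip(*B)]


# Hypothetical toy instance with r = 3: D has rank 3 (its determinant is -2),
# and every column of B has at least s = 1 zero.
D = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 0]]
B = [[1, 0, 2, 0],
     [3, 1, 0, 0],
     [0, 4, 5, 6]]
M = matmul(D, B)

# Assumed permutation and nonzero scalings (illustrative values only).
perm = [2, 0, 1]  # new column j of D is old column perm[j]
scale = [Fraction(2), Fraction(-1), Fraction(1, 3)]

# Column j of D2 is scale[j] times column perm[j] of D;
# row j of B2 is row perm[j] of B divided by scale[j].
D2 = [[row[perm[j]] * scale[j] for j in range(3)] for row in D]
B2 = [[x / scale[j] for x in B[perm[j]]] for j in range(3)]

# Same product and same per-column sparsity, although (D2, B2) != (D, B).
assert matmul(D2, B2) == M
assert zeros_per_column(B2) == zeros_per_column(B)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equality checks are not affected by floating-point rounding.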
