The Permuted Striped Block Model and its Factorization -- Algorithms with Recovery Guarantees

Abstract

We introduce a novel class of matrices defined by the factorization $\mathbf{Y} := \mathbf{A}\mathbf{X}$, where $\mathbf{A}$ is an $m \times n$ wide sparse binary matrix with a fixed number $d$ of nonzeros per column and $\mathbf{X}$ is an $n \times N$ sparse real matrix whose columns have at most $k$ nonzeros and are \textit{dissociated}. Matrices defined by this factorization can be expressed as a sum of $n$ rank-one sparse matrices whose nonzero entries, under appropriate permutations, form striped blocks; we therefore refer to them as Permuted Striped Block (PSB) matrices. We define the \textit{PSB data model} as a particular distribution over this class of matrices, motivated by its implications for community detection, provable binary dictionary learning with real-valued sparse coding, and blind combinatorial compressed sensing. For data matrices drawn from the PSB data model, we provide computationally efficient factorization algorithms that recover the generating factors with high probability from as few as $N = O\left(\frac{n}{k}\log^2(n)\right)$ data vectors, where $k$, $m$, and $n$ scale proportionally. Notably, these algorithms achieve optimal sample complexity up to logarithmic factors.
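To make the factorization concrete: writing $\mathbf{a}_j$ for the $j$-th column of $\mathbf{A}$ and $\mathbf{x}_j^\top$ for the $j$-th row of $\mathbf{X}$, one has $\mathbf{Y} = \sum_{j=1}^{n} \mathbf{a}_j \mathbf{x}_j^\top$, and the $j$-th term is nonzero only on the block $\mathrm{supp}(\mathbf{a}_j) \times \mathrm{supp}(\mathbf{x}_j)$, which has exactly $d$ rows. The Python sketch below is a hypothetical illustration only and does not reproduce the paper's PSB data model. It assumes uniformly random supports, exactly $k$ nonzeros per column of $\mathbf{X}$, and Gaussian nonzero values. It also takes "dissociated" to mean that no two distinct subsets of a column's nonzero values have the same sum, a property continuous random values satisfy with probability 1. The helper names `sample_psb`, `rank_one_sum`, and `is_dissociated` are illustrative, not from the paper.

```python
import random
from itertools import combinations


def sample_psb(m, n, N, d, k, seed=0):
    """Hypothetical sampler for Y = A X (uniform supports are an assumption)."""
    rng = random.Random(seed)
    # A: m x n binary matrix with exactly d ones per column, rows chosen uniformly.
    A = [[0] * n for _ in range(m)]
    for j in range(n):
        for i in rng.sample(range(m), d):
            A[i][j] = 1
    # X: n x N real matrix with k nonzeros per column, Gaussian values
    # (continuous values are dissociated with probability 1).
    X = [[0.0] * N for _ in range(n)]
    for c in range(N):
        for j in rng.sample(range(n), k):
            X[j][c] = rng.gauss(0.0, 1.0)
    # Y = A X computed entrywise.
    Y = [[sum(A[i][j] * X[j][c] for j in range(n)) for c in range(N)]
         for i in range(m)]
    return A, X, Y


def rank_one_sum(A, X):
    """Rebuild Y as the sum over j of the rank-one terms a_j x_j^T."""
    m, n, N = len(A), len(X), len(X[0])
    S = [[0.0] * N for _ in range(m)]
    for j in range(n):
        rows = [i for i in range(m) if A[i][j]]         # supp(a_j), size d
        cols = [c for c in range(N) if X[j][c] != 0.0]  # supp(x_j)
        # Term j touches only the block rows x cols.
        for i in rows:
            for c in cols:
                S[i][c] += X[j][c]
    return S


def is_dissociated(values):
    """True if no two distinct subsets of `values` share a sum (brute force)."""
    seen = set()
    for r in range(len(values) + 1):
        for subset in combinations(values, r):
            s = sum(subset)
            if s in seen:
                return False
            seen.add(s)
    return True


if __name__ == "__main__":
    A, X, Y = sample_psb(m=8, n=12, N=20, d=3, k=2, seed=1)
    S = rank_one_sum(A, X)
    err = max(abs(Y[i][c] - S[i][c]) for i in range(8) for c in range(20))
    print("max |Y - sum of rank-one terms| =", err)
    print(is_dissociated([1.0, 2.0, 4.0]), is_dissociated([1.0, 2.0, 3.0]))  # True False
```

On small instances the two constructions of $\mathbf{Y}$ agree to floating-point precision. The brute-force dissociation check enumerates all $2^k$ subsets, so it is only meant to illustrate the definition for small $k$.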