
Critical Points Of An Autoencoder Can Provably Recover Sparsely Used Overcomplete Dictionaries

International Symposium on Information Theory (ISIT), 2017
Abstract

In "Dictionary Learning" one is trying to recover incoherent matrices ARn×hA^* \in \mathbb{R}^{n \times h} (typically overcomplete and whose columns are assumed to be normalized) and sparse vectors xRhx^* \in \mathbb{R}^h with a small support of size hph^p for some 0<p<10 <p < 1 while being given access to observations yRny \in \mathbb{R}^n where y=Axy = A^*x^*. In this work we undertake a rigorous analysis of the possibility that dictionary learning could be performed by gradient descent on "Autoencoders", which are RnRn\mathbb{R}^n \rightarrow \mathbb{R}^n neural network with a single ReLU activation layer of size hh. Towards the above objective we propose a new autoencoder loss function which modifies the squared loss error term and also adds new regularization terms. We create a proxy for the expected gradient of this loss function which we motivate with high probability arguments, under natural distributional assumptions on the sparse code xx^*. Under the same distributional assumptions on xx^*, we show that, in the limit of large enough sparse code dimension, any zero point of our proxy for the expected gradient of the loss function within a certain radius of AA^* corresponds to dictionaries whose action on the sparse vectors is indistinguishable from that of AA^*. We also report simulations on synthetic data in support of our theory.
