Layer-wise learning of deep generative models
When using deep, multi-layered architectures to build generative models of data, it is difficult to train all layers at once. We propose a layer-wise training procedure which admits, under the right conditions, a global optimality guarantee. It is based on an optimistic proxy of future performance, the best latent marginal. We interpret auto-encoders in this setting by showing that they train a lower bound of this criterion. We test the new learning procedure against the state of the art (stacked RBMs), and find it to improve performance. Both theory and experiments highlight the importance, when training deep architectures, of using an inference model (from data to hidden variables) richer than the generative model (from hidden variables to data).
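A minimal, illustrative sketch of the greedy layer-wise idea described above: each layer is trained as an auto-encoder whose inference (encoder) network is deliberately richer than its generative (decoder) network, and the learned codes become the training data for the next layer. The layer sizes, the plain-NumPy optimizer, and all function names are assumptions made for this example, not the authors' actual procedure or criterion.

```python
# Hypothetical sketch of greedy layer-wise training with a richer encoder
# (inference model) than decoder (generative model). Not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LayerAE:
    """One layer: a two-stage encoder (richer inference model) and a
    single sigmoid decoder (simpler generative model)."""
    def __init__(self, n_in, n_hidden, n_code):
        s = 0.1
        self.We1 = rng.normal(0, s, (n_in, n_hidden))    # encoder stage 1
        self.We2 = rng.normal(0, s, (n_hidden, n_code))  # encoder stage 2
        self.Wd  = rng.normal(0, s, (n_code, n_in))      # decoder

    def encode(self, x):
        h = sigmoid(x @ self.We1)
        return sigmoid(h @ self.We2), h

    def decode(self, z):
        return sigmoid(z @ self.Wd)

    def train_step(self, x, lr=0.1):
        # One gradient step on the squared reconstruction error.
        z, h = self.encode(x)
        xr = self.decode(z)
        err = xr - x                        # dL/dxr for L = 0.5 * ||xr - x||^2
        d_pre_d = err * xr * (1 - xr)       # back through decoder sigmoid
        g_Wd = z.T @ d_pre_d
        d_z = d_pre_d @ self.Wd.T
        d_pre_e2 = d_z * z * (1 - z)        # back through encoder stage 2
        g_We2 = h.T @ d_pre_e2
        d_h = d_pre_e2 @ self.We2.T
        d_pre_e1 = d_h * h * (1 - h)        # back through encoder stage 1
        g_We1 = x.T @ d_pre_e1
        n = x.shape[0]
        self.Wd  -= lr * g_Wd  / n
        self.We2 -= lr * g_We2 / n
        self.We1 -= lr * g_We1 / n
        return float(np.mean(err ** 2))

def train_layerwise(data, layer_sizes, epochs=200):
    """Greedy layer-wise stack: train layer k, then feed its codes to layer k+1."""
    layers, x = [], data
    for n_hidden, n_code in layer_sizes:
        layer = LayerAE(x.shape[1], n_hidden, n_code)
        for _ in range(epochs):
            layer.train_step(x)
        layers.append(layer)
        x, _ = layer.encode(x)              # codes become the next layer's data
    return layers

if __name__ == "__main__":
    x = rng.random((256, 32))
    # Encoder hidden sizes exceed the code sizes: the data-to-hidden model is
    # richer than the hidden-to-data model, echoing the abstract's conclusion.
    stack = train_layerwise(x, [(64, 16), (32, 8)])
    print("trained", len(stack), "layers")
```

The stacking order mirrors stacked-RBM pretraining: each layer only ever sees the codes produced by the layer below, so layers can be trained one at a time.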