
Zero-bias autoencoders and the benefits of co-adapting features

International Conference on Learning Representations (ICLR), 2014
Abstract

We show that training common regularized autoencoders resembles clustering, because it amounts to fitting a density model whose mass is concentrated in the directions of the individual weight vectors. We then propose a new activation function based on thresholding a linear function with zero bias (so it is truly linear, not affine), and argue that this allows hidden units to "collaborate" in order to define larger regions of uniform density. We show that the new activation function makes it possible to train autoencoders without an explicit regularization penalty, such as sparsification, contraction or denoising, by simply minimizing reconstruction error. Experiments on a variety of recognition tasks show that zero-bias autoencoders perform roughly on par with common regularized autoencoders on low-dimensional data and outperform them by an increasing margin as the dimensionality of the data increases.
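The abstract's key ingredient, a thresholded linear activation with no bias term, can be illustrated with a short sketch. The following is a minimal NumPy mock-up, not the authors' implementation: the activation keeps the linear response where it exceeds a threshold and zeroes it elsewhere, and the autoencoder (tied weights, threshold value, and shapes are illustrative assumptions) is trained on nothing but reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)

def thresholded_linear(z, theta=1.0):
    # Zero-bias activation sketch: pass the linear response through
    # unchanged where z > theta, output zero elsewhere. No additive
    # bias appears anywhere, so the active units stay truly linear.
    return z * (z > theta)

# Illustrative tied-weight zero-bias autoencoder (shapes are arbitrary).
W = rng.normal(scale=0.1, size=(64, 16))   # 64 input dims, 16 hidden units
x = rng.normal(size=(5, 64))               # batch of 5 inputs

h = thresholded_linear(x @ W)   # bias-free encoding
x_hat = h @ W.T                 # bias-free linear decoding
loss = np.mean((x - x_hat) ** 2)  # plain reconstruction error, no penalty term
```

Because the thresholding only gates units on or off, any input that activates the same subset of hidden units is reconstructed by the same purely linear map, which is the sense in which units can jointly cover larger regions of uniform density.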
