
Dropout as data augmentation

Abstract

Dropout is typically interpreted as bagging a large number of models sharing parameters. We show that using dropout in a network can also be interpreted as a kind of data augmentation in the input space without domain knowledge. We present an approach to projecting the dropout noise within a network back into the input space, thereby generating augmented versions of the training data, and we show that training a deterministic network on the augmented samples yields similar results. Our results shed new light on important properties of noise in neural networks and suggest, for instance, that the avoidance of co-adaptation of neurons has no significant effect on the performance of the neural network. Finally, we propose a new dropout noise scheme based on our observations and show that it improves dropout results without adding significant computational cost.
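
The sketch below illustrates the core idea stated in the abstract, not the authors' exact procedure: sample a dropout mask on a hidden layer, then optimize a perturbed input so that the clean network's hidden activation matches the dropped-out one, yielding an "augmented" sample in input space. The one-hidden-layer architecture, layer sizes, learning rate, and number of optimization steps are all illustrative assumptions.

```python
# Minimal sketch: project dropout noise back into the input space by matching
# the clean hidden activation of a perturbed input to the dropped-out
# activation of the original input. Hyperparameters are assumptions.

import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy one-hidden-layer network (assumed architecture).
W1 = torch.randn(784, 256) * 0.01
b1 = torch.zeros(256)

def hidden(x):
    """Clean hidden activation h(x)."""
    return F.relu(x @ W1 + b1)

def project_dropout_to_input(x, drop_prob=0.5, steps=200, lr=0.1):
    """Find x_tilde such that h(x_tilde) approximates mask * h(x)."""
    with torch.no_grad():
        mask = (torch.rand(256) > drop_prob).float()
        target = hidden(x) * mask            # dropped-out hidden state
    x_tilde = x.clone().requires_grad_(True)
    opt = torch.optim.SGD([x_tilde], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(hidden(x_tilde), target)
        loss.backward()
        opt.step()
    return x_tilde.detach()                  # an "augmented" training sample

# Usage: generate one augmented version of a (random) input.
x = torch.rand(784)
x_aug = project_dropout_to_input(x)
print(float(((x_aug - x) ** 2).mean()))      # how far the augmented sample moved
```

Under the paper's interpretation, a deterministic network trained on samples produced this way should behave similarly to one trained with dropout itself.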
