Distributional Smoothing with Virtual Adversarial Training
The generalization performance of model-based prediction depends largely on the family from which we infer the model, and most successful model families possess some form of smoothness, because most real-world phenomena share similar smoothness properties. We propose Local Distributional Smoothness (LDS), a new notion of smoothness for statistical models that respects our intuitive notion of a "smooth distribution". The LDS of a model at a point is defined as the KL-divergence-based robustness of the model distribution against local perturbation. Following the work on adversarial training, we call the LDS-based regularization Virtual Adversarial Training (VAT). VAT resembles adversarial training, but it distinguishes itself by determining the adversarial direction from the model distribution alone, without using label information. The technique is therefore applicable even to semi-supervised learning. When we applied our technique to the classification task on the permutation-invariant MNIST dataset, it eclipsed all training methods that are free of generative models and pre-training. VAT also performed well even in comparison to the state-of-the-art method, which uses a highly advanced generative model.
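To make the idea concrete, the following is a minimal PyTorch sketch of the LDS penalty as the abstract describes it: the model distribution p(y|x) is held fixed, the perturbation that most increases the KL divergence (the virtual adversarial direction) is approximated by a gradient step on a small random perturbation, and the penalty is the KL divergence under that perturbation. This is not the authors' reference implementation; the names `vat_loss`, `xi` (finite-difference scale), `eps` (perturbation norm), and `n_power` (power-iteration steps) are our own labels for quantities the abstract only describes in words.

```python
import torch
import torch.nn.functional as F


def _l2_normalize(d):
    # Normalize each sample's perturbation to unit L2 norm.
    norm = d.flatten(1).norm(dim=1).view(-1, *([1] * (d.dim() - 1)))
    return d / (norm + 1e-8)


def vat_loss(model, x, xi=1e-6, eps=2.0, n_power=1):
    """LDS penalty: KL(p(y|x) || p(y|x + r_vadv)). No labels are used,
    so the same loss applies to labeled and unlabeled examples alike.
    Hyperparameter values here are illustrative assumptions."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)  # model distribution at x, held fixed

    # Power iteration: the gradient of the KL divergence w.r.t. a small
    # random perturbation converges to the virtual adversarial direction.
    d = _l2_normalize(torch.randn_like(x))
    for _ in range(n_power):
        d.requires_grad_()
        kl = F.kl_div(F.log_softmax(model(x + xi * d), dim=1),
                      p, reduction="batchmean")
        d = _l2_normalize(torch.autograd.grad(kl, d)[0].detach())

    # Virtual adversarial perturbation of norm eps, and the LDS penalty.
    r_vadv = eps * d
    return F.kl_div(F.log_softmax(model(x + r_vadv), dim=1),
                    p, reduction="batchmean")
```

In a semi-supervised setting, one plausible use is to minimize the ordinary cross-entropy on the labeled batch plus a weighted `vat_loss` evaluated on both labeled and unlabeled batches, since the penalty never touches the labels.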