We apply concepts from manifold regularization to develop new regularization techniques for training locally stable deep neural networks. Our regularizers are based on a sparsification of the graph Laplacian which holds with high probability when the data is sparse in high dimensions, as is common in deep learning. Empirically, our networks exhibit stability in a diverse set of perturbation models, including $\ell_2$, $\ell_\infty$, and Wasserstein-based perturbations; in particular, we achieve 40% adversarial accuracy on CIFAR-10 against an adaptive PGD attack using $\ell_\infty$ perturbations of size $8/255$, and state-of-the-art verified accuracy of 21% in the same perturbation model. Furthermore, our techniques are efficient, incurring overhead on par with two additional parallel forward passes through the network.
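For intuition, the sketch below shows a generic manifold-regularization style penalty: the network is evaluated on two randomly perturbed copies of each input and penalized for output variation, which costs roughly two extra parallel forward passes per batch, in line with the overhead noted above. This is only an illustrative approximation under assumed conventions (PyTorch, inputs in [0, 1]); the names `manifold_penalty`, `epsilon`, and `lam` are hypothetical, and the paper's actual regularizers are built from a sparsified graph Laplacian rather than this simple pairwise form.

```python
import torch
import torch.nn.functional as F

def manifold_penalty(model, x, epsilon=8 / 255):
    """Illustrative local-stability penalty (not the paper's exact regularizer).

    Evaluates the model on two independently perturbed copies of the batch and
    penalizes the squared difference of their outputs, i.e. roughly two extra
    forward passes of overhead.
    """
    # Two independent uniform perturbations of magnitude at most epsilon
    # (assumes inputs are images scaled to [0, 1]).
    noise1 = torch.empty_like(x).uniform_(-epsilon, epsilon)
    noise2 = torch.empty_like(x).uniform_(-epsilon, epsilon)
    x1 = (x + noise1).clamp(0.0, 1.0)
    x2 = (x + noise2).clamp(0.0, 1.0)

    out1, out2 = model(x1), model(x2)
    # Mean squared output difference encourages locally stable predictions.
    return F.mse_loss(out1, out2)

# Hypothetical usage inside a training step, with a weighting factor `lam`:
#   loss = F.cross_entropy(model(x), y) + lam * manifold_penalty(model, x)
```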