Guarantees on learning depth-2 neural networks under a data-poisoning attack

Abstract

In this work, we study the possibility of defending against data-poisoning attacks while learning a neural net. We focus on the supervised learning setup for a class of finite-sized depth-2 nets, which includes standard single-filter convolutional nets. In this setup, we attempt to learn the true label-generating weights in the presence of a malicious oracle that applies stochastic, bounded, additive adversarial distortions to the true labels accessed by the algorithm during training. For the non-gradient stochastic algorithm that we instantiate, we prove worst-case nearly optimal trade-offs among the magnitude of the adversarial attack, the accuracy, and the confidence achieved by the algorithm. Additionally, our algorithm uses mini-batching, and we track how the mini-batch size affects convergence.
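To make the threat model concrete, the following minimal Python sketch simulates the data-generation setup the abstract describes: labels produced by a depth-2 single-filter convolutional net and then corrupted by a stochastic, bounded, additive distortion before the learner sees them. The dimensions, the uniform noise distribution, and all names here are illustrative assumptions, not details taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def single_filter_conv_net(x, w, patch_size):
        # Depth-2, single-filter conv net: one shared filter w is applied
        # to each non-overlapping patch of x, followed by ReLU and averaging.
        patches = x.reshape(-1, patch_size)
        pre_act = patches @ w
        return np.mean(np.maximum(pre_act, 0.0))

    def poisoned_label_oracle(x, w_star, patch_size, theta):
        # Malicious oracle: returns the true label plus a stochastic additive
        # distortion whose magnitude is bounded by the attack budget theta.
        # (Uniform noise is an assumption; the paper only requires boundedness.)
        y_true = single_filter_conv_net(x, w_star, patch_size)
        xi = rng.uniform(-theta, theta)
        return y_true + xi

    # Hypothetical dimensions: inputs of 4 patches of size 5, budget theta.
    patch_size, n_patches, theta = 5, 4, 0.1
    w_star = rng.standard_normal(patch_size)  # true filter to be learned
    batch = rng.standard_normal((8, patch_size * n_patches))  # one mini-batch
    labels = [poisoned_label_oracle(x, w_star, patch_size, theta)
              for x in batch]

The learner only ever observes the outputs of poisoned_label_oracle, so any guarantee on recovering w_star must trade off against theta, which is the kind of attack-magnitude/accuracy/confidence trade-off the abstract refers to.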
