Byzantine Stochastic Gradient Descent

Abstract
This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of the $m$ machines which allegedly compute stochastic gradients every iteration, an $\alpha$-fraction are Byzantine, and can behave arbitrarily and adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds $\varepsilon$-approximate minimizers of convex functions in $T = \tilde{O}\big(\frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2}\big)$ iterations. In contrast, traditional mini-batch SGD needs $T = O\big(\frac{1}{\varepsilon^2 m}\big)$ iterations, but cannot tolerate Byzantine failures. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sampling complexity and time complexity.
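
To illustrate why plain mini-batch averaging cannot tolerate Byzantine machines, the following minimal Python sketch compares the mean of the reported gradients with a coordinate-wise median. The median is a generic robust aggregator chosen only for illustration; it is not the algorithm proposed in the paper, and the function names and numbers below are hypothetical.

```python
from statistics import median


def mean_aggregate(grads):
    """Average the reported gradients coordinate by coordinate (plain mini-batch SGD)."""
    m = len(grads)
    dim = len(grads[0])
    return [sum(g[i] for g in grads) / m for i in range(dim)]


def median_aggregate(grads):
    """Coordinate-wise median: an illustrative robust aggregator, NOT the paper's method."""
    dim = len(grads[0])
    return [median(g[i] for g in grads) for i in range(dim)]


# Hypothetical setup: 5 machines each report a 2-dimensional stochastic gradient.
honest = [[1.0, -2.0], [1.1, -1.9], [0.9, -2.1], [1.0, -2.0]]
byzantine = [[1e6, 1e6]]  # one adversarial machine reports an arbitrary vector
reports = honest + byzantine

mean_grad = mean_aggregate(reports)      # pulled far away by the single bad report
median_grad = median_aggregate(reports)  # stays among the honest values
```

In this toy setting a single adversarial report moves the averaged gradient by several orders of magnitude, while the coordinate-wise median remains within the range of the honest reports. This only shows the failure mode of averaging and does not reproduce the paper's guarantees.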