
Distributed saddle point problems for strongly concave-convex functions

Abstract

In this paper, we propose GT-GDA, a distributed optimization method to solve saddle point problems of the form $\min_{\mathbf{x}} \max_{\mathbf{y}} \{F(\mathbf{x},\mathbf{y}) := G(\mathbf{x}) + \langle \mathbf{y}, \overline{P} \mathbf{x} \rangle - H(\mathbf{y})\}$, where the functions $G(\cdot)$, $H(\cdot)$, and the coupling matrix $\overline{P}$ are distributed over a strongly connected network of nodes. GT-GDA is a first-order method that uses gradient tracking to eliminate the dissimilarity caused by heterogeneous data distribution among the nodes. In its most general form, GT-GDA includes a consensus step over the local coupling matrices to reach the optimal (unique) saddle point, at the expense of increased communication. To avoid this cost, we propose a more efficient variant, GT-GDA-Lite, that does not incur the additional communication, and we analyze its convergence in various scenarios. We show that GT-GDA converges linearly to the unique saddle point solution when $G(\cdot)$ is smooth and convex, $H(\cdot)$ is smooth and strongly convex, and the global coupling matrix $\overline{P}$ has full column rank. We further characterize the regime under which GT-GDA exhibits a network topology-independent convergence behavior. We next show the linear convergence of GT-GDA-Lite to an error around the unique saddle point, which goes to zero when the coupling cost $\langle \mathbf{y}, \overline{P} \mathbf{x} \rangle$ is common to all nodes, or when $G(\cdot)$ and $H(\cdot)$ are quadratic. Numerical experiments illustrate the convergence properties and importance of GT-GDA and GT-GDA-Lite for several applications.
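To make the problem class concrete, the following is a minimal sketch of plain (centralized) gradient descent ascent on a small instance of $F(\mathbf{x},\mathbf{y})$. It is not GT-GDA or GT-GDA-Lite: there is no network, no consensus step, and no gradient tracking. The local costs $G_i(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{x}-\mathbf{a}_i\|^2$ and $H_i(\mathbf{y}) = \tfrac{1}{2}\|\mathbf{y}-\mathbf{b}_i\|^2$, the random data, the averaging used to form the global quantities, and the step size rule are all assumptions chosen for illustration. With these quadratic choices the unique saddle point has a closed form, which the sketch uses as a reference.

```python
import numpy as np

# Hypothetical local data: node i holds a_i, b_i and a coupling matrix P_i.
rng = np.random.default_rng(0)
n_nodes, dx, dy = 4, 2, 3
a = rng.normal(size=(n_nodes, dx))
b = rng.normal(size=(n_nodes, dy))
P = rng.normal(size=(n_nodes, dy, dx))

# Assumed global problem: the average of the local costs, so that
# grad G(x) = x - a_bar, grad H(y) = y - b_bar, and P_bar = mean of P_i.
a_bar, b_bar, P_bar = a.mean(axis=0), b.mean(axis=0), P.mean(axis=0)


def gda(a_bar, b_bar, P_bar, iters=2000):
    """Simultaneous gradient descent (in x) and ascent (in y) on F."""
    sigma = np.linalg.norm(P_bar, 2)      # largest singular value of P_bar
    eta = 1.0 / (1.0 + sigma**2)          # step size that contracts for this instance
    x = np.zeros(P_bar.shape[1])
    y = np.zeros(P_bar.shape[0])
    for _ in range(iters):
        grad_x = (x - a_bar) + P_bar.T @ y   # gradient of F with respect to x
        grad_y = P_bar @ x - (y - b_bar)     # gradient of F with respect to y
        x, y = x - eta * grad_x, y + eta * grad_y
    return x, y


# Closed-form saddle point from the first-order conditions:
#   x - a_bar + P_bar^T y = 0  and  P_bar x - (y - b_bar) = 0.
x_star = np.linalg.solve(np.eye(dx) + P_bar.T @ P_bar, a_bar - P_bar.T @ b_bar)
y_star = P_bar @ x_star + b_bar

x, y = gda(a_bar, b_bar, P_bar)
print(np.linalg.norm(x - x_star), np.linalg.norm(y - y_star))
```

Here $G$ is strongly convex, which is a special case of the convex setting in the abstract, so this instance converges without relying on the rank of $\overline{P}$. In the distributed setting no node has access to `a_bar`, `b_bar`, or `P_bar`, which is the gap that the gradient tracking and consensus steps described above are meant to close.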
