
AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax Optimization

International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Abstract

In this paper, we propose a class of faster adaptive Gradient Descent Ascent (GDA) methods for solving nonconvex-strongly-concave minimax problems using unified adaptive matrices, which cover almost all existing coordinate-wise and global adaptive learning rates. In particular, we provide an effective convergence analysis framework for our adaptive GDA methods. Specifically, we propose a fast Adaptive Gradient Descent Ascent (AdaGDA) method based on the basic momentum technique, which reaches a lower gradient complexity of $\tilde{O}(\kappa^4\epsilon^{-4})$ for finding an $\epsilon$-stationary point without large batches, improving the existing results of adaptive GDA methods by a factor of $O(\sqrt{\kappa})$. Moreover, we propose an accelerated version of AdaGDA (VR-AdaGDA) based on the momentum-based variance-reduction technique, which achieves a lower gradient complexity of $\tilde{O}(\kappa^{4.5}\epsilon^{-3})$ for finding an $\epsilon$-stationary point without large batches, improving the existing results of adaptive GDA methods by a factor of $O(\epsilon^{-1})$. We further prove that our VR-AdaGDA method can reach the best-known gradient complexity of $\tilde{O}(\kappa^{3}\epsilon^{-3})$ with mini-batch size $O(\kappa^3)$. Experiments on policy evaluation and fair classifier learning tasks verify the efficiency of our new algorithms.
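To make the core idea concrete, the following is a minimal sketch (not the authors' exact algorithm) of momentum GDA with scalar AdaGrad-style adaptive step sizes, applied to a hypothetical toy quadratic minimax objective chosen purely for illustration; the objective, step sizes, and iteration count are all assumptions.

```python
import math

def f_grad(x, y):
    # Gradients of a toy objective f(x, y) = x*y - 0.5*y**2 + 0.1*x**2
    # (strongly concave in y; convex in x here, for a simple runnable demo).
    gx = y + 0.2 * x   # partial derivative w.r.t. x
    gy = x - y         # partial derivative w.r.t. y
    return gx, gy

def ada_gda(x, y, steps=2000, eta=0.1, lam=0.1, beta=0.9, eps=1e-8):
    """Momentum GDA with global adaptive step sizes (illustrative sketch).

    mx, my are momentum estimates of the two gradients; vx, vy accumulate
    squared gradients, playing the role of (scalar) adaptive matrices.
    """
    mx = my = 0.0
    vx = vy = 0.0
    for _ in range(steps):
        gx, gy = f_grad(x, y)
        # momentum updates of the gradient estimates
        mx = beta * mx + (1 - beta) * gx
        my = beta * my + (1 - beta) * gy
        # AdaGrad-style accumulators for the adaptive learning rates
        vx += gx ** 2
        vy += gy ** 2
        x -= eta / (math.sqrt(vx) + eps) * mx   # descent step on x
        y += lam / (math.sqrt(vy) + eps) * my   # ascent step on y
    return x, y
```

For this toy objective the unique saddle point is $(0, 0)$, and the iterates spiral toward it; coordinate-wise adaptive rates would simply replace the scalar accumulators with per-coordinate ones.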