
MGDA Converges under Generalized Smoothness, Provably

Main: 10 pages · 2 figures · 7 tables · Bibliography: 3 pages · Appendix: 19 pages
Abstract

Multi-objective optimization (MOO) is receiving increasing attention in fields such as multi-task learning. Recent works provide effective algorithms with theoretical analyses, but they rely on the standard L-smoothness or bounded-gradient assumptions, which typically do not hold for neural networks such as long short-term memory (LSTM) models and Transformers. In this paper, we study a more general and realistic class of generalized ℓ-smooth loss functions, where ℓ is a general non-decreasing function of the gradient norm. We revisit and analyze the fundamental multiple gradient descent algorithm (MGDA) and its stochastic version with double sampling for solving generalized ℓ-smooth MOO problems; both approximate the conflict-avoidant (CA) direction that maximizes the minimum improvement among objectives. We provide a comprehensive convergence analysis of these algorithms and show that they converge to an ε-accurate Pareto stationary point with a guaranteed ε-level average CA distance (i.e., the gap between the updating direction and the CA direction) over all iterations, where O(ε⁻²) and O(ε⁻⁴) samples in total are needed for the deterministic and stochastic settings, respectively. We further prove that they can guarantee a tighter ε-level CA distance in each iteration at the cost of more samples. Moreover, we analyze an efficient variant of MGDA, named MGDA-FA, that uses only O(1) time and space while achieving the same performance guarantee as MGDA.
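MGDA's update direction is the minimum-norm element of the convex hull of the objectives' gradients; stepping against it improves all objectives simultaneously when possible. For two objectives this min-norm problem has a closed-form solution (the general K-objective case requires solving a small quadratic program over the simplex). The sketch below, with a hypothetical function name not taken from the paper, illustrates that two-objective special case:

```python
import numpy as np

def mgda_direction(g1, g2):
    """Min-norm convex combination of two gradients (hypothetical helper).

    Solves min_{gamma in [0,1]} ||gamma*g1 + (1-gamma)*g2||^2 in closed
    form; the minimizer approximates the conflict-avoidant (CA) direction
    for K = 2 objectives.
    """
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:  # gradients coincide; any convex weight gives the same d
        return g1.copy()
    # Unconstrained minimizer of the quadratic, clipped to [0, 1]
    gamma = np.clip(-(g2 @ diff) / denom, 0.0, 1.0)
    return gamma * g1 + (1.0 - gamma) * g2

# Orthogonal unit gradients: the CA direction is their average, [0.5, 0.5].
d = mgda_direction(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

The returned direction d satisfies d·gᵢ ≥ ‖d‖² for each objective, so a small step along −d decreases both losses to first order, which is the conflict-avoidant property the abstract refers to.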
