
Finite-Sum Smooth Optimization with SARAH

Abstract

The total complexity (measured as the total number of gradient computations) of a stochastic first-order optimization algorithm that finds a first-order stationary point of a finite-sum smooth nonconvex objective function $F(w)=\frac{1}{n}\sum_{i=1}^n f_i(w)$ has been proven to be at least $\Omega(\sqrt{n}/\epsilon)$ for $n \leq \mathcal{O}(\epsilon^{-2})$, where $\epsilon$ denotes the attained accuracy $\mathbb{E}[\|\nabla F(\tilde{w})\|^2] \leq \epsilon$ for the output approximation $\tilde{w}$ (Fang et al., 2018). In this paper, we provide a convergence analysis for a slightly modified version of the SARAH algorithm (Nguyen et al., 2017a;b) and achieve a total complexity that matches the worst-case lower bound of (Fang et al., 2018) up to a constant factor when $n \leq \mathcal{O}(\epsilon^{-2})$ for nonconvex problems. For convex optimization, we propose SARAH++, which attains sublinear convergence for general convex problems and linear convergence for strongly convex problems; we also provide a practical variant for which numerical experiments on various datasets show improved performance.
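For context, the sketch below illustrates the basic SARAH recursive-gradient loop of Nguyen et al. (2017a) that the analysis builds on; the step size `eta`, loop lengths, and the helper callables `grad_full` and `grad_i` are placeholder assumptions for illustration, not the modified variant or tuned parameters studied in this paper.

```python
import numpy as np

def sarah(grad_full, grad_i, w0, n, eta=0.01, outer_iters=10, inner_iters=None, rng=None):
    """Illustrative sketch of the basic SARAH loop (Nguyen et al., 2017a).
    grad_full(w) returns the full gradient of F at w; grad_i(w, i) returns
    the gradient of the i-th component f_i at w."""
    rng = np.random.default_rng() if rng is None else rng
    m = inner_iters if inner_iters is not None else n  # placeholder inner-loop length
    w = np.array(w0, dtype=float)
    for _ in range(outer_iters):
        w_prev = w.copy()
        v = grad_full(w)               # full gradient at the outer-loop snapshot
        w = w - eta * v
        for _ in range(m):
            i = rng.integers(n)        # sample one component f_i uniformly
            # SARAH recursive estimator: v_t = grad f_i(w_t) - grad f_i(w_{t-1}) + v_{t-1}
            v = grad_i(w, i) - grad_i(w_prev, i) + v
            w_prev, w = w, w - eta * v
    return w
```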
