Efficient Continual Finite-Sum Minimization

Abstract

Given a sequence of functions $f_1,\ldots,f_n$ with $f_i:\mathcal{D}\mapsto \mathbb{R}$, finite-sum minimization seeks a point ${x}^\star \in \mathcal{D}$ minimizing $\sum_{j=1}^n f_j(x)/n$. In this work, we propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization, which asks for a sequence of points ${x}_1^\star,\ldots,{x}_n^\star \in \mathcal{D}$ such that each ${x}^\star_i \in \mathcal{D}$ minimizes the prefix-sum $\sum_{j=1}^{i} f_j(x)/i$. Assuming that each prefix-sum is strongly convex, we develop a first-order continual stochastic variance reduction gradient method ($\mathrm{CSVRG}$) producing an $\epsilon$-optimal sequence with $\mathcal{\tilde{O}}(n/\epsilon^{1/3} + 1/\sqrt{\epsilon})$ overall first-order oracles (FOs). An FO corresponds to the computation of a single gradient $\nabla f_j(x)$ at a given $x \in \mathcal{D}$ for some $j \in [n]$. Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ FOs that $\mathrm{StochasticGradientDescent}$ requires and the $\mathcal{O}(n^2 \log (1/\epsilon))$ FOs that state-of-the-art variance reduction methods such as $\mathrm{Katyusha}$ require. We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^\alpha\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
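To make the problem setup concrete, the following is a minimal sketch (not the paper's $\mathrm{CSVRG}$) of continual finite-sum minimization on toy 1-D quadratics, using a warm-started SGD baseline of the kind whose FO cost scales like $\mathcal{O}(n/\epsilon)$. The quadratic components, step sizes, and per-prefix budget are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of continual finite-sum minimization (not the paper's CSVRG).
# Components are f_j(x) = 0.5 * (x - a[j])**2, so the exact minimizer of the i-th
# prefix average (1/i) * sum_{j<=i} f_j(x) is the running mean of a[0..i-1].
import numpy as np

rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)            # data defining the components f_j

def grad_fj(x, j):
    """One first-order oracle (FO): gradient of a single component f_j at x."""
    return x - a[j]

fo_calls = 0
x = 0.0                           # warm start carried across prefixes
errors = []
steps_per_prefix = 50             # assumed FO budget spent on each prefix
for i in range(1, n + 1):
    # Baseline: plain SGD on the i-th prefix average, warm-started at the
    # previous output x_{i-1}; step size 1/t is standard for 1-strongly-convex f.
    for t in range(1, steps_per_prefix + 1):
        j = rng.integers(i)       # sample a component uniformly from the prefix
        x -= (1.0 / t) * grad_fj(x, j)
        fo_calls += 1
    errors.append(abs(x - a[:i].mean()))  # distance to the exact prefix minimizer

print(f"total FO calls: {fo_calls}")               # grows linearly with n here
print(f"max distance to a prefix optimum: {max(errors):.3f}")
```

Driving every prefix error below a target $\epsilon$ with this baseline requires roughly $1/\epsilon$ FOs per prefix, i.e. $\mathcal{O}(n/\epsilon)$ in total, which is the baseline the abstract compares against.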
