
Almost Sure Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions

Abstract

We prove \emph{almost sure convergence rates} of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for Łojasiewicz functions. The SZGD algorithm iterates as
\begin{align*}
x_{t+1} = x_t - \eta_t \widehat{\nabla} f (x_t), \qquad t = 0,1,2,3,\cdots,
\end{align*}
where $f$ is the objective function satisfying the Łojasiewicz inequality with Łojasiewicz exponent $\theta$, $\eta_t$ is the step size (learning rate), and $\widehat{\nabla} f (x_t)$ is the approximate gradient estimated using zeroth-order information. We show that, for smooth Łojasiewicz functions, the sequence $\{ x_t \}_{t\in\mathbb{N}}$ generated by SZGD converges to a bounded point $x_\infty$ almost surely, and $x_\infty$ is a critical point of $f$. If $\theta \in (0, \frac{1}{2}]$, then $f(x_t) - f(x_\infty)$, $\sum_{s=t}^\infty \| x_{s+1} - x_{s} \|^2$, and $\| x_t - x_\infty \|$ ($\| \cdot \|$ is the Euclidean norm) converge to zero \emph{linearly almost surely}. If $\theta \in (\frac{1}{2}, 1)$, then $f(x_t) - f(x_\infty)$ (and $\sum_{s=t}^\infty \| x_{s+1} - x_s \|^2$) converges to zero at rate $O \left( t^{\frac{1}{1 - 2\theta}} \right)$ almost surely, and $\| x_{t} - x_\infty \|$ converges to zero at rate $O \left( t^{\frac{1-\theta}{1-2\theta}} \right)$ almost surely. To the best of our knowledge, this paper provides the first \emph{almost sure convergence rate} guarantee for stochastic zeroth-order algorithms for Łojasiewicz functions.
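
As an illustration of the iteration above, here is a minimal Python sketch of SZGD. The two-point Gaussian-smoothing gradient estimator, the constant step size, and the quadratic test objective are assumptions made for this sketch; the abstract only specifies that $\widehat{\nabla} f(x_t)$ is estimated from zeroth-order (function-value) information.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, rng=None):
    # Two-point Gaussian-smoothing estimate of the gradient of f at x.
    # NOTE: this particular estimator is an assumption for illustration;
    # the paper only requires a zeroth-order (function-value) estimate.
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x)) / mu * u

def szgd(f, x0, step_size, n_steps=1000, seed=0):
    # SZGD iteration: x_{t+1} = x_t - eta_t * grad_hat f(x_t).
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for t in range(n_steps):
        x = x - step_size(t) * zo_gradient(f, x, rng=rng)
    return x

# Hypothetical usage: a strongly convex quadratic, for which the
# Lojasiewicz exponent is theta = 1/2 (the linear-rate regime).
f = lambda x: 0.5 * float(np.dot(x, x))
x_inf = szgd(f, x0=np.ones(5), step_size=lambda t: 0.1)
print(x_inf, f(x_inf))
```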
