Almost Sure Convergence Rates of Stochastic Zeroth-order Gradient
Descent for Łojasiewicz Functions
We prove \emph{almost sure convergence rates} of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for \L ojasiewicz functions. The SZGD algorithm iterates as \begin{align*} x_{t+1} = x_t - \eta_t \widehat{\nabla} f (x_t), \qquad t = 0,1,2,3,\cdots , \end{align*} where $ f $ is the objective function that satisfies the \L ojasiewicz inequality with \L ojasiewicz exponent $ \theta $, $ \eta_t $ is the step size (learning rate), and $ \widehat{\nabla} f (x_t) $ is the approximate gradient estimated using zeroth-order information. We show that, for {smooth} \L ojasiewicz functions, the sequence $ \{ x_t \}_{t \in \mathbb{N}} $ generated by SZGD converges to a bounded point $ x_\infty $ almost surely, and $ x_\infty $ is a critical point of $ f $. If $ \theta \in \left( 0, \frac{1}{2} \right] $, then $ f (x_t) - f (x_\infty) $, $ \sum_{s=t}^\infty \| x_{s+1} - x_{s} \|^2$ and $ \| x_t - x_\infty \| $ ($ \| \cdot \| $ is the Euclidean norm) converge to zero \emph{linearly almost surely}. If $ \theta \in \left( \frac{1}{2}, 1 \right) $, then $ f (x_t) - f (x_\infty) $ (and $ \sum_{s=t}^\infty \| x_{s+1} - x_s \|^2 $) converges to zero at rate $O \left( t^{\frac{1}{1 - 2\theta}} \right) $ almost surely; $ \| x_{t} - x_\infty \| $ converges to zero at rate $O \left( t^{\frac{1-\theta}{1-2\theta}} \right) $ almost surely. To the best of our knowledge, this paper provides the first \emph{almost sure convergence rate} guarantee for stochastic zeroth-order algorithms for \L ojasiewicz functions.
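For intuition, the following is a minimal sketch of the SZGD iteration $x_{t+1} = x_t - \eta_t \widehat{\nabla} f(x_t)$, using a two-point Gaussian-smoothing gradient estimator as one possible zeroth-order approximation. The estimator, its smoothing parameter, and the constant step size are illustrative assumptions, not the paper's exact construction or step-size schedule.

```python
import numpy as np


def zeroth_order_gradient(f, x, mu=1e-4, rng=None):
    """Two-point zeroth-order gradient estimate along one random direction.

    This particular estimator (Gaussian smoothing with a single direction) is
    an assumption for illustration; the paper only requires an approximate
    gradient built from function-value (zeroth-order) information.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u


def szgd(f, x0, num_iters=1000, eta=0.05, rng=None):
    """Run the SZGD iteration x_{t+1} = x_t - eta * grad_hat f(x_t).

    A constant step size is used here for simplicity; the paper's results are
    stated for a step-size sequence (eta_t), whose schedule is not reproduced
    in this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - eta * zeroth_order_gradient(f, x, rng=rng)
    return x


if __name__ == "__main__":
    # Example objective: a smooth quadratic, which satisfies the Lojasiewicz
    # inequality with exponent theta = 1/2 (the linear-rate regime above).
    f = lambda x: 0.5 * np.sum(x ** 2)
    x_final = szgd(f, x0=np.ones(5), num_iters=2000)
    print("f(x_T) =", f(x_final))
```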