
Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions

Abstract

We prove convergence rates of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for Łojasiewicz functions. The SZGD algorithm iterates as
\begin{align*}
\mathbf{x}_{t+1} = \mathbf{x}_t - \eta_t \widehat{\nabla} f (\mathbf{x}_t), \qquad t = 0,1,2,3,\cdots ,
\end{align*}
where $f$ is the objective function that satisfies the Łojasiewicz inequality with Łojasiewicz exponent $\theta$, $\eta_t$ is the step size (learning rate), and $\widehat{\nabla} f (\mathbf{x}_t)$ is the approximate gradient estimated using zeroth-order information only. Our results show that $\{ f (\mathbf{x}_t) - f (\mathbf{x}_\infty) \}_{t \in \mathbb{N}}$ can converge faster than $\{ \| \mathbf{x}_t - \mathbf{x}_\infty \| \}_{t \in \mathbb{N}}$, regardless of whether the objective $f$ is smooth or nonsmooth.
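As a rough illustration of the iteration above, the following is a minimal Python sketch of SZGD. The two-point Gaussian-direction gradient estimator, the quartic test objective, the step-size schedule, and all parameter values are assumptions chosen for illustration; the abstract does not specify how $\widehat{\nabla} f$ is constructed.

```python
import numpy as np


def zeroth_order_gradient(f, x, rng, mu=1e-4, num_samples=20):
    """Estimate grad f(x) from function values only (zeroth-order information),
    averaging two-point finite differences along random Gaussian directions.
    This particular estimator is an assumption for illustration."""
    grad = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)
        grad += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return grad / num_samples


def szgd(f, x0, step_size, num_iters=200, seed=0):
    """Run the iteration x_{t+1} = x_t - eta_t * grad_hat f(x_t)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for t in range(num_iters):
        g_hat = zeroth_order_gradient(f, x, rng)
        x = x - step_size(t) * g_hat
    return x


if __name__ == "__main__":
    # Illustrative objective: f(x) = ||x||^4 satisfies a Lojasiewicz
    # inequality with exponent theta = 3/4 around its minimizer x* = 0.
    f = lambda x: float(np.sum(x ** 2) ** 2)
    x_T = szgd(f, x0=np.full(5, 0.5), step_size=lambda t: 1.0 / (t + 10))
    print("f(x_T) =", f(x_T), " ||x_T|| =", np.linalg.norm(x_T))
```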
