Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions

Abstract
We prove convergence rates of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for Łojasiewicz functions. The SZGD algorithm iterates as \begin{align*} \mathbf{x}_{t+1} = \mathbf{x}_t - \eta_t \widehat{\nabla} f (\mathbf{x}_t), \qquad t = 0,1,2,3,\cdots , \end{align*} where $f$ is the objective function that satisfies the Łojasiewicz inequality with Łojasiewicz exponent $\theta$, $\eta_t$ is the step size (learning rate), and $\widehat{\nabla} f (\mathbf{x}_t)$ is the approximate gradient estimated using zeroth-order information only. Our results show that $\{ f(\mathbf{x}_t) - f(\mathbf{x}_\infty) \}_{t \in \mathbb{N}}$ can converge faster than $\{ \| \mathbf{x}_t - \mathbf{x}_\infty \| \}_{t \in \mathbb{N}}$, regardless of whether the objective $f$ is smooth or nonsmooth.
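The iteration above can be sketched in code. The abstract does not specify how $\widehat{\nabla} f$ is constructed, so the sketch below assumes a standard two-point Gaussian-smoothing estimator, which queries only function values; the function `szgd`, the smoothing radius `mu`, and the sample count `n_samples` are illustrative choices, not the paper's prescription.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, n_samples=20, rng=None):
    """Two-point zeroth-order gradient estimate (an assumed estimator,
    not necessarily the one analyzed in the paper). Averages
    (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u over Gaussian directions u."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.size)
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / n_samples

def szgd(f, x0, eta=0.1, n_iters=300, rng=None):
    """SZGD iteration: x_{t+1} = x_t - eta_t * grad_hat f(x_t),
    here with a constant step size eta for simplicity."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - eta * zo_gradient(f, x, rng=rng)
    return x

# Example: minimize a quadratic, a simple Lojasiewicz function.
f = lambda x: np.sum(x ** 2)
x_final = szgd(f, np.array([1.0, -2.0]), rng=np.random.default_rng(0))
```

With the quadratic example the estimator is unbiased for the true gradient, so the iterates contract toward the minimizer despite the stochastic directions.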