Path Length Bounds for Gradient Descent and Flow

Abstract

We derive bounds on the path length $\zeta$ of gradient descent (GD) and gradient flow (GF) curves for various classes of smooth convex and nonconvex functions. Among other results, we prove that: (a) if the iterates are linearly convergent with factor $(1-c)$, then $\zeta$ is at most $\mathcal{O}(1/c)$; (b) under the Polyak-Kurdyka-Lojasiewicz (PKL) condition, $\zeta$ is at most $\mathcal{O}(\sqrt{\kappa})$, where $\kappa$ is the condition number, and at least $\widetilde{\Omega}(\sqrt{d} \wedge \kappa^{1/4})$; (c) for quadratics, $\zeta$ is $\Theta(\min\{\sqrt{d}, \sqrt{\log \kappa}\})$ and in some cases can be independent of $\kappa$; (d) assuming just convexity, $\zeta$ can be at most $2^{4d\log d}$; (e) for separable quasiconvex functions, $\zeta$ is $\Theta(\sqrt{d})$. Thus, we advance current understanding of the properties of GD and GF curves beyond rates of convergence. We expect our techniques to facilitate future studies for other algorithms.
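
To make the quantity in the abstract concrete, the following minimal sketch runs GD on a small diagonal quadratic and sums the lengths of the individual steps. It is an illustration only, not the paper's method: the function `gd_path_length`, the eigenvalues, the starting point, and the $1/L$ step size are all hypothetical choices, and the sketch assumes $\zeta$ is measured relative to the initial distance to the minimizer, which the abstract itself does not state.

```python
import math

def gd_path_length(diag, x0, step, iters):
    """Run GD on f(x) = 0.5 * sum(a_i * x_i**2) and sum the step lengths.

    `diag` holds the eigenvalues a_i (illustrative values, not from the paper).
    Returns the total path length and the final iterate.
    """
    x = list(x0)
    length = 0.0
    for _ in range(iters):
        # The gradient of f at x has coordinates a_i * x_i.
        new_x = [xi - step * a * xi for a, xi in zip(diag, x)]
        length += math.dist(x, new_x)  # Euclidean length of this step
        x = new_x
    return length, x

diag = [1.0, 10.0, 100.0]   # assumed eigenvalues, condition number kappa = 100
x0 = [1.0, 1.0, 1.0]        # assumed starting point
step = 1.0 / max(diag)      # step size 1/L, an assumption of this sketch

length, x_final = gd_path_length(diag, x0, step, iters=5000)
straight = math.dist(x0, [0.0] * len(x0))  # the minimizer is the origin
ratio = length / straight                  # assumed normalization of zeta

print(f"path length = {length:.4f}, straight-line distance = {straight:.4f}")
print(f"ratio = {ratio:.4f}")
```

With this step size every coordinate shrinks monotonically toward zero, so the path can be no longer than the $\ell_1$ distance to the minimizer, which is at most $\sqrt{d}$ times the Euclidean distance. The ratio therefore stays between 1 and $\sqrt{3}$ here even though $\kappa = 100$, in the spirit of the remark in (c) that the path length for quadratics can be independent of $\kappa$.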
