
Optimal and parameter-free gradient minimization methods for convex and nonconvex optimization

Abstract

We propose novel optimal and parameter-free algorithms for computing an approximate solution with small (projected) gradient norm. Specifically, for computing an approximate solution such that the norm of its (projected) gradient does not exceed $\varepsilon$, we obtain the following results: a) for the convex case, the total number of gradient evaluations is bounded by $O(1)\sqrt{L\|x_0 - x^*\|/\varepsilon}$, where $L$ is the Lipschitz constant of the gradient and $x^*$ is any optimal solution; b) for the strongly convex case, the total number of gradient evaluations is bounded by $O(1)\sqrt{L/\mu}\log(\|\nabla f(x_0)\|/\varepsilon)$, where $\mu$ is the strong convexity modulus; and c) for the nonconvex case, the total number of gradient evaluations is bounded by $O(1)\sqrt{Ll}(f(x_0) - f(x^*))/\varepsilon^2$, where $l$ is the lower curvature constant. Our complexity results match the lower complexity bounds for the convex and strongly convex cases, and achieve the best-known complexity bound stated above for the nonconvex case for the first time in the literature. Moreover, for all the convex, strongly convex, and nonconvex cases, we propose parameter-free algorithms that do not require the input of any problem parameters. To the best of our knowledge, no such parameter-free methods existed before, especially for the strongly convex and nonconvex cases. Since most regularity conditions (e.g., strong convexity and lower curvature) are imposed over a global scope, the corresponding problem parameters are notoriously difficult to estimate. However, gradient norm minimization equips us with a convenient tool to monitor the progress of algorithms and thus the ability to estimate such parameters in situ.
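To illustrate the general idea of using the gradient norm both as a termination criterion and as a way to estimate problem parameters in situ, the following minimal sketch runs plain gradient descent with a backtracking estimate of the Lipschitz constant $L$. This is only an illustrative example of that monitoring principle, not the accelerated, optimal methods proposed in the paper; the function names and test data are hypothetical.

```python
import numpy as np

def gradient_descent_adaptive(grad_f, f, x0, eps=1e-6, L0=1.0, max_iters=10000):
    """Gradient descent that stops when ||grad f(x)|| <= eps and estimates
    the Lipschitz constant L by backtracking (illustrative sketch only)."""
    x, L = np.asarray(x0, dtype=float), L0
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:          # terminate on small gradient norm
            break
        # Grow L until the standard descent inequality
        # f(x - g/L) <= f(x) - ||g||^2 / (2L) holds.
        while f(x - g / L) > f(x) - np.dot(g, g) / (2 * L):
            L *= 2.0
        x = x - g / L                          # gradient step with step size 1/L
        L = max(L0, L / 2.0)                   # let the estimate shrink again
    return x

# Usage on a simple quadratic f(x) = 0.5 * ||A x - b||^2 (hypothetical data).
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
grad_f = lambda x: A.T @ (A @ x - b)
x_hat = gradient_descent_adaptive(grad_f, f, np.zeros(5), eps=1e-8)
print("final gradient norm:", np.linalg.norm(grad_f(x_hat)))
```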
