Optimal and parameter-free gradient minimization methods for convex and nonconvex optimization

We propose novel optimal and parameter-free algorithms for computing an approximate solution with small (projected) gradient norm. Specifically, for computing an approximate solution such that the norm of its (projected) gradient does not exceed $\epsilon$, we obtain the following results: a) for the convex case, the total number of gradient evaluations is bounded by $\mathcal{O}\big(\sqrt{L\|x_0 - x^*\|/\epsilon}\big)$, where $L$ is the Lipschitz constant of the gradient, $x_0$ is the initial point, and $x^*$ is any optimal solution; b) for the strongly convex case, the total number of gradient evaluations is bounded by $\mathcal{O}\big(\sqrt{L/\mu}\,\log(1/\epsilon)\big)$, where $\mu$ is the strong convexity modulus; and c) for the nonconvex case, the total number of gradient evaluations is bounded by $\mathcal{O}\big(\sqrt{L\ell}\,(f(x_0) - f(x^*))/\epsilon^2\big)$, where $\ell$ is the lower curvature constant. Our complexity results match the lower complexity bounds for the convex and strongly convex cases, and achieve the best-known complexity bound for the nonconvex case for the first time in the literature. Moreover, for all the convex, strongly convex, and nonconvex cases, we propose parameter-free algorithms that do not require the input of any problem parameters. To the best of our knowledge, no such parameter-free methods existed previously, especially for the strongly convex and nonconvex cases. Since most regularity conditions (e.g., strong convexity and lower curvature) are imposed over a global scope, the corresponding problem parameters are notoriously difficult to estimate. However, gradient norm minimization equips us with a convenient tool for monitoring the progress of algorithms, and thus with the ability to estimate such parameters in situ.
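The abstract does not spell out the algorithms themselves, but the general idea of using the observable gradient norm both as a stopping criterion and as a handle for estimating unknown smoothness on the fly can be illustrated with a generic backtracking gradient method. The sketch below is a hypothetical Python illustration under that reading (the function `parameter_free_gd`, the initial guess `L0`, and the doubling heuristic are assumptions for illustration); it is not the optimal parameter-free method proposed in the paper.

```python
import numpy as np

def parameter_free_gd(grad, x0, eps, L0=1.0, max_iters=10_000):
    """Illustrative gradient method that estimates the Lipschitz constant L
    by backtracking and stops once the gradient norm falls below eps.

    A minimal sketch of the generic "monitor the gradient norm, estimate
    smoothness in situ" idea; NOT the algorithm from the paper.
    """
    x = np.asarray(x0, dtype=float)
    L = L0                     # running estimate of the Lipschitz constant
    g = grad(x)
    for _ in range(max_iters):
        if np.linalg.norm(g) <= eps:       # gradient-norm stopping test
            break
        # Backtracking: double L until the local smoothness test
        # ||grad(x_new) - grad(x)|| <= L * ||x_new - x|| holds.
        while True:
            x_new = x - g / L              # gradient step with stepsize 1/L
            g_new = grad(x_new)
            if np.linalg.norm(g_new - g) <= L * np.linalg.norm(x_new - x) + 1e-12:
                break
            L *= 2.0                       # estimate of L was too small
        x, g = x_new, g_new                # L is kept nondecreasing for simplicity
    return x, L

if __name__ == "__main__":
    # Quadratic example: f(x) = 0.5 * x^T A x, so grad f(x) = A x.
    A = np.diag([1.0, 10.0, 100.0])
    x_out, L_est = parameter_free_gd(lambda x: A @ x, x0=np.ones(3), eps=1e-6)
    print(np.linalg.norm(A @ x_out), L_est)
```

In this toy run the routine is never told the true Lipschitz constant (here 100); the backtracking loop inflates the estimate until the local smoothness test passes, and the gradient norm alone decides when to stop.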