
Escaping Saddle Points via Curvature-Calibrated Perturbations: A Complete Analysis with Explicit Constants and Empirical Validation

Main: 10 pages
Bibliography: 2 pages
Appendix: 4 pages
Figures: 3
Tables: 4
Abstract

We present a comprehensive theoretical analysis of first-order methods for escaping strict saddle points in smooth non-convex optimization. Our main contribution is a Perturbed Saddle-escape Descent (PSD) algorithm with fully explicit constants and a rigorous separation between gradient-descent and saddle-escape phases. For a function $f:\mathbb{R}^d\to\mathbb{R}$ with $\ell$-Lipschitz gradient and $\rho$-Lipschitz Hessian, we prove that PSD finds an $(\epsilon,\sqrt{\rho\epsilon})$-approximate second-order stationary point with high probability using at most $O(\ell\Delta_f/\epsilon^2)$ gradient evaluations for the descent phase plus $O((\ell/\sqrt{\rho\epsilon})\log(d/\delta))$ evaluations per escape episode, with at most $O(\ell\Delta_f/\epsilon^2)$ episodes needed. We validate our theoretical predictions through extensive experiments on both synthetic functions and practical machine learning tasks, confirming the logarithmic dimension dependence and the predicted per-episode function decrease. We also provide complete algorithmic specifications, including a finite-difference variant (PSD-Probe) and a stochastic extension (PSGD) with robust mini-batch sizing.
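As a rough illustration of the two-phase pattern the abstract describes (plain gradient steps while the gradient is large, a random perturbation followed by a descent episode once it falls below $\epsilon$), the sketch below implements a generic perturbed gradient descent loop in NumPy. The step size, perturbation radius, and episode length are illustrative placeholders keyed to the stated $O((\ell/\sqrt{\rho\epsilon})\log(d/\delta))$ per-episode scaling; this is not the authors' PSD implementation, and the function name `perturbed_descent` and its constants are assumptions made for the example.

```python
import numpy as np


def perturbed_descent(grad, x0, eps, ell, rho, delta=0.1, max_iters=10_000, seed=0):
    """Generic perturbed gradient descent (illustrative sketch, not the paper's PSD).

    grad : callable returning the gradient of f at a point x
    eps  : target gradient-norm tolerance (first-order accuracy)
    ell  : Lipschitz constant of the gradient
    rho  : Lipschitz constant of the Hessian
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    eta = 1.0 / ell                      # standard 1/l step size
    radius = eps / 100.0                 # placeholder perturbation radius
    # Episode length matching the stated O((l / sqrt(rho*eps)) * log(d/delta)) scaling.
    episode_len = int(np.ceil(ell / np.sqrt(rho * eps) * np.log(d / delta)))
    last_perturb = -episode_len          # allow a perturbation right away if needed

    for t in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) <= eps and t - last_perturb >= episode_len:
            # Saddle-escape episode: inject a small random perturbation, then keep
            # taking gradient steps; negative curvature amplifies the perturbation.
            u = rng.standard_normal(d)
            x = x + radius * (u / np.linalg.norm(u))
            last_perturb = t
        else:
            x = x - eta * g              # gradient-descent phase
    return x
```

The design point mirrored from the abstract is the separation of phases: perturbations are injected only when the gradient is already small, and at most once per episode window, which is what makes the per-episode function decrease quantifiable.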
