
A Riemannian Optimization Perspective of the Gauss-Newton Method for Feedforward Neural Networks

Main: 38 pages, 3 figures
Bibliography: 4 pages
Abstract

In this work, we establish non-asymptotic convergence bounds for the Gauss-Newton method in training neural networks with smooth activations. In the underparameterized regime, the Gauss-Newton gradient flow in parameter space induces a Riemannian gradient flow on a low-dimensional embedded submanifold of the function space. Using tools from Riemannian optimization, we establish geodesic Polyak-Łojasiewicz and Lipschitz-smoothness conditions for the loss under an appropriately chosen output scaling, yielding geometric convergence to the optimal in-class predictor at an explicit rate that is independent of the conditioning of the Gram matrix. In the overparameterized regime, we propose adaptive, curvature-aware regularization schedules that ensure fast geometric convergence to a global optimum at a rate independent of the minimum eigenvalue of the neural tangent kernel and, locally, of the modulus of strong convexity of the loss. These results demonstrate that the Gauss-Newton method achieves accelerated convergence in settings where first-order methods converge slowly due to ill-conditioned kernel matrices and loss landscapes.
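For orientation, a standard form of the damped Gauss-Newton update and its continuous-time gradient-flow limit is sketched below; the notation is illustrative rather than the paper's: \(f(\theta)\) stacks the network outputs on the training inputs, \(r(\theta) = f(\theta) - y\) is the residual, \(J\) is the Jacobian of \(f\) with respect to \(\theta\), and \(\lambda_k\) is a regularization (damping) schedule.

\[
\theta_{k+1} \;=\; \theta_k \;-\; \eta_k\,\bigl(J_k^{\top} J_k + \lambda_k I\bigr)^{-1} J_k^{\top} r_k,
\qquad
\dot{\theta}(t) \;=\; -\,\bigl(J(t)^{\top} J(t)\bigr)^{\dagger} J(t)^{\top} r(t).
\]

Pushing the undamped flow forward to function space gives \(\dot f = J\dot\theta = -\,P_{\operatorname{range}(J)}\, r\), the orthogonal projection of the residual onto the tangent space of the manifold of realizable predictors, which matches the Riemannian gradient-flow picture described in the abstract.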
