Basic Inequalities for First-Order Optimization with Applications to Statistical Risk Analysis

Seunghoon Paik
Kangjie Zhou
Matus Telgarsky
Ryan J. Tibshirani
Main:24 Pages
6 Figures
Bibliography:4 Pages
4 Tables
Appendix:19 Pages
Abstract

We introduce \textit{basic inequalities} for first-order iterative optimization algorithms, forming a simple and versatile framework that connects implicit and explicit regularization. While related inequalities appear in the literature, we isolate and highlight a specific form and develop it as a well-rounded tool for statistical analysis. Let $f$ denote the objective function to be optimized. Given a first-order iterative algorithm initialized at $\theta_0$ with current iterate $\theta_T$, the basic inequality upper bounds $f(\theta_T) - f(z)$ for any reference point $z$ in terms of the accumulated step sizes and the distances between $\theta_0$, $\theta_T$, and $z$. The bound translates the number of iterations into an effective regularization coefficient in the loss function. We demonstrate this framework through analyses of training dynamics and prediction risk bounds. In addition to revisiting and refining known results on gradient descent, we provide new results for mirror descent with Bregman divergence projection, for generalized linear models trained by gradient descent and exponentiated gradient descent, and for randomized predictors. We illustrate and supplement these theoretical findings with experiments on generalized linear models.
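
As a rough illustration of the kind of bound described above, the sketch below numerically checks one standard inequality of this shape for plain gradient descent on a convex, $L$-smooth objective with step sizes $\eta_t \le 1/L$: $f(\theta_T) - f(z) \le \big(\|\theta_0 - z\|^2 - \|\theta_T - z\|^2\big) / \big(2 \sum_{t<T} \eta_t\big)$. Whether this matches the exact form stated in the paper is an assumption. The least-squares objective, the random data, the reference point `z`, and the helper name `check_basic_inequality` are all hypothetical choices made for the example.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def check_basic_inequality(num_steps=50, seed=0):
    """Run gradient descent on a toy least-squares loss and return, for each
    iterate T = 1..num_steps, the pair (f(theta_T) - f(z), bound)."""
    rng = random.Random(seed)
    n, d = 20, 3
    # Hypothetical synthetic data; not taken from the paper.
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]

    def f(theta):
        # Convex quadratic: (1 / 2n) * sum_i (x_i . theta - y_i)^2
        return sum((dot(row, theta) - yi) ** 2 for row, yi in zip(X, y)) / (2 * n)

    def grad(theta):
        g = [0.0] * d
        for row, yi in zip(X, y):
            r = dot(row, theta) - yi
            for j in range(d):
                g[j] += r * row[j] / n
        return g

    # The Hessian is X^T X / n; its trace upper-bounds its largest eigenvalue,
    # so 1 / trace is a valid (conservative) step size with eta <= 1 / L.
    smooth_upper = sum(dot(row, row) for row in X) / n
    eta = 1.0 / smooth_upper

    theta0 = [0.0] * d
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # arbitrary reference point
    theta, step_sum, results = list(theta0), 0.0, []
    for _ in range(num_steps):
        g = grad(theta)
        theta = [t - eta * gj for t, gj in zip(theta, g)]
        step_sum += eta  # accumulated step sizes
        lhs = f(theta) - f(z)
        rhs = (sq_dist(theta0, z) - sq_dist(theta, z)) / (2.0 * step_sum)
        results.append((lhs, rhs))
    return results


if __name__ == "__main__":
    pairs = check_basic_inequality()
    # True when the bound holds at every iterate (up to rounding slack).
    print(all(lhs <= rhs + 1e-9 for lhs, rhs in pairs))
```

Rearranged, the checked inequality reads $f(\theta_T) + \lambda_T \|\theta_T - z\|^2 \le f(z) + \lambda_T \|\theta_0 - z\|^2$ with $\lambda_T = 1 / (2 \sum_{t<T} \eta_t)$, which is one way to read the accumulated step sizes as a ridge-like regularization coefficient that shrinks as iterations accumulate.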