
Criteria and Bias of Parameterized Linear Regression under Edge of Stability Regime

Main: 9 pages · 10 figures · Bibliography: 3 pages · Appendix: 19 pages
Abstract

Classical optimization theory requires a small step-size for gradient-based methods to converge. Nevertheless, recent findings challenge this traditional view by empirically demonstrating that Gradient Descent (GD) converges even when the step-size η exceeds the threshold 2/L, where L is the global smoothness constant. This is known as the Edge of Stability (EoS) phenomenon. A widely held belief is that an objective function with subquadratic growth plays an important role in inducing EoS. In this paper, we provide a more comprehensive answer by considering the task of finding a linear interpolator β ∈ ℝ^d for regression with loss function ℓ(·), where β admits the parameterization β = w₊² − w₋². Contrary to prior work suggesting that a subquadratic ℓ is necessary for EoS, our novel finding reveals that EoS occurs even when ℓ is quadratic, under proper conditions. We make this argument rigorous with both empirical and theoretical evidence, demonstrating that the GD trajectory converges to a linear interpolator in a non-asymptotic way. Moreover, the model under quadratic ℓ, also known as a depth-2 diagonal linear network, remains largely unexplored in the EoS regime. Our analysis thus sheds new light on the implicit bias of diagonal linear networks when a larger step-size is employed, enriching the understanding of EoS on more practical models.
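To make the setup concrete, the following is a minimal sketch (not the paper's experiments) of GD on a quadratic regression loss under the parameterization β = w₊² − w₋², i.e., a depth-2 diagonal linear network. The problem sizes, initialization scale, and step size are illustrative assumptions; the step size here is kept conservative for stability, whereas the paper's EoS analysis concerns step sizes exceeding the classical 2/L threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20                      # hypothetical sizes: overparameterized (n < d)
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[:3] = [1.0, -0.5, 0.25]  # sparse ground truth (illustrative)
y = X @ beta_star                  # noiseless, so linear interpolators exist

def loss(wp, wm):
    beta = wp**2 - wm**2           # reparameterization beta = w_+^2 - w_-^2
    r = X @ beta - y
    return 0.5 * np.mean(r**2)     # quadratic loss l in beta

def grads(wp, wm):
    beta = wp**2 - wm**2
    g_beta = X.T @ (X @ beta - y) / n          # dl/dbeta
    return 2 * wp * g_beta, -2 * wm * g_beta   # chain rule through the squares

alpha = 0.1                        # small initialization scale (assumption)
wp = np.full(d, alpha)
wm = np.full(d, alpha)
eta = 0.02                         # illustrative, conservative step size

init_loss = loss(wp, wm)
for _ in range(5000):
    gp, gm = grads(wp, wm)
    wp -= eta * gp
    wm -= eta * gm

print(loss(wp, wm))                # loss after training; should be far below init_loss
```

Note that although wp = wm at initialization makes β = 0, the first GD step pushes wp and wm in opposite directions, so β immediately moves along the negative loss gradient. Increasing η toward and beyond 2 divided by the largest Hessian eigenvalue of the loss in (w₊, w₋) is where the EoS behavior studied in the paper appears.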
