
Criteria and Bias of Parameterized Linear Regression under Edge of Stability Regime

Main: 9 pages · 10 figures · Bibliography: 3 pages · Appendix: 19 pages
Abstract

Classical optimization theory requires a small step-size for gradient-based methods to converge. Nevertheless, recent findings challenge this traditional view by empirically demonstrating that Gradient Descent (GD) converges even when the step-size η exceeds the threshold 2/L, where L is the global smoothness constant. This is known as the Edge of Stability (EoS) phenomenon. A widely held belief is that an objective function with subquadratic growth plays an important role in inducing EoS. In this paper, we provide a more comprehensive answer by considering the task of finding a linear interpolator β ∈ ℝ^d for regression with loss function ℓ(·), where β admits the parameterization β = w₊² − w₋². Contrary to prior work suggesting that a subquadratic ℓ is necessary for EoS, our novel finding reveals that EoS occurs even when ℓ is quadratic, under proper conditions. We make this argument rigorous with both empirical and theoretical evidence, demonstrating that the GD trajectory converges to a linear interpolator in a non-asymptotic way. Moreover, the model under quadratic ℓ, also known as a depth-2 diagonal linear network, remains largely unexplored in the EoS regime. Our analysis thus sheds new light on the implicit bias of diagonal linear networks when a larger step-size is employed, enriching the understanding of EoS on more practical models.
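To make the setup concrete, the following is a minimal sketch (not the paper's experiments) of GD on a quadratic regression loss under the parameterization β = w₊² − w₋², i.e., a depth-2 diagonal linear network. The problem sizes, initialization scale, and step size are illustrative assumptions; the step size here is kept conservative for stability, whereas the paper's EoS analysis concerns step sizes exceeding the classical 2/L threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20                      # hypothetical sizes: overparameterized (n < d)
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[:3] = [1.0, -0.5, 0.25]  # sparse ground truth (illustrative)
y = X @ beta_star                  # noiseless, so linear interpolators exist

def loss(wp, wm):
    beta = wp**2 - wm**2           # reparameterization beta = w_+^2 - w_-^2
    r = X @ beta - y
    return 0.5 * np.mean(r**2)     # quadratic loss l in beta

def grads(wp, wm):
    beta = wp**2 - wm**2
    g_beta = X.T @ (X @ beta - y) / n          # dl/dbeta
    return 2 * wp * g_beta, -2 * wm * g_beta   # chain rule through the squares

alpha = 0.1                        # small initialization scale (assumption)
wp = np.full(d, alpha)
wm = np.full(d, alpha)
eta = 0.02                         # illustrative, conservative step size

init_loss = loss(wp, wm)
for _ in range(5000):
    gp, gm = grads(wp, wm)
    wp -= eta * gp
    wm -= eta * gm

print(loss(wp, wm))                # loss after training; should be far below init_loss
```

Note that although wp = wm at initialization makes β = 0, the first GD step pushes wp and wm in opposite directions, so β immediately moves along the negative loss gradient. Increasing η toward and beyond 2 divided by the largest Hessian eigenvalue of the loss in (w₊, w₋) is where the EoS behavior studied in the paper appears.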
