Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness

Main: 9 pages · 8 figures · 3 tables
Bibliography: 3 pages
Appendix: 17 pages
Abstract
This work introduces a hybrid non-Euclidean optimization method that generalizes gradient norm clipping by combining steepest descent and conditional gradient approaches. The method achieves the best of both worlds by establishing a descent property under a generalized notion of $(L_0,L_1)$-smoothness. Weight decay is incorporated in a principled manner by identifying a connection to the Frank-Wolfe short step. In the stochastic case, we show an order-optimal convergence rate by leveraging a momentum-based gradient estimator. We discuss how to instantiate the algorithms for deep learning and demonstrate their properties on image classification and language modeling.
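To make the hybrid update concrete, here is a minimal sketch of how a non-Euclidean generalization of gradient norm clipping can be written in terms of a linear minimization oracle (LMO). This is an illustration under stated assumptions, not the paper's exact algorithm: the function names (`lmo_euclidean`, `generalized_clip_step`) are hypothetical, and the update form `min(eta * ||g||_*, gamma) * lmo(g)` is assumed as the natural way to interpolate between a steepest descent step (scaled by the dual norm of the gradient) and a conditional gradient step (fixed radius).

```python
import numpy as np

def lmo_euclidean(g):
    """Linear minimization oracle over the Euclidean unit ball:
    argmin_{||s||_2 <= 1} <g, s> = -g / ||g||_2."""
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return np.zeros_like(g)
    return -g / norm

def generalized_clip_step(x, g, eta, gamma,
                          lmo=lmo_euclidean, dual_norm=np.linalg.norm):
    """One hybrid update (hypothetical sketch): move along the LMO direction
    with length min(eta * ||g||_*, gamma). Small gradients give a steepest
    descent step eta * ||g||_* * lmo(g); large gradients are clipped to the
    conditional-gradient radius gamma."""
    step_length = min(eta * dual_norm(g), gamma)
    return x + step_length * lmo(g)
```

With the Euclidean norm, `min(eta * ||g||, gamma) * (-g / ||g||)` equals the classical clipped step `-min(eta, gamma / ||g||) * g`; swapping in the LMO and dual norm of another norm (e.g., a spectral norm on weight matrices) yields a non-Euclidean variant.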
BibTeX:

@article{pethick2025_2506.01913,
  title   = {Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness},
  author  = {Thomas Pethick and Wanyun Xie and Mete Erdogan and Kimon Antonakopoulos and Tony Silveti-Falls and Volkan Cevher},
  journal = {arXiv preprint arXiv:2506.01913},
  year    = {2025}
}