Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness

Main: 9 pages · 8 figures · 3 tables
Bibliography: 3 pages
Appendix: 17 pages
Abstract
This work introduces a hybrid non-Euclidean optimization method that generalizes gradient norm clipping by combining steepest descent and conditional gradient approaches. The method achieves the best of both worlds by establishing a descent property under a generalized notion of $(L_0,L_1)$-smoothness. Weight decay is incorporated in a principled manner by identifying a connection to the Frank-Wolfe short step. In the stochastic case, we show an order-optimal convergence rate by leveraging a momentum-based gradient estimator. We discuss how to instantiate the algorithms for deep learning and demonstrate their properties on image classification and language modeling.
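To make the hybrid update concrete, here is a minimal sketch of how a non-Euclidean generalization of gradient norm clipping can be written in terms of a linear minimization oracle (LMO). This is an illustration under stated assumptions, not the paper's exact algorithm: the function names (`lmo_euclidean`, `generalized_clip_step`) are hypothetical, and the update form `min(eta * ||g||_*, gamma) * lmo(g)` is assumed as the natural way to interpolate between a steepest descent step (scaled by the dual norm of the gradient) and a conditional gradient step (fixed radius).

```python
import numpy as np

def lmo_euclidean(g):
    """Linear minimization oracle over the Euclidean unit ball:
    argmin_{||s||_2 <= 1} <g, s> = -g / ||g||_2."""
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return np.zeros_like(g)
    return -g / norm

def generalized_clip_step(x, g, eta, gamma,
                          lmo=lmo_euclidean, dual_norm=np.linalg.norm):
    """One hybrid update (hypothetical sketch): move along the LMO direction
    with length min(eta * ||g||_*, gamma). Small gradients give a steepest
    descent step eta * ||g||_* * lmo(g); large gradients are clipped to the
    conditional-gradient radius gamma."""
    step_length = min(eta * dual_norm(g), gamma)
    return x + step_length * lmo(g)
```

With the Euclidean norm, `min(eta * ||g||, gamma) * (-g / ||g||)` equals the classical clipped step `-min(eta, gamma / ||g||) * g`; swapping in the LMO and dual norm of another norm (e.g., a spectral norm on weight matrices) yields a non-Euclidean variant.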
BibTeX:

@article{pethick2025_2506.01913,
  title   = {Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness},
  author  = {Thomas Pethick and Wanyun Xie and Mete Erdogan and Kimon Antonakopoulos and Tony Silveti-Falls and Volkan Cevher},
  journal = {arXiv preprint arXiv:2506.01913},
  year    = {2025}
}