
Softmax is 1/2-Lipschitz: A tight bound across all \ell_p norms

Main: 9 pages, 5 figures
Bibliography: 3 pages
Appendix: 5 pages
Abstract

The softmax function is a basic operator in machine learning and optimization, used in classification, attention mechanisms, reinforcement learning, game theory, and problems involving log-sum-exp terms. Existing robustness guarantees for learning models and convergence analyses of optimization algorithms typically assume that the softmax operator has a Lipschitz constant of 1 with respect to the \ell_2 norm. In this work, we prove that the softmax function is contractive with Lipschitz constant 1/2, uniformly across all \ell_p norms with p \ge 1. We also show that the local Lipschitz constant of softmax attains 1/2 for p = 1 and p = \infty, while for p \in (1, \infty) it remains strictly below 1/2 and the supremum 1/2 is attained only in the limit. To our knowledge, this is the first comprehensive norm-uniform analysis of the Lipschitz continuity of softmax. We demonstrate how the sharper constant directly improves a range of existing theoretical results on robustness and convergence. We further validate the sharpness of the 1/2 Lipschitz constant of the softmax operator through empirical studies on attention-based architectures (ViT, GPT-2, Qwen3-8B) and on stochastic policies in reinforcement learning.
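
As a quick illustration of the claimed bound \|softmax(x) - softmax(y)\|_p \le (1/2) \|x - y\|_p for p \ge 1, the following minimal Python sketch runs a Monte-Carlo probe of the worst-case difference quotient over random input pairs (this is an illustrative check under the standard softmax definition, not the paper's proof; all names and parameters here are our own):

import numpy as np

def softmax(x):
    # Standard softmax, shifted by the max entry for numerical stability.
    z = np.exp(x - x.max())
    return z / z.sum()

rng = np.random.default_rng(0)
for p in (1, 2, np.inf):
    worst = 0.0
    for _ in range(50_000):
        n = rng.integers(2, 16)                    # random dimension
        x = rng.normal(scale=3.0, size=n)
        y = x + rng.normal(scale=0.1, size=n)      # nearby pair, probes the local constant
        num = np.linalg.norm(softmax(x) - softmax(y), ord=p)
        den = np.linalg.norm(x - y, ord=p)
        worst = max(worst, num / den)
    print(f"p={p}: max observed ratio = {worst:.4f}  (claimed bound: 0.5)")

Consistent with the theorem, the observed ratios never exceed 1/2 for any of the sampled norms; sampling nearby pairs makes the estimate approach the local Lipschitz constant.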
