Numerical Error Analysis of Large Language Models
Abstract
Large language models based on transformer architectures have become integral to state-of-the-art natural language processing applications. However, their training remains computationally expensive and exhibits instabilities, some of which are expected to be caused by finite-precision computations. We provide a theoretical analysis of the impact of round-off errors within the forward pass of a transformer architecture, which yields fundamental bounds on these effects. In addition, we conduct a series of numerical experiments that demonstrate the practical relevance of our bounds. Our results yield concrete guidelines for choosing hyperparameters that mitigate round-off errors, leading to more robust and stable inference.
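To illustrate the kind of effect the abstract refers to (not the paper's actual experiments), the following minimal sketch compares a single attention-like forward step evaluated in float16 against a float64 reference and reports the relative round-off error. The dimensions, random data, and the relative-error metric are illustrative assumptions, not taken from the paper.

import numpy as np

# Minimal sketch: round-off error of one attention-like forward step
# computed in low precision (float16) versus a float64 reference.
rng = np.random.default_rng(0)
d = 512                                   # head dimension (illustrative choice)
Q = rng.standard_normal((1, d))           # one query vector
K = rng.standard_normal((64, d))          # 64 keys
V = rng.standard_normal((64, d))          # 64 values

def attention(Q, K, V, dtype):
    Qd, Kd, Vd = Q.astype(dtype), K.astype(dtype), V.astype(dtype)
    scores = (Qd @ Kd.T) / np.asarray(np.sqrt(d), dtype=dtype)
    scores = scores - scores.max(axis=-1, keepdims=True)   # stabilized softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ Vd

ref = attention(Q, K, V, np.float64)
low = attention(Q, K, V, np.float16)
rel_err = np.linalg.norm(ref - low.astype(np.float64)) / np.linalg.norm(ref)
print(f"relative forward-pass error in float16: {rel_err:.2e}")

Running the sketch prints a relative error on the order of the float16 unit round-off accumulated over the reductions, which is the type of finite-precision effect the paper's bounds are meant to quantify.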
@article{budzinskiy2025_2503.10251,
  title   = {Numerical Error Analysis of Large Language Models},
  author  = {Stanislav Budzinskiy and Wenyi Fang and Longbin Zeng and Philipp Petersen},
  journal = {arXiv preprint arXiv:2503.10251},
  year    = {2025}
}