Speed and Conversational Large Language Models: Not All Is About Tokens per Second

Abstract
This paper studies the speed of open-weights large language models (LLMs) when run on GPUs and how it depends on the task at hand, presenting a comparative analysis of the speed of the most popular open LLMs.
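
Since the analysis centers on tokens-per-second measurements, the sketch below shows one common way such throughput is measured for an open-weights model on a GPU using Hugging Face Transformers. It is a minimal illustration, not the authors' benchmark: the model name, prompt, and generation length are assumptions chosen for the example.

```python
# Minimal sketch of a tokens-per-second measurement for an open-weights LLM.
# Model name and prompt are illustrative placeholders, not the paper's setup.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda")

prompt = "Explain the difference between latency and throughput."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Synchronize before and after generation so the timer measures GPU work,
# not just the time to enqueue CUDA kernels.
torch.cuda.synchronize()
start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=256)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens} tokens in {elapsed:.2f} s "
      f"-> {new_tokens / elapsed:.1f} tokens/s")
```

Note that a raw tokens-per-second figure like this depends on the prompt and generation length, which is precisely why task-dependent measurements matter.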
@article{conde2025_2502.16721,
  title={Speed and Conversational Large Language Models: Not All Is About Tokens per Second},
  author={Javier Conde and Miguel González and Pedro Reviriego and Zhen Gao and Shanshan Liu and Fabrizio Lombardi},
  journal={arXiv preprint arXiv:2502.16721},
  year={2025}
}