Speed and Conversational Large Language Models: Not All Is About Tokens per Second

Abstract
This paper studies the speed of open-weights large language models (LLMs) when run on GPUs and how it depends on the task at hand, presenting a comparative analysis of the speed of the most popular open LLMs.
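
Since the analysis centers on tokens-per-second measurements, the sketch below shows one common way such throughput is measured for an open-weights model on a GPU using Hugging Face Transformers. It is a minimal illustration, not the authors' benchmark: the model name, prompt, and generation length are assumptions chosen for the example.

```python
# Minimal sketch of a tokens-per-second measurement for an open-weights LLM.
# Model name and prompt are illustrative placeholders, not the paper's setup.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda")

prompt = "Explain the difference between latency and throughput."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Synchronize before and after generation so the timer measures GPU work,
# not just the time to enqueue CUDA kernels.
torch.cuda.synchronize()
start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=256)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens} tokens in {elapsed:.2f} s "
      f"-> {new_tokens / elapsed:.1f} tokens/s")
```

Note that a raw tokens-per-second figure like this depends on the prompt and generation length, which is precisely why task-dependent measurements matter.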
@article{conde2025_2502.16721,
  title={Speed and Conversational Large Language Models: Not All Is About Tokens per Second},
  author={Javier Conde and Miguel González and Pedro Reviriego and Zhen Gao and Shanshan Liu and Fabrizio Lombardi},
  journal={arXiv preprint arXiv:2502.16721},
  year={2025}
}