
Speed and Conversational Large Language Models: Not All Is About Tokens per Second

Main: 5 pages, 6 figures
Abstract

We study the speed of open-weights large language models (LLMs) when run on GPUs and its dependence on the task at hand, presenting a comparative analysis of the speed of the most popular open LLMs.
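
The kind of measurement the abstract refers to is generation speed in tokens per second. As a minimal sketch (not taken from the paper), the snippet below times text generation for an open-weights model with Hugging Face `transformers` on a GPU; the model name, prompt, and generation length are illustrative placeholders.

```python
# Illustrative sketch (not the paper's benchmark code): measure generation
# speed in tokens/second for an open-weights LLM on a GPU.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; any open-weights model works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16
).to("cuda")

prompt = "Explain the difference between latency and throughput."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=256)
torch.cuda.synchronize()  # wait for GPU work to finish before stopping the clock
elapsed = time.perf_counter() - start

# Count only the newly generated tokens, excluding the prompt.
new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens} tokens in {elapsed:.2f} s -> {new_tokens / elapsed:.1f} tokens/s")
```

Note that a single prompt like this captures only one workload; task-dependent comparisons of the kind the paper describes would repeat such measurements across prompts with different input and output lengths.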
