SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant

Thanks to steady progress in large language models (LLMs), speech encoding algorithms, and vocoder architectures, recent systems can generate spoken responses directly from a user instruction. However, benchmarking the quality of the generated speech has been a neglected yet critical issue, as the goal shifts from semantic accuracy alone toward vivid and spontaneous speech. Previous evaluations focused on speech-understanding ability and lacked any quantification of acoustic quality. In this paper, we propose the Speech cOnversational Voice Assistant Benchmark (SOVA-Bench), which provides a comprehensive comparison of general knowledge, speech recognition and understanding, and both semantic and acoustic generation ability across available speech LLMs. To the best of our knowledge, SOVA-Bench is one of the most systematic evaluation frameworks for speech LLMs, and we hope it will guide the direction of voice interaction systems.
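To make the evaluation dimensions concrete, the sketch below shows one plausible way to score the semantic side of a spoken response: transcribe the generated audio with an off-the-shelf ASR model and compute word error rate against a reference answer. This is an illustrative assumption about how such a metric could be computed, not the authors' actual SOVA-Bench pipeline; the `whisper` and `jiwer` packages, file paths, and reference strings are stand-ins.

```python
# Illustrative sketch (assumed pipeline, not the SOVA-Bench implementation):
# transcribe a voice assistant's spoken response, then score its semantic
# content against a reference answer with word error rate (WER).
import whisper          # openai-whisper, assumed to be installed
from jiwer import wer   # standard WER implementation

asr = whisper.load_model("base")  # small ASR model used only for scoring

def score_spoken_response(audio_path: str, reference_text: str) -> float:
    """Transcribe generated speech and return WER against the reference answer."""
    hypothesis = asr.transcribe(audio_path)["text"]
    return wer(reference_text.lower(), hypothesis.lower())

# Example usage (placeholder path and reference):
# print(score_spoken_response("response.wav", "the capital of france is paris"))
```

Acoustic quality would require a separate measure (e.g., a learned MOS predictor or listener ratings) on the same audio, since WER alone says nothing about how natural the speech sounds.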
@article{hou2025_2506.02457,
  title   = {SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant},
  author  = {Yixuan Hou and Heyang Liu and Yuhao Wang and Ziyang Cheng and Ronghua Wu and Qunshan Gu and Yanfeng Wang and Yu Wang},
  journal = {arXiv preprint arXiv:2506.02457},
  year    = {2025}
}