SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant

Thanks to steady progress in large language models (LLMs), speech encoding algorithms, and vocoder architectures, recent systems can generate spoken responses directly from a user instruction. However, benchmarking the quality of the generated speech has been a neglected yet critical issue, as the goal shifts from semantic accuracy alone toward vivid and spontaneous speech. Previous evaluations focused on speech-understanding ability and lacked any quantification of acoustic quality. In this paper, we propose the Speech cOnversational Voice Assistant Benchmark (SOVA-Bench), which provides a comprehensive comparison of general knowledge, speech recognition and understanding, and both semantic and acoustic generation ability across available speech LLMs. To the best of our knowledge, SOVA-Bench is one of the most systematic evaluation frameworks for speech LLMs, and we hope it will guide the direction of voice interaction systems.
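To make the evaluation dimensions concrete, the sketch below shows one plausible way to score the semantic side of a spoken response: transcribe the generated audio with an off-the-shelf ASR model and compute word error rate against a reference answer. This is an illustrative assumption about how such a metric could be computed, not the authors' actual SOVA-Bench pipeline; the `whisper` and `jiwer` packages, file paths, and reference strings are stand-ins.

```python
# Illustrative sketch (assumed pipeline, not the SOVA-Bench implementation):
# transcribe a voice assistant's spoken response, then score its semantic
# content against a reference answer with word error rate (WER).
import whisper          # openai-whisper, assumed to be installed
from jiwer import wer   # standard WER implementation

asr = whisper.load_model("base")  # small ASR model used only for scoring

def score_spoken_response(audio_path: str, reference_text: str) -> float:
    """Transcribe generated speech and return WER against the reference answer."""
    hypothesis = asr.transcribe(audio_path)["text"]
    return wer(reference_text.lower(), hypothesis.lower())

# Example usage (placeholder path and reference):
# print(score_spoken_response("response.wav", "the capital of france is paris"))
```

Acoustic quality would require a separate measure (e.g., a learned MOS predictor or listener ratings) on the same audio, since WER alone says nothing about how natural the speech sounds.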
@article{hou2025_2506.02457,
  title   = {SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant},
  author  = {Yixuan Hou and Heyang Liu and Yuhao Wang and Ziyang Cheng and Ronghua Wu and Qunshan Gu and Yanfeng Wang and Yu Wang},
  journal = {arXiv preprint arXiv:2506.02457},
  year    = {2025}
}