
Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty

22 May 2025
Hongfei Xue
Yufeng Tang
Jun Zhang
Xuelong Geng
Lei Xie
Main: 4 pages
3 figures
Bibliography: 1 page
7 tables
Abstract

Although multilingual automatic speech recognition (ASR) systems have advanced significantly, enabling a single model to handle multiple languages, inherent linguistic differences and data imbalances make it difficult to achieve state-of-the-art (SOTA) performance across all languages. While language identification (LID) models can route speech to the appropriate ASR model, they incur high costs from invoking SOTA commercial models and suffer from inaccuracies due to misclassification. To overcome these limitations, we propose SIMA, a selective invocation approach for multilingual ASR that adapts to the difficulty level of the input speech. Built on a spoken large language model (SLLM), SIMA evaluates whether the input is simple enough for direct transcription or requires the invocation of a SOTA ASR model. Our approach reduces word error rates by 18.7% compared to the SLLM and halves invocation costs compared to LID-based methods. Tests on three datasets show that SIMA is a scalable, cost-effective solution for multilingual ASR applications.
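
The routing idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the SLLM signals its difficulty decision, so the `SllmOutput` structure, the `needs_sota` flag, and the `toy_sllm` / `toy_sota` stand-ins below are all hypothetical names introduced only for illustration. The sketch assumes the SLLM either returns a transcript directly or marks the utterance as needing the external SOTA ASR model, and it tracks how often the external model is invoked.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SllmOutput:
    """Hypothetical SLLM result: a difficulty decision plus an optional transcript."""
    needs_sota: bool
    transcript: Optional[str]


@dataclass
class RoutedResult:
    transcript: str
    invoked_sota: bool


def selective_transcribe(
    audio: bytes,
    sllm: Callable[[bytes], SllmOutput],
    sota_asr: Callable[[bytes], str],
) -> RoutedResult:
    """Use the SLLM transcript for easy inputs; otherwise call the SOTA ASR model."""
    out = sllm(audio)
    if out.needs_sota or out.transcript is None:
        return RoutedResult(transcript=sota_asr(audio), invoked_sota=True)
    return RoutedResult(transcript=out.transcript, invoked_sota=False)


def invocation_rate(results: List[RoutedResult]) -> float:
    """Fraction of utterances that were sent to the SOTA ASR model."""
    if not results:
        return 0.0
    return sum(r.invoked_sota for r in results) / len(results)


# Toy stand-ins (assumptions): "audio" is just bytes, and length is a fake
# proxy for difficulty. A real system would use model outputs instead.
def toy_sllm(audio: bytes) -> SllmOutput:
    if len(audio) > 4:
        return SllmOutput(needs_sota=True, transcript=None)
    return SllmOutput(needs_sota=False, transcript="easy:" + audio.decode())


def toy_sota(audio: bytes) -> str:
    return "sota:" + audio.decode()


utterances = [b"hi", b"hello world", b"ok", b"bonjour"]
results = [selective_transcribe(a, toy_sllm, toy_sota) for a in utterances]
rate = invocation_rate(results)
```

In this toy run, two of the four utterances are routed to the external model. A pipeline that sends every utterance to an external ASR model would have an invocation rate of 1.0 by construction, which is the kind of cost difference the abstract refers to.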

@article{xue2025_2505.16168,
  title={Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty},
  author={Hongfei Xue and Yufeng Tang and Jun Zhang and Xuelong Geng and Lei Xie},
  journal={arXiv preprint arXiv:2505.16168},
  year={2025}
}