Building English ASR model with regional language support

10 March 2025

Abstract

In this paper, we present an approach to developing an English Automatic Speech Recognition (ASR) system that can effectively handle Hindi queries without compromising its performance on English. We propose a novel acoustic model (AM), referred to as the SplitHead with Attention (SHA) model, which features shared hidden layers across languages and language-specific projection layers combined via a self-attention mechanism. This mechanism estimates a weight for each language from the input data and weights the corresponding language-specific projection layers accordingly. Additionally, we propose a language modeling approach that interpolates n-gram models built from English and transliterated Hindi text corpora. Our results demonstrate the effectiveness of this approach, with relative reductions in word error rate of 69.3% and 5.7% on the Hindi and English test sets, respectively, compared to a monolingual English model.
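The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dot-product scoring vectors, the softmax normalization, the function names, and the interpolation weight `lam` are all assumptions made for illustration, since the abstract does not specify how the attention weights are computed or how the n-gram models are interpolated (linear interpolation is assumed here).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sha_combine(hidden, heads, attn_vectors):
    """Combine language-specific projections of a shared hidden vector.

    hidden:       output of the shared hidden layers for one frame
    heads:        language -> projection matrix (the language-specific "head")
    attn_vectors: language -> scoring vector (hypothetical parameterization)
    """
    langs = sorted(heads)
    # Assumption: one scalar score per language, computed from the input.
    scores = [sum(a * h for a, h in zip(attn_vectors[lang], hidden))
              for lang in langs]
    weights = softmax(scores)
    projected = [matvec(heads[lang], hidden) for lang in langs]
    # Weighted sum of the language-specific projections.
    combined = [sum(w * p[i] for w, p in zip(weights, projected))
                for i in range(len(projected[0]))]
    return dict(zip(langs, weights)), combined


def interpolate_lm(p_english, p_hindi, lam):
    """Assumed linear interpolation of two n-gram probabilities."""
    return lam * p_english + (1.0 - lam) * p_hindi


# Toy example with two languages and a 2-dimensional hidden vector.
hidden = [1.0, 0.0]
heads = {"en": [[1.0, 0.0], [0.0, 1.0]], "hi": [[2.0, 0.0], [0.0, 2.0]]}
attn_vectors = {"en": [0.0, 0.0], "hi": [0.0, 0.0]}
weights, combined = sha_combine(hidden, heads, attn_vectors)
```

With all-zero scoring vectors the two languages receive equal weight, so the combined output is the average of the two projections. In a trained model the scores would depend on the input, shifting weight toward the head that matches the spoken language.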

@article{agrawal2025_2503.07522,
  title={Building English ASR model with regional language support},
  author={Purvi Agrawal and Vikas Joshi and Bharati Patidar and Ankur Gupta and Rupesh Kumar Mehta},
  journal={arXiv preprint arXiv:2503.07522},
  year={2025}
}