Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment

13 June 2025

Main: 6 pages, 2 figures; bibliography: 2 pages

Abstract

Speech quality assessment (SQA) aims to predict the perceived quality of speech signals under a wide range of distortions. It is inherently connected to speech enhancement (SE), which seeks to improve speech quality by removing unwanted signal components. While SQA models are widely used to evaluate SE performance, their potential to guide SE training remains underexplored. In this work, we investigate a training framework that uses an SQA model, trained to predict multiple evaluation metrics from a public SE leaderboard, as a supervisory signal for SE. This approach addresses a key limitation of conventional SE objectives such as SI-SNR, which often fail to align with perceptual quality and generalize poorly across evaluation metrics. Moreover, it enables training on real-world data where clean references are unavailable. Experiments on both simulated and real-world test sets show that SQA-guided training consistently improves performance across a range of quality metrics.
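For readers unfamiliar with the conventional objective named above, the following is a minimal sketch of the commonly used SI-SNR (scale-invariant signal-to-noise ratio) definition, written in plain Python. It is not taken from the paper: the function name `si_snr`, the `eps` stabilizer, and the list-based signal representation are assumptions made for illustration. Note that the computation requires a clean `reference` signal, which is the dependency the abstract says SQA-guided training avoids on real-world data.

```python
import math
from typing import Sequence


def si_snr(estimate: Sequence[float], reference: Sequence[float],
           eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB (standard definition; illustrative helper).

    `eps` is an assumed numerical stabilizer, not a value from the paper.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)

    # Remove the mean of each signal so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the "target" component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]

    # Everything in the estimate that is not explained by the target is noise.
    noise = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, and the score is expressed purely in terms of waveform energy rather than perceived quality.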


@article{wang2025_2506.12260,
  title={Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment},
  author={Wei Wang and Wangyou Zhang and Chenda Li and Jiatong Shi and Shinji Watanabe and Yanmin Qian},
  journal={arXiv preprint arXiv:2506.12260},
  year={2025}
}
