TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models

10 May 2025
Junyi Peng
Takanori Ashihara
Marc Delcroix
Tsubasa Ochiai
Oldrich Plchot
Shoko Araki
J. Černocký
Abstract

Self-supervised learning (SSL) models have significantly advanced speech processing tasks, and several benchmarks have been proposed to validate their effectiveness. However, previous benchmarks have primarily focused on single-speaker scenarios, with less exploration of target-speaker tasks in noisy, multi-talker conditions -- a more challenging yet practical case. In this paper, we introduce the Target-Speaker Speech Processing Universal Performance Benchmark (TS-SUPERB), which includes four widely recognized target-speaker processing tasks that require identifying the target speaker and extracting information from the speech mixture. In our benchmark, the speaker embedding extracted from enrollment speech is used as a clue to condition downstream models. The benchmark results reveal the importance of evaluating SSL models in target-speaker scenarios, demonstrating that performance cannot be easily inferred from related single-speaker tasks. Moreover, by using a unified SSL-based target speech encoder, consisting of a speaker encoder and an extractor module, we also investigate joint optimization across TS tasks to leverage mutual information and demonstrate its effectiveness.
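
The abstract describes using a speaker embedding extracted from enrollment speech as a clue to condition downstream models on features from a (typically frozen) SSL encoder. The sketch below is a minimal, hypothetical PyTorch illustration of that conditioning pattern, not the authors' implementation: the module names, dimensions, and the multiplicative gating used to inject the speaker clue are all illustrative assumptions.

# Minimal sketch (assumed, not the paper's code) of conditioning a downstream
# model on an enrollment speaker embedding over SSL features of a mixture.
import torch
import torch.nn as nn

class TargetSpeakerDownstream(nn.Module):
    def __init__(self, ssl_dim=768, spk_dim=256, hidden_dim=512, num_outputs=1):
        super().__init__()
        # Project the speaker embedding so it can modulate the SSL features.
        self.spk_proj = nn.Linear(spk_dim, ssl_dim)
        self.encoder = nn.GRU(ssl_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, num_outputs)

    def forward(self, ssl_feats, spk_emb):
        # ssl_feats: (batch, time, ssl_dim) features from a frozen SSL model on the mixture
        # spk_emb:   (batch, spk_dim) embedding of the target speaker's enrollment speech
        clue = self.spk_proj(spk_emb).unsqueeze(1)     # (batch, 1, ssl_dim)
        conditioned = ssl_feats * torch.sigmoid(clue)  # one common conditioning choice
        out, _ = self.encoder(conditioned)
        return self.head(out)                          # per-frame task prediction

# Usage with random tensors standing in for real features:
model = TargetSpeakerDownstream()
ssl_feats = torch.randn(2, 100, 768)  # e.g. weighted sum of SSL layer outputs
spk_emb = torch.randn(2, 256)         # e.g. embedding from an enrollment utterance
pred = model(ssl_feats, spk_emb)      # shape: (2, 100, 1)

In the paper's framing, such a conditioned encoder (speaker encoder plus extractor module) can be shared and jointly optimized across the four target-speaker tasks; the gating above merely stands in for whichever fusion mechanism the benchmark actually uses.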

View on arXiv
@article{peng2025_2505.06660,
  title={TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models},
  author={Junyi Peng and Takanori Ashihara and Marc Delcroix and Tsubasa Ochiai and Oldrich Plchot and Shoko Araki and Jan Černocký},
  journal={arXiv preprint arXiv:2505.06660},
  year={2025}
}