
arXiv:2409.13582

Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection

20 September 2024
Xuanru Zhou
Jiachen Lian
Cheol Jun Cho
Jingwen Liu
Zongli Ye
Jinming Zhang
Brittany Morin
D. Baquirin
Jet M J Vonk
Z. Ezzes
Zachary Miller
M. G. Tempini
Gopala Anumanchipalli
Abstract

Speech dysfluency modeling is the task of detecting dysfluencies in speech, such as repetition, block, insertion, replacement, and deletion. Most recent work treats this as a time-based object detection problem. In this work, we revisit the problem from a new perspective: tokenizing dysfluencies and modeling detection as a token-based automatic speech recognition (ASR) problem. We propose rule-based speech and text dysfluency simulators and develop VCTK-token, and then develop a Whisper-like seq2seq architecture to build a new benchmark with decent performance. We also systematically compare our proposed token-based methods with time-based methods, and propose a unified benchmark to facilitate future research. We open-source these resources for the broader scientific community. The project page is available at https://rorizzz.github.io/
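
To make the token-based framing concrete, here is a minimal sketch of what a rule-based text dysfluency simulator could look like. It is not the authors' implementation: the token strings (`[REP]`, `[BLOCK]`, and so on), the filler list, the `<pause>` marker, the placement of each token, and the function name `simulate` are all assumptions chosen for illustration. The function returns two word sequences: the text that would be rendered as dysfluent speech, and the target transcript that a token-based ASR model would be trained to emit.

```python
import random

# Hypothetical dysfluency tokens; the paper's actual token inventory may differ.
TOKENS = {
    "repetition": "[REP]",
    "block": "[BLOCK]",
    "insertion": "[INS]",
    "replacement": "[RPL]",
    "deletion": "[DEL]",
}

FILLERS = ["uh", "um"]  # assumed filler words used for insertions


def simulate(words, kind, index, rng=None):
    """Apply one dysfluency of `kind` at position `index`.

    Returns (spoken, target):
      spoken - words as they would be rendered in the dysfluent speech
      target - clean words plus one dysfluency token, i.e. the transcript
               a token-based recognizer would be trained to produce
    """
    if kind not in TOKENS:
        raise ValueError(f"unknown dysfluency type: {kind}")
    if not 0 <= index < len(words):
        raise IndexError("index out of range")
    rng = rng or random.Random(0)
    tok = TOKENS[kind]
    spoken, target = list(words), list(words)

    if kind == "repetition":
        spoken.insert(index, words[index])      # the word is said twice
        target.insert(index + 1, tok)           # token follows the affected word
    elif kind == "block":
        spoken.insert(index, "<pause>")         # assumed marker for a silent block
        target.insert(index, tok)               # token precedes the blocked word
    elif kind == "insertion":
        spoken.insert(index, rng.choice(FILLERS))
        target.insert(index, tok)               # token sits where the filler was
    elif kind == "replacement":
        spoken[index] = words[index][::-1]      # crude stand-in for a wrong word
        target.insert(index + 1, tok)
    else:  # deletion
        del spoken[index]                       # the word is never spoken
        target.insert(index + 1, tok)           # token follows the missing word
    return spoken, target


if __name__ == "__main__":
    sentence = ["please", "call", "stella"]
    for kind in TOKENS:
        print(kind, *simulate(sentence, kind, 1))
```

Under this framing, detection reduces to sequence transcription: the model reads audio and emits the target sequence, so the position of the dysfluency token within the word sequence carries the localization that a time-based detector would express as start and end timestamps.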
