arXiv:2409.09785
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

15 September 2024
Chao-Han Huck Yang
Taejin Park
Yuan Gong
Yuanchao Li
Zhehuai Chen
Yen-Ting Lin
Chen Chen
Yuchen Hu
Kunal Dhawan
Piotr Żelasko
Chao Zhang
Yun-Nung Chen
Yu Tsao
Jagadeesh Balam
Boris Ginsburg
Sabato Marco Siniscalchi
E. Chng
Peter Bell
Catherine Lai
Shinji Watanabe
A. Stolcke
Abstract

Given recent advances in generative AI technology, a key question is how large language models (LLMs) can enhance acoustic modeling tasks using text decoding results from a frozen, pretrained automatic speech recognition (ASR) model. To explore new capabilities in language modeling for speech processing, we introduce the generative speech transcription error correction (GenSEC) challenge. This challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition. These tasks aim to emulate future LLM-based agents handling voice-based interfaces while remaining accessible to a broad audience by utilizing open pretrained language models or agent-based APIs. We also discuss insights from baseline evaluations, as well as lessons learned for designing future evaluations.
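To make the post-ASR correction task concrete, the following is a minimal illustrative sketch (not the challenge's official baseline): it assembles an N-best hypothesis list from a frozen ASR model into a text prompt that an LLM could complete with a corrected transcription. The function name, prompt wording, and example hypotheses are all hypothetical.

```python
# Hypothetical sketch of prompt construction for LLM-based generative
# error correction (GEC) over ASR N-best hypotheses. The LLM call itself
# is omitted; this only shows how the text decoding results of a frozen
# ASR model might be presented to a language model.

def build_gec_prompt(nbest_hypotheses):
    """Format ASR N-best hypotheses into a correction prompt for an LLM."""
    lines = [
        "Below are N-best transcription hypotheses from an ASR system.",
        "Output the most likely corrected transcription.",
        "",
    ]
    # Number each hypothesis so the model can cross-reference variants.
    for i, hyp in enumerate(nbest_hypotheses, start=1):
        lines.append(f"hypothesis {i}: {hyp}")
    lines.append("corrected:")
    return "\n".join(lines)

# Example N-best list with typical homophone confusions (illustrative).
nbest = [
    "i red the book on the plane",
    "i read the book on the plain",
    "i read the book on the plane",
]
prompt = build_gec_prompt(nbest)
print(prompt)
```

The same pattern extends to the other two tracks by changing what the prompt asks for: speaker labels per segment for speaker tagging, or an emotion category for emotion recognition.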
