ResearchTrend.AI
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

13 June 2021
Guoguo Chen
Shuzhou Chai
Guan-Bo Wang
Jiayu Du
Weiqiang Zhang
Chao Weng
Dan Su
Daniel Povey
J. Trmal
Junbo Zhang
Mingjie Jin
Sanjeev Khudanpur
Shinji Watanabe
Shuaijiang Zhao
Wei Zou
Xiangang Li
Xuchen Yao
Yongqing Wang
Yujun Wang
Zhao You
Zhiyong Yan

Papers citing "GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio"

50 / 257 papers shown
mHuBERT-147: A Compact Multilingual HuBERT Model
Marcely Zanon Boito
Vivek Iyer
Nikolaos Lagos
Laurent Besacier
Ioan Calapodescu
VLM
62
8
0
10 Jun 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang
Ziyang Ma
Fan Yu
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
36
2
0
09 Jun 2024
Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper
Chih-Kai Yang
Kuan Po Huang
Hung-yi Lee
40
3
0
09 Jun 2024
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
Sreyan Ghosh
Sonal Kumar
Ashish Seth
Purva Chiniya
Utkarsh Tyagi
R. Duraiswami
Dinesh Manocha
41
0
0
06 Jun 2024
BLSP-Emo: Towards Empathetic Large Speech-Language Models
Chen Wang
Minpeng Liao
Zhongqiang Huang
Junhong Wu
Chengqing Zong
Jiajun Zhang
VLM
AuLLM
38
4
0
06 Jun 2024
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing
V. Trinh
Rosy Southwell
Yiwen Guan
Xinlu He
Zhiyong Wang
Jacob Whitehill
OffRL
36
2
0
04 Jun 2024
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
Chen Chen
Yuchen Hu
Wen Wu
Helin Wang
Chng Eng Siong
Chao Zhang
36
10
0
02 Jun 2024
BLSP-KD: Bootstrapping Language-Speech Pre-training via Knowledge Distillation
Chen Wang
Minpeng Liao
Zhongqiang Huang
Jiajun Zhang
ALM
AuLLM
40
4
0
29 May 2024
ViSpeR: Multilingual Audio-Visual Speech Recognition
Sanath Narayan
Y. A. D. Djilali
Ankit Singh
Eustache Le Bihan
Hakim Hacid
VLM
25
0
0
27 May 2024
MELD-ST: An Emotion-aware Speech Translation Dataset
Sirou Chen
Sakiko Yahata
Shuichiro Shimizu
Zhengdong Yang
Yihang Li
Chenhui Chu
Sadao Kurohashi
11
1
0
21 May 2024
Non-autoregressive real-time Accent Conversion model with voice cloning
Vladimir Nechaev
Sergey Kosyakov
32
1
0
21 May 2024
Benchmarking Representations for Speech, Music, and Acoustic Events
Moreno La Quatra
Alkis Koudounas
Lorenzo Vaiani
Elena Baralis
Luca Cagliero
Paolo Garza
Sabato Marco Siniscalchi
32
10
0
02 May 2024
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Haohe Liu
Xuenan Xu
Yiitan Yuan
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
32
18
0
30 Apr 2024
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Zhen Ye
Zeqian Ju
Haohe Liu
Xu Tan
Jianyi Chen
...
Weizhen Bian
Shulin He
Qi-fei Liu
Yi-Ting Guo
Wei Xue
38
16
0
23 Apr 2024
VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain
Khai Le-Duc
LM&MA
36
8
0
08 Apr 2024
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
Jaehyeon Kim
Keon Lee
Seungjun Chung
Jaewoong Cho
65
39
0
03 Apr 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Puyuan Peng
Po-Yao (Bernie) Huang
Daniel Li
Abdelrahman Mohamed
David F. Harwath
66
57
0
25 Mar 2024
M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset
Zhe Chen
Heyang Liu
Wenyi Yu
Guangzhi Sun
Hongcheng Liu
Ji Wu
Chao Zhang
Yu Wang
Yanfeng Wang
VGen
36
1
0
21 Mar 2024
Advanced Long-Content Speech Recognition With Factorized Neural Transducer
Xun Gong
Yu Wu
Jinyu Li
Shujie Liu
Rui Zhao
Xie Chen
Yanmin Qian
21
6
0
20 Mar 2024
Building speech corpus with diverse voice characteristics for its prompt-based representation
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Wataru Nakata
Detai Xin
Hiroshi Saruwatari
27
0
0
20 Mar 2024
SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
Jiayu Du
Jinpeng Li
Guoguo Chen
Wei-Qiang Zhang
ELM
35
3
0
13 Mar 2024
Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview
Heyang Liu
Yu Wang
Yanfeng Wang
25
0
0
01 Mar 2024
The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning
Nik Vaessen
David A. van Leeuwen
30
3
0
21 Feb 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Yifan Peng
Yui Sudo
Muhammad Shakeel
Shinji Watanabe
VLM
35
17
0
20 Feb 2024
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Jun Zhan
Junqi Dai
Jiasheng Ye
Yunhua Zhou
Dong Zhang
...
Jie Fu
Tao Gui
Tianxiang Sun
Yugang Jiang
Xipeng Qiu
MLLM
27
114
0
19 Feb 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
41
12
0
19 Feb 2024
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
Ziyang Ma
Guanrou Yang
Yifan Yang
Zhifu Gao
Jiaming Wang
...
Fan Yu
Qian Chen
Siqi Zheng
Shiliang Zhang
Xie Chen
AuLLM
47
38
0
13 Feb 2024
Thresholding Data Shapley for Data Cleansing Using Multi-Armed Bandits
Hiroyuki Namba
Shota Horiguchi
Masaki Hamamoto
Masashi Egi
TDI
11
0
0
13 Feb 2024
REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
Liang-Hsuan Tseng
En-Pei Hu
Cheng-Han Chiang
Yuan Tseng
Hung-yi Lee
Lin-shan Lee
Shao-Hua Sun
59
1
0
06 Feb 2024
Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?
Zakaria Aldeneh
Takuya Higuchi
Jee-weon Jung
Skyler Seto
Tatiana Likhomanenko
Stephen Shum
Ahmed Hussen Abdelaziz
Shinji Watanabe
B. Theobald
SSL
34
2
0
01 Feb 2024
Exploring the limits of decoder-only models trained on public speech recognition corpora
Ankit Gupta
G. Saon
Brian Kingsbury
OffRL
23
5
0
31 Jan 2024
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Yifan Peng
Jinchuan Tian
William Chen
Siddhant Arora
Brian Yan
...
Kwanghee Choi
Jiatong Shi
Xuankai Chang
Jee-weon Jung
Shinji Watanabe
VLM
OSLM
26
40
0
30 Jan 2024
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation
Dong Zhang
Xin Zhang
Jun Zhan
Shimin Li
Yaqian Zhou
Xipeng Qiu
AuLLM
BDL
40
16
0
24 Jan 2024
Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks
Yichao Du
Zhirui Zhang
Linan Yue
Xu Huang
Yuqing Zhang
Tong Bill Xu
Linli Xu
Enhong Chen
FedML
54
5
0
18 Jan 2024
Pheme: Efficient and Conversational Speech Generation
Paweł Budzianowski
Taras Sereda
Tomasz Cichy
Ivan Vulić
24
7
0
05 Jan 2024
Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision
Chih-Kai Yang
Kuan-Po Huang
Ke-Han Lu
Chun-Yi Kuan
Chi-Yuan Hsiao
Hung-yi Lee
48
7
0
30 Dec 2023
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
Vahid Noroozi
Somshubra Majumdar
Ankur Kumar
Jagadeesh Balam
Boris Ginsburg
23
10
0
27 Dec 2023
Fine-Tuned Self-Supervised Speech Representations for Language Diarization in Multilingual Code-Switched Speech
Geoffrey T. Frost
Emily Morris
Joshua Jansen van Vüren
T. Niesler
26
2
0
15 Dec 2023
Audio-visual fine-tuning of audio-only ASR models
Avner May
Dmitriy Serdyuk
Ankit Parag Shah
Otavio Braga
Olivier Siohan
23
3
0
14 Dec 2023
FastInject: Injecting Unpaired Text Data into CTC-based ASR training
Keqi Deng
Phil Woodland
11
2
0
14 Dec 2023
End-to-End Speech-to-Text Translation: A Survey
Nivedita Sethiya
Chandresh Kumar Maurya
19
7
0
02 Dec 2023
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Sanchit Gandhi
Patrick von Platen
Alexander M. Rush
VLM
14
51
0
01 Nov 2023
Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition
Isaac Slaughter
Craig Greenberg
Reva Schwartz
Aylin Caliskan
22
4
0
29 Oct 2023
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
LM&MA
AuLLM
35
199
0
20 Oct 2023
Vec-Tok Speech: speech vectorization and tokenization for neural speech generation
Xinfa Zhu
Yuanjun Lv
Yinjiao Lei
Tao Li
Wendi He
Hongbin Zhou
Heng Lu
Lei Xie
29
16
0
11 Oct 2023
Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition
S. Radhakrishnan
Chao-Han Huck Yang
S. Khan
Rohit Kumar
N. Kiani
D. Gómez-Cabrero
Jesper N. Tegnér
38
47
0
10 Oct 2023
SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation
Yuanjun Lv
Jixun Yao
Peikun Chen
Hongbin Zhou
Heng Lu
Lei Xie
25
4
0
08 Oct 2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG
VLM
AuLLM
LM&MA
34
79
0
07 Oct 2023
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech
Dareen Alharthi
Roshan S. Sharma
Hira Dhamyal
Soumi Maiti
Bhiksha Raj
Rita Singh
21
4
0
01 Oct 2023
AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR
Tobi Olatunji
Tejumade Afonja
Aditya Yadavalli
Chris C. Emezue
Sahib Singh
...
Joanne I. Osuchukwu
Salomey Osei
A. Tonja
Naome A. Etori
Clinton Mbataku
22
14
0
30 Sep 2023