2106.06909
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
13 June 2021
Guoguo Chen
Shuzhou Chai
Guan-Bo Wang
Jiayu Du
Weiqiang Zhang
Chao Weng
Dan Su
Daniel Povey
J. Trmal
Junbo Zhang
Mingjie Jin
Sanjeev Khudanpur
Shinji Watanabe
Shuaijiang Zhao
Wei Zou
Xiangang Li
Xuchen Yao
Yongqing Wang
Yujun Wang
Zhao You
Zhiyong Yan
Papers citing
"GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio"
50 / 257 papers shown
Title
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Xuankai Chang
Brian Yan
Kwanghee Choi
Jee-weon Jung
Yichen Lu
...
Pengcheng Guo
Yao-Fei Cheng
Pavel Denisov
Kohei Saijo
Hsiu-Hsuan Wang
26
36
0
27 Sep 2023
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models
Cheng Chen
Yuchen Hu
Chao-Han Huck Yang
Sabato Marco Siniscalchi
Pin-Yu Chen
E. Chng
21
42
0
27 Sep 2023
Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting
Chao-Han Huck Yang
Yile Gu
Yi-Chieh Liu
Shalini Ghosh
I. Bulyko
A. Stolcke
KELM
LRM
33
40
0
27 Sep 2023
Updated Corpora and Benchmarks for Long-Form Speech Recognition
Jennifer Drexler Fox
Desh Raj
Natalie Delworth
Quinn Mcnamara
Corey Miller
Miguel Jetté
AuLLM
12
7
0
26 Sep 2023
Exploring RWKV for Memory Efficient and Low Latency Streaming ASR
Keyu An
Shiliang Zhang
21
4
0
26 Sep 2023
Connecting Speech Encoder and Large Language Model for ASR
Wenyi Yu
Changli Tang
Guangzhi Sun
Xianzhao Chen
T. Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
AuLLM
6
64
0
25 Sep 2023
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Yifan Peng
Jinchuan Tian
Brian Yan
Dan Berrebbi
Xuankai Chang
...
Yui Sudo
Muhammad Shakeel
Jee-weon Jung
Soumi Maiti
Shinji Watanabe
VLM
31
35
0
25 Sep 2023
Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Wataru Nakata
Detai Xin
Hiroshi Saruwatari
16
9
0
24 Sep 2023
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
Ziyang Ma
Wen Wu
Zhisheng Zheng
Yiwei Guo
Qian Chen
Shiliang Zhang
Xie Chen
21
15
0
19 Sep 2023
Improved Factorized Neural Transducer Model For text-only Domain Adaptation
J. Liu
Jianwei Yu
Xie Chen
16
1
0
18 Sep 2023
Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions
Heming Wang
Meng Yu
H. M. Zhang
Chunlei Zhang
Zhongweiyang Xu
Muqiao Yang
Yixuan Zhang
Dong Yu
29
3
0
16 Sep 2023
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
21
3
0
15 Sep 2023
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
Wei Kang
Xiaoyu Yang
Zengwei Yao
Fangjun Kuang
Yifan Yang
Liyong Guo
Long Lin
Daniel Povey
14
43
0
15 Sep 2023
Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer
Peng Wang
Yifan Yang
Zheng Liang
Tian Tan
Shiliang Zhang
Xie Chen
12
0
0
14 Sep 2023
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
Zhihao Du
Shiliang Zhang
Kai Hu
Siqi Zheng
24
54
0
14 Sep 2023
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Yifan Yang
Feiyu Shen
Chenpeng Du
Ziyang Ma
K. Yu
Daniel Povey
Xie Chen
30
24
0
14 Sep 2023
SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus
Haoxu Wang
Fan Yu
Xian Shi
Yuezhang Wang
Shiliang Zhang
Ming Li
29
11
0
11 Sep 2023
BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing
Chen Wang
Minpeng Liao
Zhongqiang Huang
Jinliang Lu
Junhong Wu
Yuchen Liu
Chengqing Zong
Jiajun Zhang
AuLLM
28
35
0
02 Sep 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
Zhisheng Zheng
Ziyang Ma
Yu Wang
Xie Chen
26
2
0
28 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Erik Cambria
Björn W. Schuller
LM&MA
AuLLM
29
37
0
24 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yi Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
25
221
0
10 Aug 2023
Robust Automatic Speech Recognition via WavAugment Guided Phoneme Adversarial Training
Gege Qi
YueFeng Chen
Xiaofeng Mao
Xiaojun Jia
Ranjie Duan
Rong Zhang
Hui Xue
VLM
AAML
25
0
0
24 Jul 2023
Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning
F. Liao
Yung-Chieh Chan
Yi-Chang Chen
Chan-Jan Hsu
Da-shan Shiu
30
6
0
18 Jul 2023
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
42
5
0
11 Jul 2023
The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task
Kun Song
Yinjiao Lei
Pei-Ning Chen
Yiqing Cao
Kun Wei
Yongmao Zhang
Linfu Xie
Ning Jiang
Guoqing Zhao
19
1
0
10 Jul 2023
What Do Self-Supervised Speech Models Know About Words?
Ankita Pasad
C. Chien
Shane Settle
Karen Livescu
SSL
30
26
0
30 Jun 2023
Large-scale unsupervised audio pre-training for video-to-speech synthesis
Triantafyllos Kefalas
Yannis Panagakis
M. Pantic
VGen
29
3
0
27 Jun 2023
The Singing Voice Conversion Challenge 2023
Wen-Chin Huang
Lester Phillip Violeta
Songxiang Liu
Jiatong Shi
T. Toda
16
46
0
26 Jun 2023
Implementing contextual biasing in GPU decoder for online ASR
Iuliia Nigmatulina
S. Madikeri
Esaú Villatoro-Tello
P. Motlícek
Juan Pablo Zuluaga
Karthik Pandia
A. Ganapathiraju
AI4CE
26
2
0
23 Jun 2023
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems
Mingyu Cui
Jiawen Kang
Jiajun Deng
Xiaoyue Yin
Yutao Xie
Xie Chen
Xunying Liu
16
8
0
23 Jun 2023
Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies
Yuya Yamamoto
25
2
0
22 Jun 2023
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation
Zheng Liang
Zheshu Song
Ziyang Ma
Chenpeng Du
K. Yu
Xie Chen
30
5
0
14 Jun 2023
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute
William Chen
Xuankai Chang
Yifan Peng
Zhaoheng Ni
Soumi Maiti
Shinji Watanabe
SSL
21
25
0
11 Jun 2023
WSPAlign: Word Alignment Pre-training via Large-Scale Weakly Supervised Span Prediction
Qiyu Wu
Masaaki Nagata
Yoshimasa Tsuruoka
19
5
0
09 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Ziyue Jiang
Yi Ren
Zhe Ye
Jinglin Liu
Chen Zhang
...
Rongjie Huang
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
32
73
0
06 Jun 2023
Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis
Zhe Ye
Ziyue Jiang
Yi Ren
Jinglin Liu
Chen Zhang
Xiang Yin
Zejun Ma
Zhou Zhao
40
4
0
06 Jun 2023
PolyVoice: Language Models for Speech to Speech Translation
Qianqian Dong
Zhiying Huang
Qiao Tian
Chen Xu
Tom Ko
...
Lu Lu
Zejun Ma
Yuping Wang
Mingxuan Wang
Yuxuan Wang
20
23
0
05 Jun 2023
DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model
Haoyu Wang
Siyuan Wang
Weiqiang Zhang
Jinfeng Bai
32
2
0
02 Jun 2023
Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition
Tianyi Xu
Zhanheng Yang
Kaixun Huang
Pengcheng Guo
Aoting Zhang
Biao Li
Changru Chen
C. Li
Linfu Xie
14
10
0
01 Jun 2023
MiniSUPERB: Lightweight Benchmark for Self-supervised Speech Models
Yu-Hsiang Wang
Huan Chen
Kai-Wei Chang
Winston H. Hsu
Hung-yi Lee
16
6
0
30 May 2023
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning
Xuankai Chang
Brian Yan
Yuya Fujita
Takashi Maekaku
Shinji Watanabe
19
37
0
29 May 2023
Spoofing Attacker Also Benefits from Self-Supervised Pretrained Model
Aoi Ito
Shota Horiguchi
SSL
14
2
0
24 May 2023
On the Transferability of Whisper-based Representations for "In-the-Wild" Cross-Task Downstream Speech Applications
Vamsikrishna Chemudupati
Marzieh S. Tahaei
Heitor R. Guimarães
Arthur Pimentel
Anderson R. Avila
Mehdi Rezagholizadeh
Boxing Chen
Tiago H. Falk
SSL
55
7
0
23 May 2023
Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network
Kaixun Huang
Aoting Zhang
Zhanheng Yang
Pengcheng Guo
Bingshen Mu
Tianyi Xu
Linfu Xie
19
16
0
21 May 2023
DUB: Discrete Unit Back-translation for Speech Translation
Dong Zhang
Rong Ye
Tom Ko
Mingxuan Wang
Yaqian Zhou
11
23
0
19 May 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
19
17
0
18 May 2023
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
Dong Zhang
Shimin Li
Xin Zhang
Jun Zhan
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
AuLLM
MLLM
43
287
0
18 May 2023
Considerations for Ethical Speech Recognition Datasets
Avijoy Chakma
Zahid Hasan
13
2
0
03 May 2023
DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model
Yanzhe Fu
Yueteng Kang
Songjun Cao
Long Ma
6
7
0
16 Mar 2023
Visual Information Matters for ASR Error Correction
Bannihati Kumar Vanya
Shanbo Cheng
Ningxin Peng
Yuchen Zhang
16
3
0
16 Mar 2023