Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2106.06909
Cited By
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
13 June 2021
Guoguo Chen
Shuzhou Chai
Guan-Bo Wang
Jiayu Du
Weiqiang Zhang
Chao Weng
Dan Su
Daniel Povey
J. Trmal
Junbo Zhang
Mingjie Jin
Sanjeev Khudanpur
Shinji Watanabe
Shuaijiang Zhao
Wei Zou
Xiangang Li
Xuchen Yao
Yongqing Wang
Yujun Wang
Zhao You
Zhiyong Yan
Re-assign community
ArXiv
PDF
HTML
Papers citing
"GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio"
50 / 257 papers shown
Title
Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations
Linrong Pan
Chenglong Jiang
Gaoze Hou
Ying Gao
43
0
0
08 May 2025
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Zuwei Long
Yunhang Shen
Chaoyou Fu
Heting Gao
Lijiang Li
...
Jinlong Peng
Haoyu Cao
Ke Li
R. Ji
Xing Sun
32
0
0
06 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
34
0
0
01 May 2025
BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition
Paige Tuttosi
Mantaj Dhillon
Luna Sang
Shane Eastwood
Poorvi Bhatia
Quang Minh Dinh
Avni Kapoor
Yewon Jin
Angelica Lim
19
0
0
30 Apr 2025
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
S. Liu
...
Z. Yang
Aoxiong Yin
Ruibin Yuan
Y. Zhang
Zaida Zhou
AuLLM
VLM
108
5
0
25 Apr 2025
Advancing Arabic Speech Recognition Through Large-Scale Weakly Supervised Learning
Mahmoud Salhab
Marwan Elghitany
Shameed Sait
Syed Sibghat Ullah
Mohammad Abusheikh
Hasan Abusheikh
44
0
0
16 Apr 2025
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Yifan Yang
S. Liu
J. Li
Yuxuan Hu
Haibin Wu
...
Haiyang Sun
Yanqing Liu
Yan Lu
Kai Yu
Xie Chen
23
0
0
14 Apr 2025
Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Na Li
Chuke Wang
Yu Gu
Zhifeng Li
54
0
0
11 Apr 2025
QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
Siyin Wang
Wenyi Yu
Xianzhao Chen
Xiaohai Tian
J. Zhang
Lu Lu
Yu Tsao
Junichi Yamagishi
Y. Wang
Chao Zhang
AuLLM
76
0
0
26 Mar 2025
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
Yangyang Meng
Jinpeng Li
Guodong Lin
Yu Pu
G. Wang
Hu Du
Zhiming Shao
Yukai Huang
Ke Li
Wei-Qiang Zhang
ObjD
93
0
0
26 Mar 2025
From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM
Kshitij Ambilduke
Ben Peters
Sonal Sannigrahi
Anil Keshwani
Tsz Kin Lam
Bruno Martins
Marcely Zanon Boito
André F. T. Martins
47
0
0
13 Mar 2025
Adaptive Inner Speech-Text Alignment for LLM-based Speech Translation
Henglyu Liu
Andong Chen
Kehai Chen
X. Bai
M. Zhong
Yuan Qiu
Min Zhang
40
0
0
13 Mar 2025
Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning
Lucas Block Medin
Thomas Pellegrini
Lucile Gelin
SSL
61
1
0
06 Mar 2025
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training
Dingdong Wang
Jin Xu
Ruihang Chu
Zhifang Guo
X. Wang
Jincenzi Wu
Dongchao Yang
Shengpeng Ji
Junyang Lin
AuLLM
83
0
0
04 Mar 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
X. Wang
Mingqi Jiang
Z. Ma
Ziyu Zhang
S. Liu
...
Zhifei Li
Xie Chen
Lei Xie
Y. Guo
Wei Xue
73
10
0
03 Mar 2025
Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models
Rui Hu
Delai Qiu
Shuyu Wei
J. Zhang
Yining Wang
Shengping Liu
Jitao Sang
AuLLM
VLM
48
0
0
27 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
82
3
0
26 Feb 2025
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
Ziyue Jiang
Yi Ren
Ruiqi Li
Shengpeng Ji
Zhenhui Ye
...
Y. Zhang
Rui Liu
Xiang Yin
Zhou Zhao
Zhou Zhao
64
0
0
26 Feb 2025
NUTSHELL: A Dataset for Abstract Generation from Scientific Talks
Maike Züfle
Sara Papi
Beatrice Savoldi
Marco Gaido
L. Bentivogli
Jan Niehues
38
0
0
24 Feb 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yingahao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
91
0
0
21 Feb 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Y. Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
H. Li
AuLLM
SyDa
VLM
100
0
0
18 Feb 2025
Less is More for Synthetic Speech Detection in the Wild
Ashi Garg
Zexin Cai
Henry Li Xinyuan
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Matthew Wiesner
Nicholas Andrews
74
0
0
17 Feb 2025
CR-CTC: Consistency regularization on CTC for improved speech recognition
Zengwei Yao
Wei Kang
Xiaoyu Yang
Fangjun Kuang
Liyong Guo
Han Zhu
Zengrui Jin
Zhaoqing Li
Long Lin
Daniel Povey
51
0
0
17 Feb 2025
DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities
Xiangyu Lu
Wang Xu
Haoyu Wang
Hongyun Zhou
Haiyan Zhao
Conghui Zhu
T. Zhao
M. Yang
Mamba
AuLLM
63
0
0
16 Feb 2025
Evaluation of Deep Audio Representations for Hearables
Fabian Gröger
Pascal Baumann
L. Amruthalingam
Laurent Simon
Ruksana Giurda
Simone Lionetti
77
0
0
10 Feb 2025
Do we really have to filter out random noise in pre-training data for language models?
Jinghan Ru
Yuxin Xie
Xianwei Zhuang
Yuguo Yin
Yuexian Zou
83
2
0
10 Feb 2025
When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation
Anna Min
Chenxu Hu
Yi Ren
Hang Zhao
58
0
0
01 Feb 2025
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Y. Wang
Kai Chen
Pengyuan Zhang
Z. Wu
AuLLM
54
4
0
28 Jan 2025
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding
Jiaxing Zhao
Q. Yang
Yixing Peng
Detao Bai
Shimin Yao
...
Xiang Chen
Shenghao Fu
Weixuan chen
Xihan Wei
Liefeng Bo
VGen
AuLLM
50
5
0
28 Jan 2025
BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR
Guodong Ma
Wenxuan Wang
Lifeng Zhou
Yuting Yang
Yuke Li
Binbin Du
MoE
72
0
0
22 Jan 2025
Integrating Pause Information with Word Embeddings in Language Models for Alzheimer's Disease Detection from Spontaneous Speech
Yu Pu
Wei-Qiang Zhang
31
0
0
12 Jan 2025
COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations
Ruben Ciranni
Emilian Postolache
Giorgio Mariani
Michele Mancusi
Giorgio Fabbro
Emanuele Rodolà
Luca Cosmo
64
7
0
10 Jan 2025
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Helin Wang
Meng Yu
Jiarui Hai
Chen Chen
Yuchen Hu
Rilin Chen
Najim Dehak
Dong Yu
82
3
0
03 Jan 2025
AV-EmoDialog: Chat with Audio-Visual Users Leveraging Emotional Cues
Se Jin Park
Yeonju Kim
Hyeongseop Rha
Bella Godiva
Y. Ro
36
1
0
23 Dec 2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Pan Zhang
Xiaoyi Dong
Yuhang Cao
Yuhang Zang
Rui Qian
...
X. Zhang
K. Chen
Yu Qiao
D. Lin
Jiaqi Wang
KELM
84
12
0
12 Dec 2024
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation
Wenyi Yu
Siyin Wang
Xiaoyu Yang
Xianzhao Chen
Xiaohai Tian
J. Zhang
Guangzhi Sun
Lu Lu
Y. Wang
Chao Zhang
AuLLM
72
6
0
27 Nov 2024
Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
Yoshiki Masuyama
Koichi Miyazaki
Masato Murata
Mamba
37
0
0
11 Nov 2024
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification
Yichen He
Yuan Lin
Jianchao Wu
Hanchong Zhang
Yuchen Zhang
Ruicheng Le
VGen
VLM
115
2
0
11 Nov 2024
Enhancing AAC Software for Dysarthric Speakers in e-Health Settings: An Evaluation Using TORGO
Macarious Hui
Jinda Zhang
Aanchan Mohan
26
0
0
01 Nov 2024
ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams
Srija Anand
Praveen Srinivasa Varadhan
Mehak Singal
Mitesh M. Khapra
23
0
0
23 Oct 2024
Moonshine: Speech Recognition for Live Transcription and Voice Commands
Nat Jeffries
Evan King
M. Kudlur
Guy Nicholson
James Wang
Pete Warden
34
5
0
21 Oct 2024
AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup
Carlos Carvalho
A. Abad
16
0
0
18 Oct 2024
End-to-End Integration of Speech Emotion Recognition with Voice Activity Detection using Self-Supervised Learning Features
Natsuo Yamashita
Masaaki Yamamoto
Y. Kawaguchi
26
0
0
17 Oct 2024
Beyond Oversmoothing: Evaluating DDPM and MSE for Scalable Speech Synthesis in ASR
Christoph Minixhofer
Ondˇrej Klejch
Peter Bell
24
0
0
16 Oct 2024
Investigation of Speaker Representation for Target-Speaker Speech Processing
Takanori Ashihara
Takafumi Moriya
Shota Horiguchi
Junyi Peng
Tsubasa Ochiai
Marc Delcroix
Kohei Matsuura
Hiroshi Sato
26
1
0
15 Oct 2024
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs
Wenxi Chen
Ziyang Ma
Xiquan Li
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Kai Yu
Xie Chen
14
4
0
12 Oct 2024
Self-Powered LLM Modality Expansion for Large Speech-Text Models
Tengfei Yu
Xuebo Liu
Zhiyi Hou
Liang Ding
Dacheng Tao
Min Zhang
32
0
0
04 Oct 2024
Reverb: Open-Source ASR and Diarization from Rev
Nishchal Bhandari
Danny Chen
Miguel Ángel del Río Fernández
Natalie Delworth
Jennifer Drexler Fox
...
Ondrej Novotný
Jan Profant
Nan Qin
Martin Ratajczak
Jean-Philippe Robichaud
VLM
31
1
0
04 Oct 2024
Distilling an End-to-End Voice Assistant Without Instruction Training Data
William B. Held
Ella Li
Michael Joseph Ryan
Weiyan Shi
Yanzhe Zhang
Diyi Yang
AuLLM
36
8
0
03 Oct 2024
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
Marco Gaido
Sara Papi
L. Bentivogli
A. Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
39
0
0
01 Oct 2024
1
2
3
4
5
6
Next