2012.03411
MLS: A Large-Scale Multilingual Dataset for Speech Research
Interspeech, 2020
7 December 2020
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
AuLLM
Papers citing
"MLS: A Large-Scale Multilingual Dataset for Speech Research"
50 / 390 papers shown
Scaling Rich Style-Prompted Text-to-Speech Datasets
Anuj Diwan
Zhisheng Zheng
David Harwath
Eunsol Choi
CLIP
VLM
401
14
0
06 Mar 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
Xiang Wang
Mingqi Jiang
Tianhao Shen
Ziyu Zhang
Shixuan Liu
...
Zhifei Li
Xie Chen
Lei Xie
Xu Tan
Wei Xue
306
110
0
03 Mar 2025
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
Keisuke Kamahori
Jungo Kasai
Noriyuki Kojima
Baris Kasikci
230
4
0
27 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
630
12
0
26 Feb 2025
Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
Che Liu
Yingji Zhang
D. Zhang
Weijie Zhang
Chenggong Gong
...
Junwei Liao
Haipang Wu
Ji Liu
André Freitas
Qifan Wang
AuLLM
599
7
0
26 Feb 2025
Audio-FLAN: A Preliminary Release
Liumeng Xue
Ziya Zhou
J. Pan
Zhiyu Li
Shuai Fan
...
Haohe Liu
Emmanouil Benetos
Ge Zhang
Wenhan Luo
Wei Xue
MLLM
AuLLM
CLIP
VLM
298
2
0
23 Feb 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yinghao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
368
0
0
21 Feb 2025
Adopting Whisper for Confidence Estimation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Vaibhav Aggarwal
Shabari S Nair
Yash Verma
Yash Jogi
254
2
0
20 Feb 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yunke Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
Haoyang Li
AuLLM
SyDa
VLM
282
7
0
18 Feb 2025
DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities
Xiangyu Lu
Wang Xu
Haoyu Wang
Hongyun Zhou
Haiyan Zhao
Conghui Zhu
Tiejun Zhao
M. Yang
Mamba
AuLLM
303
4
0
16 Feb 2025
Gender Bias in Instruction-Guided Speech Synthesis Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Chun-Yi Kuan
Hung-yi Lee
472
3
0
08 Feb 2025
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li
390
19
0
07 Feb 2025
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Longji Xu
Kai Chen
Pengyuan Zhang
Zhikai Wu
AuLLM
372
16
0
27 Jan 2025
A Survey on Spoken Italian Datasets and Corpora
IEEE Access, 2025
Marco Giordano
Claudia Rinaldi
267
1
0
11 Jan 2025
ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Xinfa Zhu
Lei He
Yujia Xiao
Xi Wang
Xu Tan
Sheng Zhao
Lei Xie
DiffM
271
3
0
08 Jan 2025
Text2Data: Low-Resource Data Generation with Textual Control
AAAI Conference on Artificial Intelligence (AAAI), 2024
Shiyu Wang
Yihao Feng
Tian Lan
Ning Yu
Yu Bai
Ran Xu
Han Wang
Caiming Xiong
Siyang Song
DiffM
353
0
0
03 Jan 2025
Transducer-Llama: Integrating LLMs into Streamable Transducer-based Speech Recognition
Keqi Deng
Jinxi Guo
Yingyi Ma
Niko Moritz
P. Woodland
Ozlem Kalinli
M. Seltzer
AuLLM
230
2
0
21 Dec 2024
LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration
AAAI Conference on Artificial Intelligence (AAAI), 2024
Sangmin Lee
Woo-Jin Chung
Hong-Goo Kang
474
1
0
19 Dec 2024
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Julian Parker
Anton Smirnov
Jordi Pons
CJ Carr
Zack Zukowski
Zach Evans
Xubo Liu
316
55
0
29 Nov 2024
ParaLBench: A Large-Scale Benchmark for Computational Paralinguistics over Acoustic Foundation Models
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024
Zixing Zhang
Weixiang Xu
Zhongren Dong
Kanglin Wang
Yimeng Wu
Jing Peng
Runming Wang
Dong-Yan Huang
105
8
0
14 Nov 2024
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Guan-Ting Lin
Prashanth Gurunath Shivakumar
Aditya Gourav
Yile Gu
Ankur Gandhe
Hung-yi Lee
I. Bulyko
356
27
0
04 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Heng-Jui Chang
Hongyu Gong
Changhan Wang
James R. Glass
Yu-An Chung
317
5
0
31 Oct 2024
Augmenting Polish Automatic Speech Recognition System With Synthetic Data
Łukasz Bondaruk
Jakub Kubiak
Mateusz Czyżnikiewicz
133
0
0
30 Oct 2024
ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams
Srija Anand
Praveen Srinivasa Varadhan
Mehak Singal
Mitesh M. Khapra
182
2
0
23 Oct 2024
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Qinglin Zhang
Luyao Cheng
Chong Deng
Qian Chen
Wen Wang
...
Jiaqing Liu
Hai Yu
Chaohong Tan
Zhihao Du
Shiliang Zhang
SyDa
BDL
AuLLM
VLM
350
39
0
23 Oct 2024
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Yifan Peng
Krishna Puvvada
Zhehuai Chen
Piotr Żelasko
He Huang
Kunal Dhawan
Ke Hu
Shinji Watanabe
Jagadeesh Balam
Boris Ginsburg
422
8
0
23 Oct 2024
Moonshine: Speech Recognition for Live Transcription and Voice Commands
Nat Jeffries
Evan King
M. Kudlur
Guy Nicholson
James Wang
Pete Warden
150
13
0
21 Oct 2024
Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses
Suhita Ghosh
Tim Thiele
Frederic Lorbeer
Frank Dreyer
Sebastian Stober
190
0
0
20 Oct 2024
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
Alan Dao
Dinh Bach Vu
Huy Hoang Ha
AuLLM
VLM
353
6
0
20 Oct 2024
Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR
Abhishek Gupta
Amruta Parulekar
Sameep Chattopadhyay
Preethi Jyothi
VLM
144
0
0
17 Oct 2024
Sound Check: Auditing Audio Datasets
William Agnew
Julia Barnett
Annie Chu
Rachel Hong
Michael Feffer
Robin Netzorg
Harry H. Jiang
Ezra Awumey
Sauvik Das
365
2
0
17 Oct 2024
Beyond Oversmoothing: Evaluating DDPM and MSE for Scalable Speech Synthesis in ASR
Christoph Minixhofer
Ondřej Klejch
Peter Bell
244
0
0
16 Oct 2024
IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities
Xin Zhang
Xiang Lyu
Zhihao Du
Qian Chen
Dong Zhang
...
Yuxuan Wang
Bin Zhang
Heng Lu
Yaqian Zhou
Jiaqi Leng
AuLLM
327
15
0
09 Oct 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
International Conference on Learning Representations (ICLR), 2024
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
288
14
0
09 Oct 2024
A Two-Step Approach for Data-Efficient French Pronunciation Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Hoyeon Lee
Hyeeun Jang
Jong-Hwan Kim
Jae-Min Kim
67
0
0
08 Oct 2024
Block Vecchia Approximation for Scalable and Efficient Gaussian Process Computations
Qilong Pan
Sameh Abdulah
M. Genton
Ying Sun
192
0
0
06 Oct 2024
Distilling an End-to-End Voice Assistant Without Instruction Training Data
William B. Held
Ella Li
Michael Joseph Ryan
Weiyan Shi
Yanzhe Zhang
Diyi Yang
AuLLM
327
28
0
03 Oct 2024
HAINAN: Fast and Accurate Transducer for Hybrid-Autoregressive ASR
International Conference on Learning Representations (ICLR), 2024
Hainan Xu
Travis M. Bartley
Vladimir Bataev
Boris Ginsburg
975
0
0
03 Oct 2024
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Marco Gaido
Sara Papi
L. Bentivogli
Alessio Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
195
12
0
01 Oct 2024
Recent Advances in Speech Language Models: A Survey
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
565
64
0
01 Oct 2024
SSR: Alignment-Aware Modality Connector for Speech Language Models
International Workshop on Spoken Language Translation (IWSLT), 2024
Weiting Tan
Hirofumi Inaguma
Ning Dong
Paden Tomasello
Xutai Ma
426
11
0
30 Sep 2024
Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models
Wenrui Liu
Zhifang Guo
Jin Xu
Yuanjun Lv
Yunfei Chu
Zhou Zhao
Junyang Lin
236
5
0
28 Sep 2024
Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Giuseppe Ruggiero
Matteo Testa
Jurgen Van de Walle
Luigi Di Caro
184
1
0
25 Sep 2024
Speech Recognition Rescoring with Large Speech-Text Foundation Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Prashanth Gurunath Shivakumar
J. Kolehmainen
Aditya Gourav
Yi Gu
Ankur Gandhe
Ariya Rastrow
I. Bulyko
AuLLM
251
2
0
25 Sep 2024
Revisiting Acoustic Features for Robust ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Muhammad Ahmed Shah
Bhiksha Raj
AAML
179
0
0
24 Sep 2024
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Hieu-Thi Luong
Haoyang Li
Lin Zhang
Kong Aik Lee
Eng Siong Chng
273
14
0
23 Sep 2024
Semi-supervised Learning For Robust Speech Evaluation
Spoken Language Technology Workshop (SLT), 2024
Huayun Zhang
Jeremy H. M. Wong
Geyu Lin
Nancy F. Chen
180
0
0
23 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Computer Science Review (CSR), 2024
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Alexander Polonsky
Canh Vu
522
15
0
23 Sep 2024
Preference Alignment Improves Language Model-Based TTS
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Jinchuan Tian
Chunlei Zhang
Jiatong Shi
Hao Zhang
Jianwei Yu
Shinji Watanabe
Dong Yu
238
22
0
19 Sep 2024
Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Edresson Casanova
Ryan Langman
Paarth Neekhara
Shehzeen Samarah Hussain
Jason Chun Lok Li
Subhankar Ghosh
Ante Jukić
Sang-gil Lee
AuLLM
172
14
0
18 Sep 2024