2012.03411
Cited By
v1
v2 (latest)
MLS: A Large-Scale Multilingual Dataset for Speech Research
Interspeech (Interspeech), 2020
7 December 2020
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
AuLLM
Papers citing "MLS: A Large-Scale Multilingual Dataset for Speech Research"
50 / 390 papers shown
Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation
Chenyang Le
Yinfeng Xia
Huiyan Li
Manhong Wang
Yutao Sun
Xingyang Ma
Yanmin Qian
88
0
0
15 Aug 2025
M³PDB: A Multimodal, Multi-Label, Multilingual Prompt Database for Speech Generation
B. Zhu
Cheng Gong
Muyang Wu
Ruihao Jing
Fan Liu
Xiaolei Zhang
Chi Zhang
Xuelong Li
118
0
0
13 Aug 2025
Flow-SLM: Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling
Ju-Chieh Chou
Jiawei Zhou
Karen Livescu
238
4
0
12 Aug 2025
Optimal Transport Regularization for Speech Text Alignment in Spoken Language Models
Wenze Xu
Chun Wang
Jiazhen Yu
Sheng Chen
Liang Gao
Weihong Deng
OT
200
1
0
11 Aug 2025
REINA: Regularized Entropy Information-Based Loss for Efficient Simultaneous Speech Translation
Nameer Hirschkind
Joseph Liu
Xiao Yu
167
0
0
07 Aug 2025
MiDashengLM: Efficient Audio Understanding with General Audio Captions
Heinrich Dinkel
Gang Li
Jizhong Liu
Jian Luan
Yadong Niu
Xingwei Sun
Tianzi Wang
Qiyang Xiao
Junbo Zhang
Jiahao Zhou
AuLLM
AI4TS
VLM
437
18
0
06 Aug 2025
Multilingual Source Tracing of Speech Deepfakes: A First Benchmark
Xi Xuan
Yang Xiao
Rohan Kumar Das
Tomi Kinnunen
168
3
0
06 Aug 2025
Whilter: A Whisper-based Data Filter for "In-the-Wild" Speech Corpora Using Utterance-level Multi-Task Classification
William Ravenscroft
George Close
Kit Bower-Morris
Jamie Stacey
Dmitry Sityaev
Kris Y. Hong
215
1
0
29 Jul 2025
Binaural Target Speaker Extraction using HRTFs
Yoav Ellinson
Sharon Gannot
168
0
0
25 Jul 2025
The TEA-ASLP System for Multilingual Conversational Speech Recognition and Speech Diarization in MLC-SLM 2025 Challenge
Hongfei Xue
Kaixun Huang
Zhikai Zhou
Shen Huang
Shidong Shang
126
2
0
24 Jul 2025
Triple X: A LLM-Based Multilingual Speech Recognition System for the INTERSPEECH2025 MLC-SLM Challenge
Miaomiao Gao
Xiaoxiao Xiang
Yiwen Guo
AILaw
166
1
0
23 Jul 2025
FastLongSpeech: Enhancing Large Speech-Language Models for Efficient Long-Speech Processing
Shoutao Guo
Shaolei Zhang
Qingkai Fang
Zhengrui Ma
Min Zhang
Yang Feng
AuLLM
248
2
0
20 Jul 2025
Controllable joint noise reduction and hearing loss compensation using a differentiable auditory model
Philippe Gonzalez
Torsten Dau
Tobias May
189
1
0
12 Jul 2025
ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark
He Wang
Linhan Ma
Dake Guo
Xiong Wang
Lei Xie
Jin Xu
Junyang Lin
AuLLM
268
5
0
08 Jul 2025
USAD: Universal Speech and Audio Representation via Distillation
Heng-Jui Chang
Saurabhchand Bhati
James R. Glass
Alexander H. Liu
325
3
0
23 Jun 2025
Instituto de Telecomunicações at IWSLT 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning
International Workshop on Spoken Language Translation (IWSLT), 2025
Giuseppe Attanasio
Sonal Sannigrahi
Ben Peters
Marcely Zanon Boito
AuLLM
190
0
0
20 Jun 2025
Weight Factorization and Centralization for Continual Learning in Speech Recognition
Enes Yavuz Ugan
Ngoc-Quan Pham
Alexander Waibel
CLL
MoMe
158
2
0
19 Jun 2025
Factorized RVQ-GAN For Disentangled Speech Tokenization
Sameer Khurana
Dominik Klement
Antoine Laurent
Dominik Bobos
Juraj Novosad
...
Ryo Aihara
Chiori Hori
François Germain
Gordon Wichern
Jonathan Le Roux
164
1
0
18 Jun 2025
NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025
Yizhou Peng
Bin Wang
Yi-Wen Chao
Ziyang Ma
Haoyang Zhang
Hexin Liu
Xie Chen
Eng Siong Chng
ELM
241
1
0
16 Jun 2025
Can we reconstruct a dysarthric voice with the large speech model Parler TTS?
Ariadna Sanchez
Simon King
116
1
0
04 Jun 2025
The mutual exclusivity bias of bilingual visually grounded speech models
Dan Oneaţă
Leanne Nortje
Yevgen Matusevych
Herman Kamper
152
0
0
04 Jun 2025
Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data
Emmy Postma
Cristian Tejedor-Garcia
150
0
0
02 Jun 2025
GigaAM: Efficient Self-Supervised Learner for Speech Recognition
Aleksandr Kutsakov
Alexandr Maximenko
Georgii Gospodinov
Pavel Bogomolov
Fyodor Minkin
231
0
0
01 Jun 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jialong Zuo
Shengpeng Ji
Minghui Fang
Mingze Li
Ziyue Jiang
Xize Cheng
Xiaoda Yang
Chen Feiyang
Xinyu Duan
Zhou Zhao
227
0
0
01 Jun 2025
What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training
Marianne de Heer Kloots
Hosein Mohebbi
Charlotte Pouw
Gaofei Shen
Willem H. Zuidema
Martijn Bentum
SSL
274
1
0
01 Jun 2025
ZIPA: A family of efficient models for multilingual phone recognition
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jian Zhu
Farhan Samir
Eleanor Chodroff
David R. Mortensen
215
5
0
29 May 2025
Spoken question answering for visual queries
Nimrod Shabtay
Zvi Kons
Avihu Dekel
Hagai Aronowitz
R. Hoory
Assaf Arbelle
249
1
0
29 May 2025
The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence
International Workshop on Spoken Language Translation (IWSLT), 2025
Marco Gaido
Sara Papi
L. Bentivogli
Alessio Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
200
0
0
29 May 2025
FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian
Sara Papi
Marco Gaido
L. Bentivogli
Alessio Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
308
1
0
28 May 2025
A Linguistically Motivated Analysis of Intonational Phrasing in Text-to-Speech Systems: Revealing Gaps in Syntactic Sensitivity
Charlotte Pouw
Afra Alishahi
Willem H. Zuidema
209
0
0
28 May 2025
Loquacious Set: 25,000 Hours of Transcribed and Diverse English Speech Recognition Data for Research and Commercial Use
Titouan Parcollet
Yuan Tseng
Shucong Zhang
Rogier van Dalen
150
4
0
27 May 2025
Speechless: Speech Instruction Training Without Speech for Low Resource Languages
Alan Dao
Dinh Bach Vu
Huy Hoang Ha
Tuan Le Duc Anh
Shreyas Gopal
Yue Heng Yeo
Warren Keng Hoong Low
Eng Siong Chng
J. Yip
SyDa
293
3
0
23 May 2025
TEDI: Trustworthy and Ethical Dataset Indicators to Analyze and Compare Dataset Documentation
Wiebke Hutiri
Mircea Cimpoi
M. Scheuerman
Victoria Matthews
Alice Xiang
381
0
0
23 May 2025
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition
Tianduo Wang
Lu Xu
Wei Lu
Shanbo Cheng
291
1
0
22 May 2025
Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty
Hongfei Xue
Yufeng Tang
Jun Zhang
Xuelong Geng
Lei Xie
275
0
0
22 May 2025
LegoSLM: Connecting LLM with Speech Encoder using CTC Posteriors
Rao Ma
Tongzhou Chen
Kartik Audhkhasi
Bhuvana Ramabhadran
AuLLM
332
2
0
16 May 2025
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
Shigeki Karita
Yuma Koizumi
Heiga Zen
Haruko Ishikawa
Robin Scheibler
M. Bacchiani
VLM
1.0K
3
0
07 May 2025
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Zuwei Long
Chunjiang Ge
Chaoyou Fu
Heting Gao
Lijiang Li
...
Jinlong Peng
Haoyu Cao
Ke Li
Rongrong Ji
Xing Sun
250
17
0
06 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
351
3
0
01 May 2025
Enhancing Non-Core Language Instruction-Following in Speech LLMs via Semi-Implicit Cross-Lingual CoT Reasoning
Hongfei Xue
Yufeng Tang
Hexin Liu
Jun Zhang
Xuelong Geng
Lei Xie
LRM
236
1
0
29 Apr 2025
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
Shixuan Liu
...
Zhiyong Yang
Aoxiong Yin
Ruibin Yuan
Yanzhe Zhang
Zaida Zhou
AuLLM
VLM
429
124
0
25 Apr 2025
Speaker Diarization for Low-Resource Languages Through Wav2vec Fine-Tuning
Abdulhady Abas Abdullah
S. H. Karim
Sara Azad Ahmed
Kanar R. Tariq
Tarik Ahmed Rashid
961
3
0
23 Apr 2025
SoCov: Semi-Orthogonal Parametric Pooling of Covariance Matrix for Speaker Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Rongjin Li
Weibin Zhang
Dongpeng Chen
Jintao Kang
Xiaofen Xing
220
0
0
23 Apr 2025
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
Dongchao Yang
Songxiang Liu
Haohan Guo
Jiankun Zhao
Yuanyuan Wang
...
Xubo Liu
Xueyuan Chen
Xu Tan
Xixin Wu
Helen Meng
398
12
0
14 Apr 2025
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Prabhat Pandey
Rupak Vignesh Swaminathan
K V Vijay Girish
Arunasish Sen
Jian Xie
Grant P. Strimel
Andreas Schwarz
952
8
0
12 Apr 2025
UniSep: Universal Target Audio Separation with Language Models at Scale
Yun Wang
Hangting Chen
Dongchao Yang
Weiqin Li
Dan Luo
Guangzhi Li
Shan Yang
Zhiyong Wu
Helen Meng
Xixin Wu
VLM
216
4
0
31 Mar 2025
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages
Xabier de Zuazo
Eva Navas
Ibon Saratxaga
Inma Hernáez Rioja
315
4
0
30 Mar 2025
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
Yangyang Meng
Jinpeng Li
Guodong Lin
Yu Pu
G. Wang
Hu Du
Zhiming Shao
Yukai Huang
Ke Li
Wei-Qiang Zhang
ObjD
312
8
0
26 Mar 2025
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
IEEE Journal of Selected Topics in Signal Processing (JSTSP), 2024
Xue Jiang
Xiulian Peng
Yuan Zhang
Yan Lu
SSL
378
5
0
15 Mar 2025
From TOWER to SPIRE: Adding the Speech Modality to a Translation-Specialist LLM
Kshitij Ambilduke
Ben Peters
Sonal Sannigrahi
Anil Keshwani
Tsz Kin Lam
Bruno Martins
Marcely Zanon Boito
424
3
0
13 Mar 2025