arXiv:2012.03411 (v2)
MLS: A Large-Scale Multilingual Dataset for Speech Research
7 December 2020
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
AuLLM
Papers citing "MLS: A Large-Scale Multilingual Dataset for Speech Research"
50 / 321 papers shown
Instituto de Telecomunicações at IWSLT 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning
Giuseppe Attanasio
Sonal Sannigrahi
Ben Peters
André F. T. Martins
AuLLM
18
0
0
20 Jun 2025
Weight Factorization and Centralization for Continual Learning in Speech Recognition
Enes Yavuz Ugan
Ngoc-Quan Pham
Alexander Waibel
CLL
MoMe
6
0
0
19 Jun 2025
Factorized RVQ-GAN For Disentangled Speech Tokenization
Sameer Khurana
Dominik Klement
Antoine Laurent
Dominik Bobos
Juraj Novosad
...
Ryo Aihara
Chiori Hori
François Germain
Gordon Wichern
Jonathan Le Roux
12
0
0
18 Jun 2025
NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025
Yizhou Peng
Bin Wang
Yi-Wen Chao
Ziyang Ma
Haoyang Zhang
Hexin Liu
Xie Chen
Eng Siong Chng
ELM
10
0
0
16 Jun 2025
Can we reconstruct a dysarthric voice with the large speech model Parler TTS?
Ariadna Sanchez
Simon King
11
0
0
04 Jun 2025
The mutual exclusivity bias of bilingual visually grounded speech models
Dan Oneaţă
Leanne Nortje
Yevgen Matusevych
Herman Kamper
39
0
0
04 Jun 2025
Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data
Emmy Postma
Cristian Tejedor-Garcia
17
0
0
02 Jun 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
Jialong Zuo
Shengpeng Ji
Minghui Fang
Mingze Li
Ziyue Jiang
Xize Cheng
Xiaoda Yang
Chen Feiyang
Xinyu Duan
Zhou Zhao
29
0
0
01 Jun 2025
GigaAM: Efficient Self-Supervised Learner for Speech Recognition
Aleksandr Kutsakov
Alexandr Maximenko
Georgii Gospodinov
Pavel Bogomolov
Fyodor Minkin
23
0
0
01 Jun 2025
What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training
Marianne de Heer Kloots
Hosein Mohebbi
Charlotte Pouw
Gaofei Shen
Willem H. Zuidema
Martijn Bentum
SSL
47
0
0
01 Jun 2025
Spoken question answering for visual queries
Nimrod Shabtay
Zvi Kons
Avihu Dekel
Hagai Aronowitz
R. Hoory
Assaf Arbelle
52
0
0
29 May 2025
The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence
Marco Gaido
Sara Papi
L. Bentivogli
Alessio Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
24
0
0
29 May 2025
ZIPA: A family of efficient models for multilingual phone recognition
Jian Zhu
Farhan Samir
Eleanor Chodroff
David R. Mortensen
33
0
0
29 May 2025
FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian
Sara Papi
Marco Gaido
L. Bentivogli
Alessio Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
15
0
0
28 May 2025
A Linguistically Motivated Analysis of Intonational Phrasing in Text-to-Speech Systems: Revealing Gaps in Syntactic Sensitivity
Charlotte Pouw
Afra Alishahi
Willem H. Zuidema
23
0
0
28 May 2025
Loquacious Set: 25,000 Hours of Transcribed and Diverse English Speech Recognition Data for Research and Commercial Use
Titouan Parcollet
Yuan Tseng
Shucong Zhang
Rogier van Dalen
15
1
0
27 May 2025
Speechless: Speech Instruction Training Without Speech for Low Resource Languages
Alan Dao
Dinh Bach Vu
Huy Hoang Ha
Tuan Le Duc Anh
Shreyas Gopal
Yue Heng Yeo
Warren Keng Hoong Low
Eng Siong Chng
J. Yip
SyDa
77
1
0
23 May 2025
TEDI: Trustworthy and Ethical Dataset Indicators to Analyze and Compare Dataset Documentation
Wiebke Hutiri
Mircea Cimpoi
M. Scheuerman
Victoria Matthews
Alice Xiang
157
0
0
23 May 2025
Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty
Hongfei Xue
Yufeng Tang
Jun Zhang
Xuelong Geng
Lei Xie
47
0
0
22 May 2025
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition
Tianduo Wang
Lu Xu
Wei Lu
Shanbo Cheng
38
0
0
22 May 2025
LegoSLM: Connecting LLM with Speech Encoder using CTC Posteriors
Rao Ma
Tongzhou Chen
Kartik Audhkhasi
Bhuvana Ramabhadran
AuLLM
88
0
0
16 May 2025
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
Shigeki Karita
Yuma Koizumi
Heiga Zen
Haruko Ishikawa
Robin Scheibler
M. Bacchiani
VLM
421
1
0
07 May 2025
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Zuwei Long
Yunhang Shen
Chaoyou Fu
Heting Gao
Lijiang Li
...
Jinlong Peng
Haoyu Cao
Ke Li
Rongrong Ji
Xing Sun
75
2
0
06 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
112
0
0
01 May 2025
Enhancing Non-Core Language Instruction-Following in Speech LLMs via Semi-Implicit Cross-Lingual CoT Reasoning
Hongfei Xue
Yufeng Tang
Hexin Liu
Jun Zhang
Xuelong Geng
Lei Xie
LRM
100
1
0
29 Apr 2025
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
Shixuan Liu
...
Zhiyong Yang
Aoxiong Yin
Ruibin Yuan
Yanzhe Zhang
Zaida Zhou
AuLLM
VLM
181
13
0
25 Apr 2025
Speaker Diarization for Low-Resource Languages Through Wav2vec Fine-Tuning
Abdulhady Abas Abdullah
S. H. Karim
Sara Azad Ahmed
Kanar R. Tariq
Tarik Ahmed Rashid
431
0
0
23 Apr 2025
SoCov: Semi-Orthogonal Parametric Pooling of Covariance Matrix for Speaker Recognition
Rongjin Li
Weibin Zhang
Dongpeng Chen
Jintao Kang
Xiaofen Xing
115
0
0
23 Apr 2025
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
Dongchao Yang
Songxiang Liu
Haohan Guo
Jiankun Zhao
Yuanyuan Wang
...
Xubo Liu
Xueyuan Chen
Xu Tan
Xixin Wu
Helen Meng
225
2
0
14 Apr 2025
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
Prabhat Pandey
Rupak Vignesh Swaminathan
K V Vijay Girish
Arunasish Sen
Jian Xie
Grant P. Strimel
Andreas Schwarz
449
2
0
12 Apr 2025
UniSep: Universal Target Audio Separation with Language Models at Scale
Yun Wang
Hangting Chen
Dongchao Yang
Weiqin Li
Dan Luo
Guangzhi Li
Shan Yang
Zhiyong Wu
Helen Meng
Xixin Wu
VLM
79
1
0
31 Mar 2025
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages
Xabier de Zuazo
Eva Navas
Ibon Saratxaga
Inma Hernáez Rioja
73
3
0
30 Mar 2025
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
Yangyang Meng
Jinpeng Li
Guodong Lin
Yu Pu
G. Wang
Hu Du
Zhiming Shao
Yukai Huang
Ke Li
Wei-Qiang Zhang
ObjD
137
0
0
26 Mar 2025
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
Xue Jiang
Xiulian Peng
Yuan Zhang
Yan Lu
SSL
144
1
0
15 Mar 2025
From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM
Kshitij Ambilduke
Ben Peters
Sonal Sannigrahi
Anil Keshwani
Tsz Kin Lam
Bruno Martins
Marcely Zanon Boito
André F. T. Martins
107
2
0
13 Mar 2025
Scaling Rich Style-Prompted Text-to-Speech Datasets
Anuj Diwan
Zhisheng Zheng
David Harwath
Eunsol Choi
CLIP
VLM
122
4
0
06 Mar 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
Xiang Wang
Mingqi Jiang
Zejun Ma
Ziyu Zhang
Shixuan Liu
...
Zhifei Li
Xie Chen
Lei Xie
Yu Guo
Wei Xue
127
22
0
03 Mar 2025
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
Keisuke Kamahori
Jungo Kasai
Noriyuki Kojima
Baris Kasikci
73
1
0
27 Feb 2025
Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
Che Liu
Yingji Zhang
D. Zhang
Weijie Zhang
Chenggong Gong
...
André Freitas
Qifan Wang
Z. Xu
Rongjuncheng Zhang
Yong Dai
AuLLM
238
2
0
26 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
198
4
0
26 Feb 2025
Audio-FLAN: A Preliminary Release
Liumeng Xue
Ziya Zhou
J. Pan
Zhiyu Li
Shuai Fan
...
Haohe Liu
Emmanouil Benetos
Ge Zhang
Yike Guo
Wei Xue
MLLM
AuLLM
CLIP
VLM
93
1
0
23 Feb 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yinghao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
133
0
0
21 Feb 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Yunke Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
Haoyang Li
AuLLM
SyDa
VLM
170
1
0
18 Feb 2025
DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities
Xiangyu Lu
Wang Xu
Haoyu Wang
Hongyun Zhou
Haiyan Zhao
Conghui Zhu
Tiejun Zhao
M. Yang
Mamba
AuLLM
129
0
0
16 Feb 2025
Gender Bias in Instruction-Guided Speech Synthesis Models
Chun-Yi Kuan
Hung-yi Lee
129
0
0
08 Feb 2025
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li
138
5
0
07 Feb 2025
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Yansen Wang
Kai Chen
Pengyuan Zhang
Zhikai Wu
AuLLM
136
5
0
28 Jan 2025
A Survey on Spoken Italian Datasets and Corpora
Marco Giordano
Claudia Rinaldi
95
0
0
11 Jan 2025
ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
Xinfa Zhu
Lei He
Yujia Xiao
Xi Wang
Xu Tan
Sheng Zhao
Lei Xie
DiffM
96
2
0
08 Jan 2025
Text2Data: Low-Resource Data Generation with Textual Control
Shiyu Wang
Yihao Feng
Tian Lan
Ning Yu
Yu Bai
Ran Xu
Han Wang
Caiming Xiong
Siyang Song
DiffM
145
0
0
03 Jan 2025