Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2012.03411
Cited By
v1
v2 (latest)
MLS: A Large-Scale Multilingual Dataset for Speech Research
Interspeech (Interspeech), 2020
7 December 2020
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
AuLLM
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (1 upvotes)
Papers citing
"MLS: A Large-Scale Multilingual Dataset for Speech Research"
50 / 390 papers shown
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
IEEE Open Journal of Signal Processing (JOSP), 2024
Jee-weon Jung
Yihan Wu
Xin Wang
Ji-Hoon Kim
Soumi Maiti
...
Joon Son Chung
Wangyou Zhang
Seyun Um
Shinnosuke Takamichi
Shinji Watanabe
278
31
0
18 Sep 2024
Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text
Hongfei Xue
Wei Ren
Xuelong Geng
Kun Wei
Longhao Li
Qijie Shao
Linju Yang
Kai Diao
Lei Xie
AuLLM
196
12
0
17 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
283
10
0
16 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
C. Han
Seokgi Lee
Gyuhyeon Nam
Gyeongsu Chae
DiffM
1.1K
0
0
14 Sep 2024
Exploring SSL Discrete Tokens for Multilingual ASR
Mingyu Cui
Daxin Tan
Yifan Yang
Dingdong Wang
Huimeng Wang
Xiao Chen
Xie Chen
Xunying Liu
267
5
0
13 Sep 2024
Text-To-Speech Synthesis In The Wild
Jee-weon Jung
Wangyou Zhang
Soumi Maiti
Yihan Wu
Xin Eric Wang
...
Hye-jin Shim
Nicholas W. D. Evans
Joon Son Chung
Shinnosuke Takamichi
Shinji Watanabe
407
3
0
13 Sep 2024
The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language
Michael Ong
Sean Robertson
Leo Peckham
Alba Jorquera Jimenez de Aberasturi
Paula Arkhangorodsky
Robin Huo
Aman Sakhardande
Mark Hallap
Naomi Nagy
Ewan Dunbar
CVBM
616
0
0
12 Sep 2024
A Large Dataset of Spontaneous Speech with the Accent Spoken in São Paulo for Automatic Speech Recognition Evaluation
Brazilian Conference on Intelligent Systems (BRACIS), 2024
Rodrigo Lima
S. Leal
Arnaldo Candido Junior
S. Aluísio
174
2
0
10 Sep 2024
Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
Spoken Language Technology Workshop (SLT), 2024
Nithin Rao Koluguri
Travis M. Bartley
Hainan Xu
Oleksii Hrinchuk
Jagadeesh Balam
Boris Ginsburg
Georg Kucsko
341
6
0
09 Sep 2024
STAB: Speech Tokenizer Assessment Benchmark
Shikhar Vashishth
Harman Singh
Shikhar Bharadwaj
Sriram Ganapathy
Chulayuth Asawaroengchai
Kartik Audhkhasi
Andrew Rosenberg
Ankur Bapna
Bhuvana Ramabhadran
213
4
0
04 Sep 2024
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
Yiwei Guo
Zhihan Li
Junjie Li
Chenpeng Du
Hankun Wang
Shuai Wang
Xie Chen
Kai Yu
365
6
0
03 Sep 2024
A multilingual training strategy for low resource Text to Speech
Asma Amalas
Mounir Ghogho
Mohamed Chetouani
Rachid Oulad Haj Thami
224
2
0
02 Sep 2024
Progressive Residual Extraction based Pre-training for Speech Representation Learning
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024
Tianrui Wang
Jin Li
Ziyang Ma
Rui Cao
Xie Chen
...
Meng Ge
Xiaobao Wang
Yuguang Wang
Jianwu Dang
Nyima Tashi
SSL
301
3
0
31 Aug 2024
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Zhifei Xie
Changqiao Wu
AuLLM
VGen
VLM
SyDa
LRM
363
167
0
29 Aug 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
He Huang
Taejin Park
Kunal Dhawan
Ivan Medennikov
Krishna Puvvada
Nithin Rao Koluguri
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
SSL
AI4TS
334
4
0
23 Aug 2024
BUT Systems and Analyses for the ASVspoof 5 Challenge
Johan Rohdin
Lin Zhang
Oldřich Plchot
Vojtěch Staněk
David Mihola
...
Themos Stafylakis
Dmitriy Beveraki
Anna Silnova
Jan Brukner
Lukáš Burget
181
9
0
20 Aug 2024
SZU-AFS Antispoofing System for the ASVspoof 5 Challenge
Yuxiong Xu
Jiafeng Zhong
Sengui Zheng
Zefeng Liu
Bin Li
190
5
0
19 Aug 2024
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
Xin Wang
Héctor Delgado
Hemlata Tak
Jee-weon Jung
Hye-jin Shim
...
Md. Sahidullah
Tomi Kinnunen
Nicholas W. D. Evans
K. Lee
Junichi Yamagishi
AAML
214
136
0
16 Aug 2024
FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks
Interspeech (Interspeech), 2024
Min Ma
Yuma Koizumi
Shigeki Karita
Heiga Zen
Jason Riesa
Haruko Ishikawa
M. Bacchiani
VLM
214
10
0
12 Aug 2024
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond
Interspeech (Interspeech), 2024
Beomseok Lee
Ioan Calapodescu
Marco Gaido
Matteo Negri
Laurent Besacier
AuLLM
260
16
0
07 Aug 2024
Towards scalable efficient on-device ASR with transfer learning
Laxmi Pandey
Ke Li
Jinxi Guo
Debjyoti Paul
Arthur Guo
Jay Mahadeokar
Xuedong Zhang
194
2
0
23 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
347
24
0
21 Jul 2024
Framework for Curating Speech Datasets and Evaluating ASR Systems: A Case Study for Polish
Mohamed Allam
228
0
0
18 Jul 2024
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Yuancheng Wang
Kai Chen
Pengyuan Zhang
Zhizheng Wu
236
182
0
07 Jul 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Ye Bai
Jingping Chen
Jitong Chen
Wei Chen
Zhuo Chen
...
Wanyi Zhang
Yang Zhang
Yawei Zhang
Yijie Zheng
Ming Zou
AuLLM
385
73
0
05 Jul 2024
Probing the Feasibility of Multilingual Speaker Anonymization
Sarina Meyer
Florian Lux
Ngoc Thang Vu
249
9
0
03 Jul 2024
Towards Robust Speech Representation Learning for Thousands of Languages
William Chen
Wangyou Zhang
Yifan Peng
Xinjian Li
Jinchuan Tian
Jiatong Shi
Xuankai Chang
Soumi Maiti
Karen Livescu
Shinji Watanabe
ELM
339
48
0
30 Jun 2024
MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research
Song Li
Yongbin You
Xuezhi Wang
Zhengkun Tian
Ke Ding
Guanglu Wan
203
11
0
26 Jun 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
227
27
0
25 Jun 2024
One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection
Hyun Myung Kim
Kangwook Jang
Hoirin Kim
166
16
0
24 Jun 2024
Speech Analysis of Language Varieties in Italy
Moreno La Quatra
Alkis Koudounas
Elena Baralis
Sabato Marco Siniscalchi
236
5
0
22 Jun 2024
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics
IEEE Journal on Selected Topics in Signal Processing (JSTSP), 2024
Cheol Jun Cho
Peter Wu
Tejas S. Prabhune
Dhruv Agarwal
Gopala K. Anumanchipalli
320
9
0
18 Jun 2024
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Yifan Yang
Zheshu Song
Jianheng Zhuo
Mingyu Cui
Jinpeng Li
...
Shuai Fan
Kai Yu
Wei Zhang
Guoguo Chen
Xie Chen
532
35
0
17 Jun 2024
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
Interspeech (Interspeech), 2024
Nameer Hirschkind
Xiao Yu
Xiao Yu
Joseph Liu
Eloi DuBois
...
Colin Sinclair
Kyle Spence
Charles Shang
Zoë Abrams
Morgan McGuire
156
1
0
14 Jun 2024
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Neural Information Processing Systems (NeurIPS), 2024
Dongchao Yang
Haohan Guo
Yuanyuan Wang
Rongjie Huang
Xiang Li
Xu Tan
Xixin Wu
Helen Meng
AuLLM
193
32
0
14 Jun 2024
Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection
Interspeech (Interspeech), 2024
Haoyu Wang
Guoqiang Hu
Guodong Lin
Wei-Qiang Zhang
Jian Li
286
10
0
14 Jun 2024
Multi-Modal Retrieval For Large Language Model Based Speech Recognition
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
J. Kolehmainen
Aditya Gourav
Prashanth Gurunath Shivakumar
Yile Gu
Ankur Gandhe
Ariya Rastrow
Grant P. Strimel
I. Bulyko
274
5
0
13 Jun 2024
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models
Jinchuan Tian
Yifan Peng
William Chen
Kwanghee Choi
Karen Livescu
Shinji Watanabe
195
12
0
13 Jun 2024
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
Cheng Gong
Erica Cooper
Xin Wang
Chunyu Qiang
Mengzhe Geng
...
Jianwu Dang
Marc Tessier
Aidan Pine
Korin Richmond
Junichi Yamagishi
167
5
0
13 Jun 2024
Meta Learning Text-to-Speech Synthesis in over 7000 Languages
Florian Lux
Sarina Meyer
Lyonel Behringer
Frank Zalkow
P. Do
Matt Coler
Emanuel Habets
Ngoc Thang Vu
CLIP
285
11
0
10 Jun 2024
mHuBERT-147: A Compact Multilingual HuBERT Model
Marcely Zanon Boito
Vivek Iyer
Nikolaos Lagos
Laurent Besacier
Ioan Calapodescu
VLM
454
58
0
10 Jun 2024
Exploring the Benefits of Tokenization of Discrete Acoustic Units
Interspeech (Interspeech), 2024
Avihu Dekel
Raul Fernandez
162
3
0
08 Jun 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
Interspeech (Interspeech), 2024
Zheshu Song
Jianheng Zhuo
Yifan Yang
Ziyang Ma
Shixiong Zhang
Xie Chen
211
34
0
07 Jun 2024
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
Dongchao Yang
Dingdong Wang
Haohan Guo
Xueyuan Chen
Xixin Wu
Helen M. Meng
396
42
0
04 Jun 2024
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
Shengpeng Ji
Jia-li Zuo
Wen Wang
Jialong Zuo
Minghui Fang
...
Ziyue Jiang
Hai Huang
Xize Cheng
Siqi Zheng
Zhou Zhao
481
8
0
03 Jun 2024
YODAS: Youtube-Oriented Dataset for Audio and Speech
Xinjian Li
Shinnosuke Takamichi
Takaaki Saeki
William Chen
Sayaka Shiota
Shinji Watanabe
363
50
0
02 Jun 2024
Deep Learning for Assessment of Oral Reading Fluency
Mithilesh Vaidya
Binaya Kumar Sahoo
Preeti Rao
160
0
0
29 May 2024
RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification
Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2024
June-Woo Kim
Miika Toikkanen
Sangmin Bae
Minseok Kim
Ho-Young Jung
204
15
0
05 May 2024
ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
Yuzhe Gu
Enmao Diao
306
11
0
30 Apr 2024
Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices
Gwantae Kim
Bokyeung Lee
Donghyeon Kim
Hanseok Ko
OffRL
190
2
0
24 Apr 2024
Previous
1
2
3
4
5
6
7
8
Next
Page 4 of 8