arXiv: 2012.03411
MLS: A Large-Scale Multilingual Dataset for Speech Research
Interspeech (Interspeech), 2020
7 December 2020
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
AuLLM
Papers citing "MLS: A Large-Scale Multilingual Dataset for Speech Research"
50 of 390 papers shown
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
IEEE Open Journal of Signal Processing (JOSP), 2024
Jee-weon Jung
Yihan Wu
Xin Wang
Ji-Hoon Kim
Soumi Maiti
...
Joon Son Chung
Wangyou Zhang
Seyun Um
Shinnosuke Takamichi
Shinji Watanabe
330
36
0
18 Sep 2024
Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text
Hongfei Xue
Wei Ren
Xuelong Geng
Kun Wei
Longhao Li
Qijie Shao
Linju Yang
Kai Diao
Lei Xie
AuLLM
217
12
0
17 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
311
10
0
16 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
C. Han
Seokgi Lee
Gyuhyeon Nam
Gyeongsu Chae
DiffM
1.1K
0
0
14 Sep 2024
Exploring SSL Discrete Tokens for Multilingual ASR
Mingyu Cui
Daxin Tan
Yifan Yang
Dingdong Wang
Huimeng Wang
Xiao Chen
Xie Chen
Xunying Liu
335
6
0
13 Sep 2024
Text-To-Speech Synthesis In The Wild
Jee-weon Jung
Wangyou Zhang
Soumi Maiti
Yihan Wu
Xin Eric Wang
...
Hye-jin Shim
Nicholas W. D. Evans
Joon Son Chung
Shinnosuke Takamichi
Shinji Watanabe
452
3
0
13 Sep 2024
The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language
Michael Ong
Sean Robertson
Leo Peckham
Alba Jorquera Jimenez de Aberasturi
Paula Arkhangorodsky
Robin Huo
Aman Sakhardande
Mark Hallap
Naomi Nagy
Ewan Dunbar
CVBM
727
0
0
12 Sep 2024
A Large Dataset of Spontaneous Speech with the Accent Spoken in São Paulo for Automatic Speech Recognition Evaluation
Brazilian Conference on Intelligent Systems (BRACIS), 2024
Rodrigo Lima
S. Leal
Arnaldo Candido Junior
S. Aluísio
200
3
0
10 Sep 2024
Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
Spoken Language Technology Workshop (SLT), 2024
Nithin Rao Koluguri
Travis M. Bartley
Hainan Xu
Oleksii Hrinchuk
Jagadeesh Balam
Boris Ginsburg
Georg Kucsko
392
7
0
09 Sep 2024
STAB: Speech Tokenizer Assessment Benchmark
Shikhar Vashishth
Harman Singh
Shikhar Bharadwaj
Sriram Ganapathy
Chulayuth Asawaroengchai
Kartik Audhkhasi
Andrew Rosenberg
Ankur Bapna
Bhuvana Ramabhadran
233
4
0
04 Sep 2024
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
Yiwei Guo
Zhihan Li
Junjie Li
Chenpeng Du
Hankun Wang
Shuai Wang
Xie Chen
Kai Yu
424
6
0
03 Sep 2024
A multilingual training strategy for low resource Text to Speech
Asma Amalas
Mounir Ghogho
Mohamed Chetouani
Rachid Oulad Haj Thami
289
3
0
02 Sep 2024
Progressive Residual Extraction based Pre-training for Speech Representation Learning
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024
Tianrui Wang
Jin Li
Ziyang Ma
Rui Cao
Xie Chen
...
Meng Ge
Xiaobao Wang
Yuguang Wang
Jianwu Dang
Nyima Tashi
SSL
316
3
0
31 Aug 2024
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Zhifei Xie
Changqiao Wu
AuLLM
VGen
VLM
SyDa
LRM
452
186
0
29 Aug 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
He Huang
Taejin Park
Kunal Dhawan
Ivan Medennikov
Krishna Puvvada
Nithin Rao Koluguri
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
SSL
AI4TS
383
4
0
23 Aug 2024
BUT Systems and Analyses for the ASVspoof 5 Challenge
Johan Rohdin
Lin Zhang
Oldřich Plchot
Vojtěch Staněk
David Mihola
...
Themos Stafylakis
Dmitriy Beveraki
Anna Silnova
Jan Brukner
Lukáš Burget
252
10
0
20 Aug 2024
SZU-AFS Antispoofing System for the ASVspoof 5 Challenge
Yuxiong Xu
Jiafeng Zhong
Sengui Zheng
Zefeng Liu
Bin Li
217
5
0
19 Aug 2024
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
Xin Wang
Héctor Delgado
Hemlata Tak
Jee-weon Jung
Hye-jin Shim
...
Md. Sahidullah
Tomi Kinnunen
Nicholas W. D. Evans
K. Lee
Junichi Yamagishi
AAML
267
156
0
16 Aug 2024
FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks
Interspeech (Interspeech), 2024
Min Ma
Yuma Koizumi
Shigeki Karita
Heiga Zen
Jason Riesa
Haruko Ishikawa
M. Bacchiani
VLM
232
14
0
12 Aug 2024
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond
Interspeech (Interspeech), 2024
Beomseok Lee
Ioan Calapodescu
Marco Gaido
Matteo Negri
Laurent Besacier
AuLLM
332
18
0
07 Aug 2024
Towards scalable efficient on-device ASR with transfer learning
Laxmi Pandey
Ke Li
Jinxi Guo
Debjyoti Paul
Arthur Guo
Jay Mahadeokar
Xuedong Zhang
212
3
0
23 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
374
28
0
21 Jul 2024
Framework for Curating Speech Datasets and Evaluating ASR Systems: A Case Study for Polish
Mohamed Allam
279
0
0
18 Jul 2024
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Yuancheng Wang
Kai Chen
Pengyuan Zhang
Zhizheng Wu
272
205
0
07 Jul 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Ye Bai
Jingping Chen
Jitong Chen
Wei Chen
Zhuo Chen
...
Wanyi Zhang
Yang Zhang
Yawei Zhang
Yijie Zheng
Ming Zou
AuLLM
403
78
0
05 Jul 2024
Probing the Feasibility of Multilingual Speaker Anonymization
Sarina Meyer
Florian Lux
Ngoc Thang Vu
265
10
0
03 Jul 2024
Towards Robust Speech Representation Learning for Thousands of Languages
William Chen
Wangyou Zhang
Yifan Peng
Xinjian Li
Jinchuan Tian
Jiatong Shi
Xuankai Chang
Soumi Maiti
Karen Livescu
Shinji Watanabe
ELM
415
51
0
30 Jun 2024
MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research
Song Li
Yongbin You
Xuezhi Wang
Zhengkun Tian
Ke Ding
Guanglu Wan
237
11
0
26 Jun 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
264
27
0
25 Jun 2024
One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection
Hyun Myung Kim
Kangwook Jang
Hoirin Kim
193
17
0
24 Jun 2024
Speech Analysis of Language Varieties in Italy
Moreno La Quatra
Alkis Koudounas
Elena Baralis
Sabato Marco Siniscalchi
258
5
0
22 Jun 2024
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics
IEEE Journal on Selected Topics in Signal Processing (JSTSP), 2024
Cheol Jun Cho
Peter Wu
Tejas S. Prabhune
Dhruv Agarwal
Gopala K. Anumanchipalli
353
9
0
18 Jun 2024
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Yifan Yang
Zheshu Song
Jianheng Zhuo
Mingyu Cui
Jinpeng Li
...
Shuai Fan
Kai Yu
Wei Zhang
Guoguo Chen
Xie Chen
606
41
0
17 Jun 2024
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
Interspeech (Interspeech), 2024
Nameer Hirschkind
Xiao Yu
Joseph Liu
Eloi DuBois
...
Colin Sinclair
Kyle Spence
Charles Shang
Zoë Abrams
Morgan McGuire
184
1
0
14 Jun 2024
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Neural Information Processing Systems (NeurIPS), 2024
Dongchao Yang
Haohan Guo
Yuanyuan Wang
Rongjie Huang
Xiang Li
Xu Tan
Xixin Wu
Helen Meng
AuLLM
229
39
0
14 Jun 2024
Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection
Interspeech (Interspeech), 2024
Haoyu Wang
Guoqiang Hu
Guodong Lin
Wei-Qiang Zhang
Jian Li
308
12
0
14 Jun 2024
Multi-Modal Retrieval For Large Language Model Based Speech Recognition
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
J. Kolehmainen
Aditya Gourav
Prashanth Gurunath Shivakumar
Yile Gu
Ankur Gandhe
Ariya Rastrow
Grant P. Strimel
I. Bulyko
297
5
0
13 Jun 2024
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models
Jinchuan Tian
Yifan Peng
William Chen
Kwanghee Choi
Karen Livescu
Shinji Watanabe
203
12
0
13 Jun 2024
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
Cheng Gong
Erica Cooper
Xin Wang
Chunyu Qiang
Mengzhe Geng
...
Jianwu Dang
Marc Tessier
Aidan Pine
Korin Richmond
Junichi Yamagishi
183
5
0
13 Jun 2024
Meta Learning Text-to-Speech Synthesis in over 7000 Languages
Florian Lux
Sarina Meyer
Lyonel Behringer
Frank Zalkow
P. Do
Matt Coler
Emanuel Habets
Ngoc Thang Vu
CLIP
294
11
0
10 Jun 2024
mHuBERT-147: A Compact Multilingual HuBERT Model
Marcely Zanon Boito
Vivek Iyer
Nikolaos Lagos
Laurent Besacier
Ioan Calapodescu
VLM
545
70
0
10 Jun 2024
Exploring the Benefits of Tokenization of Discrete Acoustic Units
Interspeech (Interspeech), 2024
Avihu Dekel
Raul Fernandez
255
3
0
08 Jun 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
Interspeech (Interspeech), 2024
Zheshu Song
Jianheng Zhuo
Yifan Yang
Ziyang Ma
Shixiong Zhang
Xie Chen
221
39
0
07 Jun 2024
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
Dongchao Yang
Dingdong Wang
Haohan Guo
Xueyuan Chen
Xixin Wu
Helen M. Meng
442
45
0
04 Jun 2024
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
Shengpeng Ji
Jia-li Zuo
Wen Wang
Jialong Zuo
Minghui Fang
...
Ziyue Jiang
Hai Huang
Xize Cheng
Siqi Zheng
Zhou Zhao
566
8
0
03 Jun 2024
YODAS: Youtube-Oriented Dataset for Audio and Speech
Xinjian Li
Shinnosuke Takamichi
Takaaki Saeki
William Chen
Sayaka Shiota
Shinji Watanabe
460
61
0
02 Jun 2024
Deep Learning for Assessment of Oral Reading Fluency
Mithilesh Vaidya
Binaya Kumar Sahoo
Preeti Rao
167
0
0
29 May 2024
RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification
Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2024
June-Woo Kim
Miika Toikkanen
Sangmin Bae
Minseok Kim
Ho-Young Jung
228
18
0
05 May 2024
ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
Yuzhe Gu
Enmao Diao
326
12
0
30 Apr 2024
Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices
Gwantae Kim
Bokyeung Lee
Donghyeon Kim
Hanseok Ko
OffRL
196
2
0
24 Apr 2024