MLS: A Large-Scale Multilingual Dataset for Speech Research

Interspeech (Interspeech), 2020
7 December 2020
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
    AuLLM

Papers citing "MLS: A Large-Scale Multilingual Dataset for Speech Research"

50 / 390 papers shown
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data
Automatic Speech Recognition & Understanding (ASRU), 2023
Yifan Peng
Jinchuan Tian
Brian Yan
Dan Berrebbi
Xuankai Chang
...
Yui Sudo
Muhammad Shakeel
Jee-weon Jung
Soumi Maiti
Shinji Watanabe
VLM
349
63
0
25 Sep 2023
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jiamin Xie
Ke Li
Jinxi Guo
Andros Tjandra
Yuan Shangguan
Leda Sari
Chunyang Wu
Junteng Jia
Jay Mahadeokar
Ozlem Kalinli
393
3
0
22 Sep 2023
Multi-Channel MOSRA: Mean Opinion Score and Room Acoustics Estimation Using Simulated Data and a Teacher Model
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jozef Coldenhoff
Andrew Harper
Paul Kendrick
Tijana Stojkovic
Milos Cernak
195
4
0
21 Sep 2023
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Krishna C. Puvvada
Nithin Rao Koluguri
Kunal Dhawan
Jagadeesh Balam
Boris Ginsburg
143
26
0
19 Sep 2023
Investigating End-to-End ASR Architectures for Long Form Audio Transcription
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Nithin Rao Koluguri
Samuel Kriman
Georgy Zelenfroind
Somshubra Majumdar
Dima Rekesh
Vahid Noroozi
Jagadeesh Balam
Boris Ginsburg
AuLLM
241
15
0
18 Sep 2023
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Wei Kang
Xiaoyu Yang
Zengwei Yao
Fangjun Kuang
Yifan Yang
Liyong Guo
Long Lin
Daniel Povey
252
112
0
15 Sep 2023
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLM, AuLLM
365
86
0
14 Sep 2023
LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech
Computer Speech and Language (CSL), 2023
Titouan Parcollet
H. Nguyen
Solène Evain
Marcely Zanon Boito
Adrien Pupier
...
François Portet
Solange Rossato
Fabien Ringeval
D. Schwab
Laurent Besacier
262
27
0
11 Sep 2023
PromptTTS 2: Describing and Generating Voices with Text Prompt
Yichong Leng
Zhifang Guo
Kai Shen
Xu Tan
Zeqian Ju
...
Lei He
Xiang-Yang Li
Sheng Zhao
Tao Qin
Jiang Bian
VLM, DiffM
274
71
0
05 Sep 2023
RepCodec: A Speech Representation Codec for Speech Tokenization
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhichao Huang
Chutong Meng
Tom Ko
217
41
0
31 Aug 2023
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
International Conference on Learning Representations (ICLR), 2023
Xin Zhang
Dong Zhang
Shimin Li
Yaqian Zhou
Xipeng Qiu
371
112
0
31 Aug 2023
Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data
Interspeech (Interspeech), 2023
Seunghan Yang
Byeonggeun Kim
Kyuhong Shim
Simyoung Chang
223
2
0
31 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Xiaoshi Zhong
Björn W. Schuller
LM&MA, AuLLM
683
53
0
24 Aug 2023
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
IEEE International Conference on Computer Vision (ICCV), 2023
Minsu Kim
Jeong Hun Yeo
J. Choi
Y. Ro
214
28
0
18 Aug 2023
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
Interspeech (Interspeech), 2023
Tu Nguyen
Wei-Ning Hsu
Antony D'Avirro
Bowen Shi
Itai Gat
...
Gabriel Synnaeve
Michael Hassid
Felix Kreuk
Yossi Adi
Emmanuel Dupoux
232
109
0
10 Aug 2023
Federated Representation Learning for Automatic Speech Recognition
Guruprasad V Ramesh
Gopinath Chennupati
Milind Rao
Anit Kumar Sahu
Ariya Rastrow
J. Droppo
210
0
0
03 Aug 2023
An objective evaluation of Hearing Aids and DNN-based speech enhancement in complex acoustic scenes
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
Enric Gusó
Joanna Luberadzka
Martí Baig
Umut Sayin Saraç
Xavier Serra
133
5
0
24 Jul 2023
Prompting Large Language Models with Speech Recognition Abilities
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yassir Fathullah
Chunyang Wu
Egor Lakomkin
Junteng Jia
Yuan Shangguan
...
Wenhan Xiong
Jay Mahadeokar
Ozlem Kalinli
Christian Fuegen
M. Seltzer
AuLLM
236
192
0
21 Jul 2023
MASR: Multi-label Aware Speech Representation
Automatic Speech Recognition & Understanding (ASRU), 2023
Anjali Raj
Shikhar Bharadwaj
Sriram Ganapathy
Min Ma
Shikhar Vashishth
SSL
180
0
0
20 Jul 2023
ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development
Yanir Marmor
Kinneret Misgav
Y. Lifshitz
VLM
276
5
0
17 Jul 2023
Towards cross-language prosody transfer for dialog
Interspeech (Interspeech), 2023
Jonathan Avila
Nigel G. Ward
233
7
0
09 Jul 2023
Enrollment-stage Backdoor Attacks on Speaker Recognition Systems via Adversarial Ultrasound
IEEE Internet of Things Journal (IEEE IoT J.), 2023
Xinfeng Li
Junning Ze
Chen Yan
Yushi Cheng
Xiaoyu Ji
Wei Dong
AAML
198
14
0
28 Jun 2023
Confidence-based Ensembles of End-to-End Speech Recognition Models
Interspeech (Interspeech), 2023
Igor Gitman
Vitaly Lavrukhin
A. Laptev
Boris Ginsburg
UQCV
329
9
0
27 Jun 2023
AudioPaLM: A Large Language Model That Can Speak and Listen
Paul Kishan Rubenstein
Chulayuth Asawaroengchai
D. Nguyen
Ankur Bapna
Zalan Borsos
...
Neil Zeghidour
Yu Zhang
Zhishuai Zhang
Lukás Zilka
Christian Frank
LM&MA, AuLLM, VLM
279
399
0
22 Jun 2023
Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer
Kunal Dhawan
Dima Rekesh
Boris Ginsburg
256
16
0
14 Jun 2023
Label Aware Speech Representation Learning For Language Identification
Interspeech (Interspeech), 2023
Shikhar Vashishth
Shikhar Bharadwaj
Sriram Ganapathy
Ankur Bapna
Min Ma
Wei Han
Vera Axelrod
Partha P. Talukdar
SSL
137
4
0
07 Jun 2023
Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling
Interspeech (Interspeech), 2023
Ramon Sanabria
Ondřej Klejch
Hao Tang
Sharon Goldwater
154
4
0
03 Jun 2023
Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
Sameer Khurana
Nauman Dawalatabad
Antoine Laurent
Luis Vicente
Pablo Gimeno
Victoria Mingote
James R. Glass
VLM
371
2
0
01 Jun 2023
How to Estimate Model Transferability of Pre-Trained Speech Models?
Interspeech (Interspeech), 2023
Zih-Ching Chen
Chao-Han Huck Yang
Yue Liu
Yu Zhang
Nanxin Chen
Shoufeng Chang
Rohit Prabhavalkar
Hung-yi Lee
Tara N. Sainath
455
11
0
01 Jun 2023
Edit Distance based RL for RNNT decoding
DongSeon Hwang
Changwan Ryu
K. Sim
169
0
0
31 May 2023
BIG-C: a Multimodal Multi-Purpose Dataset for Bemba
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Claytone Sikasote
Eunice Mukonde
Md Mahfuz Ibn Alam
Antonios Anastasopoulos
179
9
0
26 May 2023
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis
Interspeech (Interspeech), 2023
Seong-Hyun Park
Bohyung Kim
Tae-Hyun Oh
201
1
0
26 May 2023
Scaling Speech Technology to 1,000+ Languages
Journal of Machine Learning Research (JMLR), 2023
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
401
538
0
22 May 2023
Textually Pretrained Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
...
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
VLM, SyDa
429
94
0
22 May 2023
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages
Interspeech (Interspeech), 2023
Andrew Rouditchenko
Sameer Khurana
Samuel Thomas
Rogerio Feris
Leonid Karlinsky
Hilde Kuehne
David Harwath
Brian Kingsbury
James R. Glass
VLM
297
25
0
21 May 2023
Language-universal phonetic encoder for low-resource speech recognition
Interspeech (Interspeech), 2023
Siyuan Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
218
3
0
19 May 2023
Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition
Interspeech (Interspeech), 2023
Siyuan Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
168
8
0
19 May 2023
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
Interspeech (Interspeech), 2023
Jiatong Shi
Dan Berrebbi
William Chen
Ho-Lam Chung
En-Pei Hu
...
Xuankai Chang
Shang-Wen Li
Abdel-rahman Mohamed
Hung-yi Lee
Shinji Watanabe
ELM
333
88
0
18 May 2023
Understanding and Bridging the Modality Gap for Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Qingkai Fang
Yang Feng
260
29
0
15 May 2023
Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
215
4
0
09 May 2023
Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
Automatic Speech Recognition & Understanding (ASRU), 2023
Dima Rekesh
Nithin Rao Koluguri
Samuel Kriman
Somshubra Majumdar
Vahid Noroozi
...
Oleksii Hrinchuk
Krishna Puvvada
Ankur Kumar
Jagadeesh Balam
Boris Ginsburg
333
145
0
08 May 2023
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
International Conference on Learning Representations (ICLR), 2023
Kai Shen
Zeqian Ju
Xu Tan
Yanqing Liu
Yichong Leng
Lei He
Tao Qin
Sheng Zhao
Jiang Bian
DiffM
302
333
0
18 Apr 2023
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
International Conference on Machine Learning (ICML), 2023
Hainan Xu
Fei Jia
Somshubra Majumdar
Hengguan Huang
Shinji Watanabe
Boris Ginsburg
187
45
0
13 Apr 2023
Enhancing Unsupervised Speech Recognition with Diffusion GANs
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xianchao Wu
DiffM
190
2
0
23 Mar 2023
Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Julien Hauret
Thomas Joubaud
V. Zimpfer
Éric Bavu
234
10
0
17 Mar 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Françoise Beaufays
Yonghui Wu
VLM
410
348
0
02 Mar 2023
Improving Massively Multilingual ASR With Auxiliary CTC Objectives
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
William Chen
Brian Yan
Jiatong Shi
Yifan Peng
Soumi Maiti
Shinji Watanabe
266
51
0
24 Feb 2023
Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion
USENIX Security Symposium (USENIX Security), 2023
Jiangyi Deng
Yanjiao Chen
Yinan Zhong
Qianhao Miao
Xueluan Gong
Wenyuan Xu
250
13
0
24 Feb 2023
Speaker and Language Change Detection using Wav2vec2 and Whisper
Tijn Berns
Nik Vaessen
David A. van Leeuwen
170
6
0
18 Feb 2023
ASR Bundestag: A Large-Scale political debate dataset in German
Intelligent Systems with Applications (ISA), 2023
Johannes Wirth
René Peinl
211
2
0
12 Feb 2023