
MLS: A Large-Scale Multilingual Dataset for Speech Research

Interspeech (Interspeech), 2020

7 December 2020


Papers citing "MLS: A Large-Scale Multilingual Dataset for Speech Research"

50 / 390 papers shown

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Automatic Speech Recognition & Understanding (ASRU), 2023

...

349

25 Sep 2023

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

393

22 Sep 2023

Multi-Channel MOSRA: Mean Opinion Score and Room Acoustics Estimation Using Simulated Data and a Teacher Model

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

195

21 Sep 2023

Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Krishna C. Puvvada

Nithin Rao Koluguri

Kunal Dhawan

Jagadeesh Balam

Boris Ginsburg

143

19 Sep 2023

Investigating End-to-End ASR Architectures for Long Form Audio Transcription

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Jagadeesh Balam

Boris Ginsburg


241

18 Sep 2023

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Wei Kang

Xiaoyu Yang

Zengwei Yao

Fangjun Kuang

Yifan Yang

Liyong Guo

Long Lin

Daniel Povey

252

112

15 Sep 2023

Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

365

14 Sep 2023

LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

Computer Speech and Language (CSL), 2023

...

262

11 Sep 2023

PromptTTS 2: Describing and Generating Voices with Text Prompt

Xu Tan

...

Xiang-Yang Li

Jiang Bian

274

05 Sep 2023

RepCodec: A Speech Representation Codec for Speech Tokenization

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Zhichao Huang

Chutong Meng

Tom Ko

217

31 Aug 2023

SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

International Conference on Learning Representations (ICLR), 2023

Xipeng Qiu

371

112

31 Aug 2023

Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data

Interspeech (Interspeech), 2023

223

31 Aug 2023

Sparks of Large Audio Models: A Survey and Outlook

...

Björn W. Schuller

683

24 Aug 2023

Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

IEEE International Conference on Computer Vision (ICCV), 2023

Minsu Kim

Jeong Hun Yeo

J. Choi

Y. Ro

214

18 Aug 2023

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

Interspeech (Interspeech), 2023

...

Yossi Adi

232

109

10 Aug 2023

Federated Representation Learning for Automatic Speech Recognition

210

03 Aug 2023

An objective evaluation of Hearing Aids and DNN-based speech enhancement in complex acoustic scenes

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023

Xavier Serra

133

24 Jul 2023

Prompting Large Language Models with Speech Recognition Abilities

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

...

Ozlem Kalinli

236

192

21 Jul 2023

MASR: Multi-label Aware Speech Representation

Automatic Speech Recognition & Understanding (ASRU), 2023

Anjali Raj

Shikhar Bharadwaj

Sriram Ganapathy

Min Ma

Shikhar Vashishth


180

20 Jul 2023

ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development

276

17 Jul 2023

Towards cross-language prosody transfer for dialog

Interspeech (Interspeech), 2023

Jonathan Avila

Nigel G. Ward

233

09 Jul 2023

Enrollment-stage Backdoor Attacks on Speaker Recognition Systems via Adversarial Ultrasound

IEEE Internet of Things Journal (IEEE IoT J.), 2023

Xinfeng Li

Xiaoyu Ji

198

28 Jun 2023

Confidence-based Ensembles of End-to-End Speech Recognition Models

Interspeech (Interspeech), 2023

Boris Ginsburg

329

27 Jun 2023

AudioPaLM: A Large Language Model That Can Speak and Listen

Paul Kishan Rubenstein

Chulayuth Asawaroengchai

...

279

399

22 Jun 2023

Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer

Kunal Dhawan

Dima Rekesh

Boris Ginsburg

256

14 Jun 2023

Label Aware Speech Representation Learning For Language Identification

Interspeech (Interspeech), 2023

Shikhar Vashishth

Shikhar Bharadwaj

Sriram Ganapathy

137

07 Jun 2023

Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling

Interspeech (Interspeech), 2023

Ramon Sanabria

Ondřej Klejch

Hao Tang

Sharon Goldwater

154

03 Jun 2023

Improved Cross-Lingual Transfer Learning For Automatic Speech Translation

371

01 Jun 2023

How to Estimate Model Transferability of Pre-Trained Speech Models?

Interspeech (Interspeech), 2023

Chao-Han Huck Yang

455

01 Jun 2023

Edit Distance based RL for RNNT decoding

DongSeon Hwang

Changwan Ryu

K. Sim

169

31 May 2023

BIG-C: a Multimodal Multi-Purpose Dataset for Bemba

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Claytone Sikasote

Eunice Mukonde

Md Mahfuz Ibn Alam

Antonios Anastasopoulos

179

26 May 2023

Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis

Interspeech (Interspeech), 2023

Seong-Hyun Park

Bohyung Kim

Tae-Hyun Oh

201

26 May 2023

Scaling Speech Technology to 1,000+ Languages

Journal of Machine Learning Research (JMLR), 2023

...

Yossi Adi

401

538

22 May 2023

Textually Pretrained Speech Language Models

Neural Information Processing Systems (NeurIPS), 2023

...

Yossi Adi

429

22 May 2023

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

Interspeech (Interspeech), 2023

297

21 May 2023

Language-universal phonetic encoder for low-resource speech recognition

Interspeech (Interspeech), 2023

Yuxuan Wang

218

19 May 2023

Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition

Interspeech (Interspeech), 2023

Yuxuan Wang

168

19 May 2023

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

Interspeech (Interspeech), 2023

Jiatong Shi

...

333

18 May 2023

Understanding and Bridging the Modality Gap for Speech Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Qingkai Fang

Yang Feng

260

15 May 2023

Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

215

09 May 2023

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

Automatic Speech Recognition & Understanding (ASRU), 2023

...

Krishna Puvvada

Jagadeesh Balam

Boris Ginsburg

333

145

08 May 2023

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

International Conference on Learning Representations (ICLR), 2023

Xu Tan

Jiang Bian

302

333

18 Apr 2023

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

International Conference on Machine Learning (ICML), 2023

Hainan Xu

Boris Ginsburg

187

13 Apr 2023

Enhancing Unsupervised Speech Recognition with Diffusion GANs

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Xianchao Wu


190

23 Mar 2023

Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

234

17 Mar 2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

...

410

348

02 Mar 2023

Improving Massively Multilingual ASR With Auxiliary CTC Objectives

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Jiatong Shi

266

24 Feb 2023

Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion

USENIX Security Symposium (USENIX Security), 2023

Jiangyi Deng

Yanjiao Chen

Yinan Zhong

Qianhao Miao

Xueluan Gong

Wenyuan Xu (Zhejiang University)

250

24 Feb 2023

Speaker and Language Change Detection using Wav2vec2 and Whisper

Tijn Berns

Nik Vaessen

David A. van Leeuwen

170

18 Feb 2023

ASR Bundestag: A Large-Scale political debate dataset in German

Intelligent Systems with Applications (ISA), 2023

Johannes Wirth

René Peinl

211

12 Feb 2023