MLS: A Large-Scale Multilingual Dataset for Speech Research

Interspeech, 2020

7 December 2020

Papers citing "MLS: A Large-Scale Multilingual Dataset for Speech Research"

50 / 390 papers shown

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Zhen Ye

Xu Tan

...

Wei Xue

285

23 Apr 2024

Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training

Pavel Denisov

Ngoc Thang Vu

207

16 Apr 2024

MAD Speech: Measures of Acoustic Diversity of Speech

338

16 Apr 2024

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Detai Xin

Xu Tan

Kai Shen

Zeqian Ju

Dongchao Yang

...

Hiroshi Saruwatari

288

04 Apr 2024

CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech

International Conference on Learning Representations (ICLR), 2024

232

03 Apr 2024

Croissant: A Metadata Format for ML-Ready Datasets

...

331

28 Mar 2024

Phonetic Segmentation of the UCLA Phonetics Lab Archive

263

28 Mar 2024

Encoding of lexical tone in self-supervised models of spoken language

329

25 Mar 2024

Improving Acoustic Word Embeddings through Correspondence Training of Self-supervised Speech Representations

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024

Amit Meghanani

Thomas Hain

SSL

149

13 Mar 2024

Speech Robust Bench: A Robustness Benchmark For Speech Recognition

International Conference on Learning Representations (ICLR), 2024

248

08 Mar 2024

Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data

...

Andrew Rosenberg

Bhuvana Ramabhadran

Heiga Zen

Francoise Beaufays

Hadar Shemtov

303

29 Feb 2024

Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps

Giuseppe Attanasio

Beatrice Savoldi

Dennis Fucci

Dirk Hovy

227

28 Feb 2024

Direct Punjabi to English speech translation using discrete units

Prabhjot Kaur

L. A. M. Bush

Weisong Shi

208

25 Feb 2024

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

Shinji Watanabe

344

20 Feb 2024

Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?

471

19 Feb 2024

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

...

550

209

19 Feb 2024

SpiRit-LM: Interleaved Spoken and Written Language Model

...

274

107

08 Feb 2024

REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR

Hung-yi Lee

234

06 Feb 2024

Natural language guidance of high-fidelity text-to-speech with synthetic annotations

Daniel Lyth

Simon King

311

02 Feb 2024

Exploring the limits of decoder-only models trained on public speech recognition corpora

199

31 Jan 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

...

Jiatong Shi

Shinji Watanabe

314

30 Jan 2024

Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege

IEEE Transactions on Dependable and Secure Computing (IEEE TDSC), 2024

191

28 Jan 2024

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Chenpeng Du

Yiwei Guo

Hankun Wang

Yifan Yang

Zhikang Niu

Shuai Wang

Hui Zhang

Xie Chen

Kai Yu

VLM

394

25 Jan 2024

SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation

Xipeng Qiu

247

24 Jan 2024

Adversarial speech for voice privacy protection from Personalized Speech generation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

KongAik Lee

223

22 Jan 2024

Detecting Multimedia Generated by Large AI Models: A Survey

900

22 Jan 2024

Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation

Jeong Hun Yeo

297

18 Jan 2024

Pheme: Efficient and Conversational Speech Generation

206

05 Jan 2024

Boosting Large Language Model for Speech Synthesis: An Empirical Study

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Shujie Hu

Rui Wang

Furu Wei

275

30 Dec 2023

Audiobox: Unified Audio Generation with Natural Language Prompts

...

347

139

25 Dec 2023

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

Xin Wang

Longbiao Wang

271

22 Dec 2023

Generative linguistic representation for spoken language identification

Peng Shen

Xuguang Lu

Hisashi Kawai

150

18 Dec 2023

Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Spoken Language Technology Workshop (SLT), 2023

Xueyao Zhang

Liumeng Xue

Yicheng Gu

Yuancheng Wang

Haorui He

...

Haizhou Li

270

15 Dec 2023

Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models

Björn W. Schuller

361

11 Dec 2023

AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Junteng Jia

Ozlem Kalinli

270

12 Nov 2023

Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions

Interspeech, 2023

190

26 Oct 2023

The IMS Toucan System for the Blizzard Challenge 2023

178

26 Oct 2023

CL-MASR: A Continual Learning Benchmark for Multilingual ASR

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Mirco Ravanelli

276

25 Oct 2023

Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation

T. Park

He Huang

Coleman Hooper

Nithin Rao Koluguri

Kunal Dhawan

Ante Jukić

Jagadeesh Balam

Boris Ginsburg

185

18 Oct 2023

Multi-stage Large Language Model Correction for Speech Recognition

294

17 Oct 2023

Optimized Tokenization for Transcribed Error Correction

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Tomer Wullach

Shlomo E. Chazan

207

16 Oct 2023

Toward Joint Language Modeling for Speech Units and Text

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

237

12 Oct 2023

Voice Conversion for Stuttered Speech, Instruments, Unseen Languages and Textually Described Voices

Matthew Baas

Herman Kamper

198

12 Oct 2023

Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction

IEEE Transactions on Cognitive and Developmental Systems (IEEE TCDS), 2023

Kay Chen Tan

357

11 Oct 2023

Evaluating Self-Supervised Speech Representations for Indigenous American Languages

International Conference on Language Resources and Evaluation (LREC), 2023

283

05 Oct 2023

Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Kuan-Po Huang

Chih-Kai Yang

Yu-Kuan Fu

Ewan Dunbar

Hung-yi Lee

342

04 Oct 2023

Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model

Automatic Speech Recognition & Understanding (ASRU), 2023

Hung-yi Lee

343

04 Oct 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation

Dongchao Yang

Jinchuan Tian

Xuejiao Tan

Rongjie Huang

Songxiang Liu

...

Jiang Bian

Xixin Wu

Zhou Zhao

Shinji Watanabe

Helen M. Meng

CVBM AuLLM

515

186

01 Oct 2023

Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning

Automatic Speech Recognition & Understanding (ASRU), 2023

Jiatong Shi

Wangyou Zhang

265

26 Sep 2023

Unsupervised Pre-Training for Vietnamese Automatic Speech Recognition in the HYKIST Project

Khai-Nguyen Nguyen

216

26 Sep 2023