MLS: A Large-Scale Multilingual Dataset for Speech Research

Interspeech, 2020

7 December 2020

Papers citing "MLS: A Large-Scale Multilingual Dataset for Speech Research"

50 / 390 papers shown

Scaling Rich Style-Prompted Text-to-Speech Datasets

401

06 Mar 2025

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

...

306

110

03 Mar 2025

LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation

230

27 Feb 2025

M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance

...

630

26 Feb 2025

Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision

...

599

26 Feb 2025

Audio-FLAN: A Preliminary Release

...

298

23 Feb 2025

DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis

368

21 Feb 2025

Adopting Whisper for Confidence Estimation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

254

20 Feb 2025

Soundwave: Less is More for Speech-Text Alignment in LLMs

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

282

18 Feb 2025

DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities

303

16 Feb 2025

Gender Bias in Instruction-Guided Speech Synthesis Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

Chun-Yi Kuan

Hung-yi Lee

472

08 Feb 2025

Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance

Shehzeen Samarah Hussain

390

07 Feb 2025

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025

...

372

27 Jan 2025

A Survey on Spoken Italian Datasets and Corpora

IEEE Access, 2025

Marco Giordano

Claudia Rinaldi

267

11 Jan 2025

ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

271

08 Jan 2025

Text2Data: Low-Resource Data Generation with Textual Control

AAAI Conference on Artificial Intelligence (AAAI), 2024

353

03 Jan 2025

Transducer-Llama: Integrating LLMs into Streamable Transducer-based Speech Recognition

230

21 Dec 2024

LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration

AAAI Conference on Artificial Intelligence (AAAI), 2024

Sangmin Lee

Woo-Jin Chung

Hong-Goo Kang

474

19 Dec 2024

Scaling Transformers for Low-Bitrate High-Quality Speech Coding

316

29 Nov 2024

ParaLBench: A Large-Scale Benchmark for Computational Paralinguistics over Acoustic Foundation Models

IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2024

105

14 Nov 2024

Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Guan-Ting Lin

Prashanth Gurunath Shivakumar

356

04 Nov 2024

DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models

317

31 Oct 2024

Augmenting Polish Automatic Speech Recognition System With Synthetic Data

Łukasz Bondaruk

Jakub Kubiak

Mateusz Czyżnikiewicz

133

30 Oct 2024

ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams

Srija Anand

Praveen Srinivasa Varadhan

Mehak Singal

Mitesh M. Khapra

182

23 Oct 2024

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

...

Zhihao Du

Shiliang Zhang

350

23 Oct 2024

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

422

23 Oct 2024

Moonshine: Speech Recognition for Live Transcription and Voice Commands

150

21 Oct 2024

Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses

190

20 Oct 2024

Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant

353

20 Oct 2024

Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR

144

17 Oct 2024

Sound Check: Auditing Audio Datasets

William Agnew

Harry H. Jiang

Sauvik Das

365

17 Oct 2024

Beyond Oversmoothing: Evaluating DDPM and MSE for Scalable Speech Synthesis in ASR

Christoph Minixhofer

Ondřej Klejch

Peter Bell

244

16 Oct 2024

IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities

Xin Zhang

Xiang Lyu

Zhihao Du

Qian Chen

Dong Zhang

...

Yuxuan Wang

327

09 Oct 2024

Sylber: Syllabic Embedding Representation of Speech from Raw Audio

International Conference on Learning Representations (ICLR), 2024

Cheol Jun Cho

Nicholas Lee

Akshat Gupta

Dhruv Agarwal

Ethan Chen

Alan W Black

Gopala K. Anumanchipalli

288

09 Oct 2024

A Two-Step Approach for Data-Efficient French Pronunciation Learning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

08 Oct 2024

Block Vecchia Approximation for Scalable and Efficient Gaussian Process Computations

Qilong Pan

Sameh Abdulah

M. Genton

Ying Sun

192

06 Oct 2024

Distilling an End-to-End Voice Assistant Without Instruction Training Data

Diyi Yang

327

03 Oct 2024

HAINAN: Fast and Accurate Transducer for Hybrid-Autoregressive ASR

International Conference on Learning Representations (ICLR), 2024

975

03 Oct 2024

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

195

01 Oct 2024

Recent Advances in Speech Language Models: A Survey

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

565

01 Oct 2024

SSR: Alignment-Aware Modality Connector for Speech Language Models

International Workshop on Spoken Language Translation (IWSLT), 2024

426

30 Sep 2024

Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models

Jin Xu

236

28 Sep 2024

Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

184

25 Sep 2024

Speech Recognition Rescoring with Large Speech-Text Foundation Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Prashanth Gurunath Shivakumar

251

25 Sep 2024

Revisiting Acoustic Features for Robust ASR

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Muhammad Ahmed Shah

Bhiksha Raj

179

24 Sep 2024

LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Hieu-Thi Luong

Haoyang Li

Lin Zhang

Kong Aik Lee

Eng Siong Chng

273

23 Sep 2024

Semi-supervised Learning For Robust Speech Evaluation

Spoken Language Technology Workshop (SLT), 2024

Huayun Zhang

Jeremy H. M. Wong

Geyu Lin

Nancy F. Chen

180

23 Sep 2024

A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection

Computer Science Review (CSR), 2024

Lam Pham

Phat Lam

Dat Tran

Hieu Tang

Tin Nguyen

Alexander Schindler

Alexander Polonsky

Canh Vu

522

23 Sep 2024

Preference Alignment Improves Language Model-Based TTS

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Jinchuan Tian

Chunlei Zhang

Jiatong Shi

Hao Zhang

Jianwei Yu

Shinji Watanabe

Dong Yu

238

19 Sep 2024

Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Edresson Casanova

Ryan Langman

Paarth Neekhara

Shehzeen Samarah Hussain

Subhankar Ghosh

172

18 Sep 2024