MLS: A Large-Scale Multilingual Dataset for Speech Research

Interspeech, 2020

7 December 2020

Papers citing "MLS: A Large-Scale Multilingual Dataset for Speech Research"

50 / 390 papers shown

Q2D2: A Geometry-Aware Audio Codec Leveraging Two-Dimensional Quantization

Tal Shuster

Eliya Nachmani

158

01 Dec 2025

Swivuriso: The South African Next Voices Multilingual Speech Dataset

...

100

01 Dec 2025

ZO-ASR: Zeroth-Order Fine-Tuning of Speech Foundation Models without Back-Propagation

146

01 Dec 2025

RosettaSpeech: Zero-Shot Speech-to-Speech Translation without Parallel Speech

...

145

26 Nov 2025

Continual Audio Deepfake Detection via Universal Adversarial Perturbation

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2025

305

25 Nov 2025

Toward Conversational Hungarian Speech Recognition: Introducing the BEA-Large and BEA-Dialogue Datasets

107

17 Nov 2025

On the Cross-lingual Transferability of Pre-trained wav2vec2-based Models

Jonatas Grosman

Cassio Almeida

Guilherme Gonçalves Schardong

Helio Lopes

16 Nov 2025

Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data

...

711

16 Nov 2025

UniTok-Audio: A Unified Audio Generation Framework via Generative Modeling on Discrete Codec Tokens

277

30 Oct 2025

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation

...

445

26 Oct 2025

UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement

190

23 Oct 2025

AMAuT: A Flexible and Efficient Multiview Audio Transformer Framework Trained from Scratch

Weichuang Shao

I. Liao

Tomas Henrique Bode Maul

T. Chandesa

146

22 Oct 2025

MLMA: Towards Multilingual ASR With Mamba-based Architectures

312

21 Oct 2025

U-Codec: Ultra Low Frame-rate Neural Speech Codec for Fast High-fidelity Speech Generation

158

19 Oct 2025

SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization

...

192

19 Oct 2025

InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue

...

461

15 Oct 2025

Gelina: Unified Speech and Gesture Synthesis via Interleaved Token Prediction

217

13 Oct 2025

ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis

Mohammad Javad Ranjbar Kalahroodi

Heshaam Faili

A. Shakery

214

12 Oct 2025

FLToP CTC: Frame-Level Token Pruning via Relative Threshold for Efficient and Memory-Saving Decoding on Diverse Platforms

Atul Shree

Harshith Jupuru

124

10 Oct 2025

O_O-VC: Synthetic Data-Driven One-to-One Alignment for Any-to-Any Voice Conversion

120

10 Oct 2025

Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual Speech Recognition Evaluation

355

08 Oct 2025

Latent Speech-Text Transformer

...

168

07 Oct 2025

DecEx-RAG: Boosting Agentic Retrieval-Augmented Generation with Decision and Execution Optimization via Process Supervision

109

07 Oct 2025

Drax: Speech Recognition with Discrete Flow Matching

186

05 Oct 2025

Revisiting Direct Speech-to-Text Translation with Speech LLMs: Better Scaling than CoT Prompting?

Oriol Pareras

Gerard I. Gállego

Federico Costa

Cristina España-Bonet

Javier Hernando

LRM

204

03 Oct 2025

Listening or Reading? Evaluating Speech Awareness in Chain-of-Thought Speech-to-Text Translation

Cristina España-Bonet

LRM

126

03 Oct 2025

EuroSpeech: A Multilingual Speech Corpus

171

01 Oct 2025

On Deepfake Voice Detection -- It's All in the Presentation

127

30 Sep 2025

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs

217

26 Sep 2025

Align2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization

Shehzeen Samarah Hussain

140

26 Sep 2025

Cross-Attention is Half Explanation in Speech-to-Text Models

222

22 Sep 2025

MaskVCT: Masked Voice Codec Transformer for Zero-Shot Voice Conversion With Increased Controllability via Multiple Guidances

Laureano Moro-Velazquez

Jesus Villalba

Najim Dehak

140

21 Sep 2025

Bridging the gap between training and inference in LM-based TTS models

238

21 Sep 2025

FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation

Luca Della Libera

Cem Subakan

Mirco Ravanelli

146

19 Sep 2025

SpeechOp: Inference-Time Task Composition for Generative Speech Processing

280

17 Sep 2025

CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset

...

168

17 Sep 2025

Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST

195

17 Sep 2025

SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models

Karan Dua

Puneet Mittal

Ranjeet Gupta

Hitesh Laxmichand Patel

DiffM

354

15 Sep 2025

FuseCodec: Semantic-Contextual Fusion and Supervision for Neural Codecs

A. K. M. Mahbubur Rahman

304

14 Sep 2025

Finite Scalar Quantization Enables Redundant and Transmission-Robust Neural Audio Compression at Low Bit-rates

219

11 Sep 2025

EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

120

11 Sep 2025

MoLEx: Mixture of LoRA Experts in Speech Self-Supervised Models for Audio Deepfake Detection

Zihan Pan

Sailor Hardik Bhupendra

Jinyang Wu

MoE

211

11 Sep 2025

Layer-wise Analysis for Quality of Multilingual Synthesized Speech

191

05 Sep 2025

LibriQuote: A Speech Dataset of Fictional Character Utterances for Expressive Zero-Shot Speech Synthesis

Gaspard Michel

Elena V. Epure

Christophe Cerisara

142

04 Sep 2025

Multi-level SSL Feature Gating for Audio Deepfake Detection

Pierre-François Marteau

David Guennec

148

03 Sep 2025

SimulMEGA: MoE Routers are Advanced Policy Makers for Simultaneous Speech Translation

272

01 Sep 2025

Entropy-based Coarse and Compressed Semantic Speech Representation Learning

146

30 Aug 2025

CAMÕES: A Comprehensive Automatic Speech Recognition Benchmark for European Portuguese

...

139

27 Aug 2025

LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model

141

21 Aug 2025

Beyond Transcription: Mechanistic Interpretability in ASR

127

21 Aug 2025