MLS: A Large-Scale Multilingual Dataset for Speech Research

Interspeech, 2020

7 December 2020

Papers citing "MLS: A Large-Scale Multilingual Dataset for Speech Research"

50 / 390 papers shown

Q2D2: A Geometry-Aware Audio Codec Leveraging Two-Dimensional Quantization

Tal Shuster

Eliya Nachmani

01 Dec 2025

Swivuriso: The South African Next Voices Multilingual Speech Dataset

...

01 Dec 2025

ZO-ASR: Zeroth-Order Fine-Tuning of Speech Foundation Models without Back-Propagation

01 Dec 2025

RosettaSpeech: Zero-Shot Speech-to-Speech Translation from Monolingual Data

...

26 Nov 2025

Continual Audio Deepfake Detection via Universal Adversarial Perturbation

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2025

25 Nov 2025

Toward Conversational Hungarian Speech Recognition: Introducing the BEA-Large and BEA-Dialogue Datasets

17 Nov 2025

On the Cross-lingual Transferability of Pre-trained wav2vec2-based Models

Jonatas Grosman

Cassio Almeida

Guilherme Gonçalves Schardong

Helio Lopes

16 Nov 2025

Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data

...

16 Nov 2025

UniTok-Audio: A Unified Audio Generation Framework via Generative Modeling on Discrete Codec Tokens

30 Oct 2025

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation

...

26 Oct 2025

UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement

23 Oct 2025

AMAuT: A Flexible and Efficient Multiview Audio Transformer Framework Trained from Scratch

Weichuang Shao

I. Liao

Tomas Henrique Bode Maul

T. Chandesa

22 Oct 2025

MLMA: Towards Multilingual ASR With Mamba-based Architectures

21 Oct 2025

U-Codec: Ultra Low Frame-rate Neural Speech Codec for Fast High-fidelity Speech Generation

19 Oct 2025

SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization

...

19 Oct 2025

InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue

...

15 Oct 2025

Gelina: Unified Speech and Gesture Synthesis via Interleaved Token Prediction

13 Oct 2025

ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis

Mohammad Javad Ranjbar Kalahroodi

Heshaam Faili

A. Shakery

12 Oct 2025

FLToP CTC: Frame-Level Token Pruning via Relative Threshold for Efficient and Memory-Saving Decoding on Diverse Platforms

Atul Shree

Harshith Jupuru

10 Oct 2025

O_O-VC: Synthetic Data-Driven One-to-One Alignment for Any-to-Any Voice Conversion

10 Oct 2025

Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual Speech Recognition Evaluation

08 Oct 2025

Latent Speech-Text Transformer

...

07 Oct 2025

DecEx-RAG: Boosting Agentic Retrieval-Augmented Generation with Decision and Execution Optimization via Process Supervision

07 Oct 2025

Drax: Speech Recognition with Discrete Flow Matching

05 Oct 2025

Revisiting Direct Speech-to-Text Translation with Speech LLMs: Better Scaling than CoT Prompting?

Oriol Pareras

Gerard I. Gállego

Federico Costa

Cristina España-Bonet

Javier Hernando

LRM

03 Oct 2025

Listening or Reading? Evaluating Speech Awareness in Chain-of-Thought Speech-to-Text Translation

Cristina España-Bonet

LRM

03 Oct 2025

EuroSpeech: A Multilingual Speech Corpus

01 Oct 2025

On Deepfake Voice Detection - It's All in the Presentation

30 Sep 2025

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs

26 Sep 2025

Align2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization

Shehzeen Samarah Hussain

26 Sep 2025

Cross-Attention is Half Explanation in Speech-to-Text Models

22 Sep 2025

MaskVCT: Masked Voice Codec Transformer for Zero-Shot Voice Conversion With Increased Controllability via Multiple Guidances

Laureano Moro-Velazquez

Jesus Villalba

Najim Dehak

21 Sep 2025

Bridging the gap between training and inference in LM-based TTS models

21 Sep 2025

FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation

Luca Della Libera

Cem Subakan

Mirco Ravanelli

19 Sep 2025

SpeechOp: Inference-Time Task Composition for Generative Speech Processing

17 Sep 2025

CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset

...

17 Sep 2025

Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST

17 Sep 2025

SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models

Karan Dua

Puneet Mittal

Ranjeet Gupta

Hitesh Laxmichand Patel

DiffM

15 Sep 2025

FuseCodec: Semantic-Contextual Fusion and Supervision for Neural Codecs

A. K. M. Mahbubur Rahman

14 Sep 2025

Finite Scalar Quantization Enables Redundant and Transmission-Robust Neural Audio Compression at Low Bit-rates

11 Sep 2025

EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

11 Sep 2025

MoLEx: Mixture of LoRA Experts in Speech Self-Supervised Models for Audio Deepfake Detection

Zihan Pan

Sailor Hardik Bhupendra

Jinyang Wu

MoE

11 Sep 2025

Layer-wise Analysis for Quality of Multilingual Synthesized Speech

05 Sep 2025

LibriQuote: A Speech Dataset of Fictional Character Utterances for Expressive Zero-Shot Speech Synthesis

Gaspard Michel

Elena V. Epure

Christophe Cerisara

04 Sep 2025

Multi-level SSL Feature Gating for Audio Deepfake Detection

Pierre-François Marteau

David Guennec

03 Sep 2025

SimulMEGA: MoE Routers are Advanced Policy Makers for Simultaneous Speech Translation

01 Sep 2025

Entropy-based Coarse and Compressed Semantic Speech Representation Learning

30 Aug 2025

CAMÕES: A Comprehensive Automatic Speech Recognition Benchmark for European Portuguese

...

27 Aug 2025

LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model

21 Aug 2025

Beyond Transcription: Mechanistic Interpretability in ASR

21 Aug 2025