MLS: A Large-Scale Multilingual Dataset for Speech Research

7 December 2020
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
    AuLLM
arXiv:2012.03411

Papers citing "MLS: A Large-Scale Multilingual Dataset for Speech Research"

50 / 355 papers shown
Cross-Attention is Half Explanation in Speech-to-Text Models
Sara Papi
Dennis Fucci
Marco Gaido
Matteo Negri
L. Bentivogli
LRM
22 Sep 2025
Bridging the gap between training and inference in LM-based TTS models
Ruonan Zhang
Lingzhou Mu
Xixin Wu
Kai Zhang
21 Sep 2025
MaskVCT: Masked Voice Codec Transformer for Zero-Shot Voice Conversion With Increased Controllability via Multiple Guidances
Junhyeok Lee
Helin Wang
Yaohan Guan
Thomas Thebaud
Laureano Moro-Velazquez
Jesus Villalba
Najim Dehak
21 Sep 2025
FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation
Luca Della Libera
Cem Subakan
Mirco Ravanelli
19 Sep 2025
Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST
Monica Sekoyan
Nithin Rao Koluguri
Nune Tadevosyan
Piotr Żelasko
Travis M. Bartley
Nick Karpov
Jagadeesh Balam
Boris Ginsburg
VLM
17 Sep 2025
SpeechOp: Inference-Time Task Composition for Generative Speech Processing
Justin Lovelace
Rithesh Kumar
Jiaqi Su
Ke Chen
Kilian Q. Weinberger
Zeyu Jin
DiffM, VLM
17 Sep 2025
CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset
Brian Yan
Injy Hamed
Shuichiro Shimizu
Vasista Lodagala
William Chen
...
Samuele Cornell
Eunjung Yeo
Kwanghee Choi
Carlos Carvalho
Karen Rosero
17 Sep 2025
SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models
Karan Dua
Puneet Mittal
Ranjeet Gupta
Hitesh Laxmichand Patel
DiffM
15 Sep 2025
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
Yuhao Zhang
Yuhao Du
Zhanchen Dai
Xiangnan Ma
Kaiqi Kou
Benyou Wang
Haizhou Li
11 Sep 2025
MoLEx: Mixture of LoRA Experts in Speech Self-Supervised Models for Audio Deepfake Detection
Zihan Pan
Sailor Hardik Bhupendra
Jinyang Wu
MoE
11 Sep 2025
Finite Scalar Quantization Enables Redundant and Transmission-Robust Neural Audio Compression at Low Bit-rates
Harry Julian
Rachel Beeson
Lohith Konathala
Johanna Ulin
Jiameng Gao
11 Sep 2025
Layer-wise Analysis for Quality of Multilingual Synthesized Speech
Erica Cooper
T. Okamoto
Yamato Ohtani
Tomoki Toda
Hisashi Kawai
05 Sep 2025
LibriQuote: A Speech Dataset of Fictional Character Utterances for Expressive Zero-Shot Speech Synthesis
Gaspard Michel
Elena V. Epure
Christophe Cerisara
04 Sep 2025
Multi-level SSL Feature Gating for Audio Deepfake Detection
Hoan My Tran
Damien Lolive
Aghilas Sini
Arnaud Delhay
Pierre-François Marteau
David Guennec
03 Sep 2025
SimulMEGA: MoE Routers are Advanced Policy Makers for Simultaneous Speech Translation
Chenyang Le
Bing Han
Jinshun Li
Songyong Chen
Y. Qian
MoE
01 Sep 2025
Entropy-based Coarse and Compressed Semantic Speech Representation Learning
Jialong Zuo
Guangyan Zhang
Minghui Fang
Shengpeng Ji
Xiaoqi Jiao
Jingyu Li
Yiwen Guo
Zhou Zhao
30 Aug 2025
CAMÕES: A Comprehensive Automatic Speech Recognition Benchmark for European Portuguese
Carlos Carvalho
Francisco Teixeira
Catarina Botelho
Anna Pompili
Rubén Solera-Ureña
...
T. Rolland
John Mendonça
Diogo Pereira
Isabel Trancoso
A. Abad
27 Aug 2025
Beyond Transcription: Mechanistic Interpretability in ASR
Neta Glazer
Yael Segal-Feldman
Hilit Segev
Aviv Shamsian
Asaf Buchnick
Gill Hetz
Ethan Fetaya
Joseph Keshet
Aviv Navon
21 Aug 2025
LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model
Yirong Sun
Yizhong Geng
Peidong Wei
Yanjun Chen
Jinghan Yang
Rongfei Chen
Wei Zhang
Xiaoyu Shen
AuLLM, ALM
21 Aug 2025
Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation
Chenyang Le
Yinfeng Xia
Huiyan Li
Manhong Wang
Yutao Sun
Xingyang Ma
Yanmin Qian
15 Aug 2025
$\text{M}^3\text{PDB}$: A Multimodal, Multi-Label, Multilingual Prompt Database for Speech Generation
B. Zhu
Cheng Gong
Muyang Wu
Ruihao Jing
Fan Liu
Xiaolei Zhang
Chi Zhang
Xuelong Li
13 Aug 2025
Flow-SLM: Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling
Ju-Chieh Chou
Jiawei Zhou
Karen Livescu
12 Aug 2025
Optimal Transport Regularization for Speech Text Alignment in Spoken Language Models
Wenze Xu
Chun Wang
Jiazhen Yu
Sheng Chen
Liang Gao
Weihong Deng
OT
11 Aug 2025
REINA: Regularized Entropy Information-Based Loss for Efficient Simultaneous Speech Translation
Nameer Hirschkind
Joseph Liu
Xiao Yu
Xiao Yu
07 Aug 2025
Multilingual Source Tracing of Speech Deepfakes: A First Benchmark
Xi Xuan
Yang Xiao
Rohan Kumar Das
Tomi Kinnunen
06 Aug 2025
MiDashengLM: Efficient Audio Understanding with General Audio Captions
Heinrich Dinkel
Gang Li
Jizhong Liu
Jian Luan
Yadong Niu
Xingwei Sun
Tianzi Wang
Qiyang Xiao
Junbo Zhang
Jiahao Zhou
AuLLM, AI4TS, VLM
06 Aug 2025
Whilter: A Whisper-based Data Filter for "In-the-Wild" Speech Corpora Using Utterance-level Multi-Task Classification
William Ravenscroft
George Close
Kit Bower-Morris
Jamie Stacey
Dmitry Sityaev
Kris Y. Hong
29 Jul 2025
Binaural Target Speaker Extraction using HRTFs
Yoav Ellinson
Sharon Gannot
25 Jul 2025
The TEA-ASLP System for Multilingual Conversational Speech Recognition and Speech Diarization in MLC-SLM 2025 Challenge
Hongfei Xue
Kaixun Huang
Zhikai Zhou
Shen Huang
Shidong Shang
24 Jul 2025
Triple X: A LLM-Based Multilingual Speech Recognition System for the INTERSPEECH2025 MLC-SLM Challenge
Miaomiao Gao
Xiaoxiao Xiang
Yiwen Guo
AILaw
23 Jul 2025
Controllable joint noise reduction and hearing loss compensation using a differentiable auditory model
Philippe Gonzalez
Torsten Dau
Tobias May
12 Jul 2025
ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark
He Wang
Linhan Ma
Dake Guo
Xiong Wang
Lei Xie
Jin Xu
Junyang Lin
AuLLM
08 Jul 2025
USAD: Universal Speech and Audio Representation via Distillation
Heng-Jui Chang
Saurabhchand Bhati
James R. Glass
Alexander H. Liu
23 Jun 2025
Instituto de Telecomunicações at IWSLT 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning
Giuseppe Attanasio
Sonal Sannigrahi
Ben Peters
André F. T. Martins
AuLLM
20 Jun 2025
Weight Factorization and Centralization for Continual Learning in Speech Recognition
Enes Yavuz Ugan
Ngoc-Quan Pham
Alexander Waibel
CLL, MoMe
19 Jun 2025
Factorized RVQ-GAN For Disentangled Speech Tokenization
Sameer Khurana
Dominik Klement
Antoine Laurent
Dominik Bobos
Juraj Novosad
...
Ryo Aihara
Chiori Hori
François Germain
Gordon Wichern
Jonathan Le Roux
18 Jun 2025
NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025
Yizhou Peng
Bin Wang
Yi-Wen Chao
Ziyang Ma
Haoyang Zhang
Hexin Liu
Xie Chen
Eng Siong Chng
ELM
16 Jun 2025
The mutual exclusivity bias of bilingual visually grounded speech models
Dan Oneaţă
Leanne Nortje
Yevgen Matusevych
Herman Kamper
04 Jun 2025
Can we reconstruct a dysarthric voice with the large speech model Parler TTS?
Ariadna Sanchez
Simon King
04 Jun 2025
Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data
Emmy Postma
Cristian Tejedor-Garcia
02 Jun 2025
What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training
Marianne de Heer Kloots
Hosein Mohebbi
Charlotte Pouw
Gaofei Shen
Willem H. Zuidema
Martijn Bentum
SSL
01 Jun 2025
GigaAM: Efficient Self-Supervised Learner for Speech Recognition
Aleksandr Kutsakov
Alexandr Maximenko
Georgii Gospodinov
Pavel Bogomolov
Fyodor Minkin
01 Jun 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
Jialong Zuo
Shengpeng Ji
Minghui Fang
Mingze Li
Ziyue Jiang
Xize Cheng
Xiaoda Yang
Chen Feiyang
Xinyu Duan
Zhou Zhao
01 Jun 2025
Spoken question answering for visual queries
Nimrod Shabtay
Zvi Kons
Avihu Dekel
Hagai Aronowitz
R. Hoory
Assaf Arbelle
29 May 2025
The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence
Marco Gaido
Sara Papi
L. Bentivogli
Alessio Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
29 May 2025
ZIPA: A family of efficient models for multilingual phone recognition
Jian Zhu
Farhan Samir
Eleanor Chodroff
David R. Mortensen
29 May 2025
A Linguistically Motivated Analysis of Intonational Phrasing in Text-to-Speech Systems: Revealing Gaps in Syntactic Sensitivity
Charlotte Pouw
Afra Alishahi
Willem H. Zuidema
28 May 2025
FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian
Sara Papi
Marco Gaido
L. Bentivogli
Alessio Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
28 May 2025
Loquacious Set: 25,000 Hours of Transcribed and Diverse English Speech Recognition Data for Research and Commercial Use
Titouan Parcollet
Yuan Tseng
Shucong Zhang
Rogier van Dalen
27 May 2025
TEDI: Trustworthy and Ethical Dataset Indicators to Analyze and Compare Dataset Documentation
Wiebke Hutiri
Mircea Cimpoi
M. Scheuerman
Victoria Matthews
Alice Xiang
23 May 2025