MLS: A Large-Scale Multilingual Dataset for Speech Research

Interspeech, 2020
7 December 2020
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
    AuLLM

Papers citing "MLS: A Large-Scale Multilingual Dataset for Speech Research"

50 / 384 papers shown
Toward Conversational Hungarian Speech Recognition: Introducing the BEA-Large and BEA-Dialogue Datasets
Máté Gedeon
Piroska Zsófia Barta
Péter Mihajlik
Tekla Etelka Gráczi
Anna Kohári
Katalin Mády
4
0
0
17 Nov 2025
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
Yunxin Li
Xinyu Chen
Shenyuan Jiang
Haoyuan Shi
Zhenyu Liu
...
Zhenran Xu
Yicheng Ma
Meishan Zhang
Baotian Hu
Min Zhang
MLLM, MoE, OSLM, VLM
121
0
0
16 Nov 2025
UniTok-Audio: A Unified Audio Generation Framework via Generative Modeling on Discrete Codec Tokens
Chengwei Liu
Haoyin Yan
Shaofei Xue
Xiaotao Liang
Yinghao Liu
Zheng Xue
Gang Song
Boyang Zhou
142
0
0
30 Oct 2025
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
C. Yan
Chunxiang Jin
Dawei Huang
Haibing Yu
Han Peng
...
Yongjie Lyu
Z. He
Zhihao Qiu
Zhiqiang Fang
Ziyuan Huang
AuLLM
136
2
0
26 Oct 2025
UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement
Haoyin Yan
Chengwei Liu
Shaofei Xue
Xiaotao Liang
Zheng Xue
36
1
0
23 Oct 2025
AMAuT: A Flexible and Efficient Multiview Audio Transformer Framework Trained from Scratch
Weichuang Shao
I. Liao
Tomas Henrique Bode Maul
T. Chandesa
48
0
0
22 Oct 2025
MLMA: Towards Multilingual ASR With Mamba-based Architectures
Mohamed Nabih Ali
Daniele Falavigna
Alessio Brutti
Mamba
163
0
0
21 Oct 2025
SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization
Wenxi Chen
X. Wang
Ruiqi Yan
Yihao Chen
Zhikang Niu
...
Yuzhe Liang
Hanlin Wen
Shunshun Yin
Ming Tao
Xie Chen
80
1
0
19 Oct 2025
U-Codec: Ultra Low Frame-rate Neural Speech Codec for Fast High-fidelity Speech Generation
Xusheng Yang
Long Zhou
Wenfu Wang
Kai Hu
Shulin Feng
Chenxing Li
Meng Yu
Dong Yu
Y. Zou
52
1
0
19 Oct 2025
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Wenwen Tong
Hewei Guo
Dongchuan Ran
Jiangnan Chen
Jiefan Lu
...
Dinghao Zhou
Guiping Zhong
Ken Zheng
Shiyin Kang
Lewei Lu
MLLM, AuLLM, VGen, VLM
152
2
0
15 Oct 2025
Gelina: Unified Speech and Gesture Synthesis via Interleaved Token Prediction
Teo Guichoux
Théodor Lemerle
Shivam Mehta
Jonas Beskow
G. Henter
Laure Soulier
Catherine Pelachaud
Nicolas Obin
40
0
0
13 Oct 2025
ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis
Mohammad Javad Ranjbar Kalahroodi
Heshaam Faili
A. Shakery
48
0
0
12 Oct 2025
FLToP CTC: Frame-Level Token Pruning via Relative Threshold for Efficient and Memory-Saving Decoding on Diverse Platforms
Atul Shree
Harshith Jupuru
51
0
0
10 Oct 2025
O_O-VC: Synthetic Data-Driven One-to-One Alignment for Any-to-Any Voice Conversion
Huu Tuong Tu
Huan Vu
Cuong Tien Nguyen
Dien Hy Ngo
Nguyen Thi Thu Trang
40
0
0
10 Oct 2025
Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation
Vaibhav Srivastav
Steven Zheng
Eric Bezzam
Eustache Le Bihan
Nithin Rao Koluguri
Piotr Żelasko
Somshubra Majumdar
Adel Moumen
Sanchit Gandhi
113
0
0
08 Oct 2025
Latent Speech-Text Transformer
Yen-Ju Lu
Yashesh Gaur
Wei Zhou
Benjamin Muller
Jesus Villalba
...
Luke Zettlemoyer
Gargi Ghosh
Mike Lewis
Srinivasan Iyer
Duc Le
VLM
56
0
0
07 Oct 2025
DecEx-RAG: Boosting Agentic Retrieval-Augmented Generation with Decision and Execution Optimization via Process Supervision
Yongqi Leng
Yikun Lei
Xikai Liu
M. Zhong
Bojian Xiong
Y. Zhang
Yan Gao
Yi-Chen Wu
Yao Hu
Deyi Xiong
29
0
0
07 Oct 2025
Drax: Speech Recognition with Discrete Flow Matching
Aviv Navon
Aviv Shamsian
Neta Glazer
Yael Segal-Feldman
Gill Hetz
Joseph Keshet
Ethan Fetaya
84
0
0
05 Oct 2025
Listening or Reading? Evaluating Speech Awareness in Chain-of-Thought Speech-to-Text Translation
Jacobo Romero-Díaz
Gerard I. Gállego
Oriol Pareras
Federico Costa
Javier Hernando
Cristina España-Bonet
LRM
56
0
0
03 Oct 2025
Revisiting Direct Speech-to-Text Translation with Speech LLMs: Better Scaling than CoT Prompting?
Oriol Pareras
Gerard I. Gállego
Federico Costa
Cristina España-Bonet
Javier Hernando
LRM
48
0
0
03 Oct 2025
EuroSpeech: A Multilingual Speech Corpus
Samuel Pfisterer
Florian Grötschla
Luca A. Lanzendörfer
Florian Yan
Roger Wattenhofer
80
0
0
01 Oct 2025
On Deepfake Voice Detection - It's All in the Presentation
Héctor Delgado
Giorgio Ramondetti
Emanuele Dalmasso
Gennady Karvitsky
Daniele Colibro
Haydar Talib
29
0
0
30 Sep 2025
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
Yuhan Song
Linhao Zhang
Chuhan Wu
Aiwei Liu
Wei Jia
Houfeng Wang
Xiao-bin Zhou
68
0
0
26 Sep 2025
Align2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Roy Fejgin
Ryan Langman
Mikyas T. Desta
Leili Tavabi
Jason Chun Lok Li
48
0
0
26 Sep 2025
Cross-Attention is Half Explanation in Speech-to-Text Models
Sara Papi
Dennis Fucci
Marco Gaido
Matteo Negri
L. Bentivogli
LRM
96
0
0
22 Sep 2025
Bridging the gap between training and inference in LM-based TTS models
Ruonan Zhang
Lingzhou Mu
Xixin Wu
Kai Zhang
72
0
0
21 Sep 2025
MaskVCT: Masked Voice Codec Transformer for Zero-Shot Voice Conversion With Increased Controllability via Multiple Guidances
Junhyeok Lee
Helin Wang
Yaohan Guan
Thomas Thebaud
Laureano Moro-Velazquez
Jesus Villalba
Najim Dehak
56
0
0
21 Sep 2025
FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation
Luca Della Libera
Cem Subakan
Mirco Ravanelli
76
0
0
19 Sep 2025
SpeechOp: Inference-Time Task Composition for Generative Speech Processing
Justin Lovelace
Rithesh Kumar
Jiaqi Su
Ke Chen
Kilian Q. Weinberger
Zeyu Jin
DiffM, VLM
190
0
0
17 Sep 2025
CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset
Brian Yan
Injy Hamed
Shuichiro Shimizu
Vasista Lodagala
William Chen
...
Samuele Cornell
Eunjung Yeo
Kwanghee Choi
Carlos Carvalho
Karen Rosero
68
2
0
17 Sep 2025
Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST
Monica Sekoyan
Nithin Rao Koluguri
Nune Tadevosyan
Piotr Żelasko
Travis M. Bartley
Nick Karpov
Jagadeesh Balam
Boris Ginsburg
VLM
100
1
0
17 Sep 2025
SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models
Karan Dua
Puneet Mittal
Ranjeet Gupta
Hitesh Laxmichand Patel
DiffM
190
4
0
15 Sep 2025
FuseCodec: Semantic-Contextual Fusion and Supervision for Neural Codecs
Md Mubtasim Ahasan
Rafat Hasan Khan
Tasnim Mohiuddin
Vasu Sharma
Tariq Iqbal
M. A. Amin
Amin Ahsan Ali
M. Islam
A. K. M. Mahbubur Rahman
137
1
0
14 Sep 2025
Finite Scalar Quantization Enables Redundant and Transmission-Robust Neural Audio Compression at Low Bit-rates
Harry Julian
Rachel Beeson
Lohith Konathala
Johanna Ulin
Jiameng Gao
94
0
0
11 Sep 2025
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
Yuhao Zhang
Yuhao Du
Zhanchen Dai
Xiangnan Ma
Kaiqi Kou
Benyou Wang
Haizhou Li
36
2
0
11 Sep 2025
MoLEx: Mixture of LoRA Experts in Speech Self-Supervised Models for Audio Deepfake Detection
Zihan Pan
Sailor Hardik Bhupendra
Jinyang Wu
MoE
84
1
0
11 Sep 2025
Layer-wise Analysis for Quality of Multilingual Synthesized Speech
Erica Cooper
T. Okamoto
Yamato Ohtani
Tomoki Toda
Hisashi Kawai
64
0
0
05 Sep 2025
LibriQuote: A Speech Dataset of Fictional Character Utterances for Expressive Zero-Shot Speech Synthesis
Gaspard Michel
Elena V. Epure
Christophe Cerisara
64
0
0
04 Sep 2025
Multi-level SSL Feature Gating for Audio Deepfake Detection
Hoan My Tran
Damien Lolive
Aghilas Sini
Arnaud Delhay
Pierre-François Marteau
David Guennec
72
0
0
03 Sep 2025
SimulMEGA: MoE Routers are Advanced Policy Makers for Simultaneous Speech Translation
Chenyang Le
Bing Han
Jinshun Li
Songyong Chen
Y. Qian
MoE
120
0
0
01 Sep 2025
Entropy-based Coarse and Compressed Semantic Speech Representation Learning
Jialong Zuo
Guangyan Zhang
Minghui Fang
Shengpeng Ji
Xiaoqi Jiao
Jingyu Li
Yiwen Guo
Zhou Zhao
60
0
0
30 Aug 2025
CAMÕES: A Comprehensive Automatic Speech Recognition Benchmark for European Portuguese
Carlos Carvalho
Francisco Teixeira
Catarina Botelho
Anna Pompili
Rubén Solera-Ureña
...
T. Rolland
John Mendonça
Diogo Pereira
Isabel Trancoso
A. Abad
52
0
0
27 Aug 2025
LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model
Yirong Sun
Yizhong Geng
Peidong Wei
Yanjun Chen
Jinghan Yang
Rongfei Chen
Wei Zhang
Xiaoyu Shen
AuLLM, ALM
62
0
0
21 Aug 2025
Beyond Transcription: Mechanistic Interpretability in ASR
Neta Glazer
Yael Segal-Feldman
Hilit Segev
Aviv Shamsian
Asaf Buchnick
Gill Hetz
Ethan Fetaya
Joseph Keshet
Aviv Navon
56
0
0
21 Aug 2025
Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation
Chenyang Le
Yinfeng Xia
Huiyan Li
Manhong Wang
Yutao Sun
Xingyang Ma
Yanmin Qian
52
0
0
15 Aug 2025
$\text{M}^3\text{PDB}$: A Multimodal, Multi-Label, Multilingual Prompt Database for Speech Generation
B. Zhu
Cheng Gong
Muyang Wu
Ruihao Jing
Fan Liu
Xiaolei Zhang
Chi Zhang
Xuelong Li
70
0
0
13 Aug 2025
Flow-SLM: Joint Learning of Linguistic and Acoustic Information for Spoken Language Modeling
Ju-Chieh Chou
Jiawei Zhou
Karen Livescu
116
3
0
12 Aug 2025
Optimal Transport Regularization for Speech Text Alignment in Spoken Language Models
Wenze Xu
Chun Wang
Jiazhen Yu
Sheng Chen
Liang Gao
Weihong Deng
OT
118
0
0
11 Aug 2025
REINA: Regularized Entropy Information-Based Loss for Efficient Simultaneous Speech Translation
Nameer Hirschkind
Joseph Liu
Xiao Yu
Xiao Yu
89
0
0
07 Aug 2025
MiDashengLM: Efficient Audio Understanding with General Audio Captions
Heinrich Dinkel
Gang Li
Jizhong Liu
Jian Luan
Yadong Niu
Xingwei Sun
Tianzi Wang
Qiyang Xiao
Junbo Zhang
Jiahao Zhou
AuLLM, AI4TS, VLM
243
10
0
06 Aug 2025