MLS: A Large-Scale Multilingual Dataset for Speech Research

7 December 2020
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
    AuLLM

Papers citing "MLS: A Large-Scale Multilingual Dataset for Speech Research"

50 / 321 papers shown
FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks
Min Ma
Yuma Koizumi
Shigeki Karita
Heiga Zen
Jason Riesa
Haruko Ishikawa
M. Bacchiani
VLM
85
5
0
12 Aug 2024
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond
Beomseok Lee
Ioan Calapodescu
Marco Gaido
Matteo Negri
Laurent Besacier
AuLLM
64
6
0
07 Aug 2024
Towards scalable efficient on-device ASR with transfer learning
Laxmi Pandey
Ke Li
Jinxi Guo
Debjyoti Paul
Arthur Guo
Jay Mahadeokar
Xuedong Zhang
61
2
0
23 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
110
6
0
21 Jul 2024
Framework for Curating Speech Datasets and Evaluating ASR Systems: A Case Study for Polish
Mohamed Allam
71
0
0
18 Jul 2024
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Yuancheng Wang
Kai Chen
Pengyuan Zhang
Zhizheng Wu
89
54
0
07 Jul 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Ye Bai
Jingping Chen
Jitong Chen
Wei Chen
Zhuo Chen
...
Wanyi Zhang
Yang Zhang
Yawei Zhang
Yijie Zheng
Ming Zou
AuLLM
116
28
0
05 Jul 2024
Probing the Feasibility of Multilingual Speaker Anonymization
Sarina Meyer
Florian Lux
Ngoc Thang Vu
110
4
0
03 Jul 2024
Towards Robust Speech Representation Learning for Thousands of Languages
William Chen
Wangyou Zhang
Yifan Peng
Xinjian Li
Jinchuan Tian
Jiatong Shi
Xuankai Chang
Soumi Maiti
Karen Livescu
Shinji Watanabe
ELM
119
19
0
30 Jun 2024
MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research
Song Li
Yongbin You
Xuezhi Wang
Zhengkun Tian
Ke Ding
Guanglu Wan
41
3
0
26 Jun 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
76
14
0
25 Jun 2024
One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection
Hyun Myung Kim
Kangwook Jang
Hoirin Kim
69
7
0
24 Jun 2024
Speech Analysis of Language Varieties in Italy
Moreno La Quatra
Alkis Koudounas
Elena Baralis
Sabato Marco Siniscalchi
103
3
0
22 Jun 2024
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics
Cheol Jun Cho
Peter Wu
Tejas S. Prabhune
Dhruv Agarwal
Gopala K. Anumanchipalli
110
8
0
18 Jun 2024
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Yifan Yang
Zheshu Song
Jianheng Zhuo
Mingyu Cui
Jinpeng Li
...
Shuai Fan
Kai Yu
Wei Zhang
Guoguo Chen
Xie Chen
128
12
0
17 Jun 2024
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
Nameer Hirschkind
Xiao Yu
Mahesh Kumar Nandwana
Joseph Liu
Eloi DuBois
...
Colin Sinclair
Kyle Spence
Charles Shang
Zoë Abrams
Morgan McGuire
66
0
0
14 Jun 2024
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Dongchao Yang
Haohan Guo
Yuanyuan Wang
Rongjie Huang
Xiang Li
Xu Tan
Xixin Wu
Helen Meng
AuLLM
88
17
0
14 Jun 2024
Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection
Haoyu Wang
Guoqiang Hu
Guodong Lin
Wei-Qiang Zhang
Jian Li
72
3
0
14 Jun 2024
Multi-Modal Retrieval For Large Language Model Based Speech Recognition
J. Kolehmainen
Aditya Gourav
Prashanth Gurunath Shivakumar
Yile Gu
Ankur Gandhe
Ariya Rastrow
Grant P. Strimel
I. Bulyko
85
5
0
13 Jun 2024
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models
Jinchuan Tian
Yifan Peng
William Chen
Kwanghee Choi
Karen Livescu
Shinji Watanabe
59
6
0
13 Jun 2024
An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
Cheng Gong
Erica Cooper
Xin Wang
Chunyu Qiang
Mengzhe Geng
...
Jianwu Dang
Marc Tessier
Aidan Pine
Korin Richmond
Junichi Yamagishi
54
2
0
13 Jun 2024
Meta Learning Text-to-Speech Synthesis in over 7000 Languages
Florian Lux
Sarina Meyer
Lyonel Behringer
Frank Zalkow
P. Do
Matt Coler
Emanuel Habets
Ngoc Thang Vu
CLIP
91
5
0
10 Jun 2024
mHuBERT-147: A Compact Multilingual HuBERT Model
Marcely Zanon Boito
Vivek Iyer
Nikolaos Lagos
Laurent Besacier
Ioan Calapodescu
VLM
137
20
0
10 Jun 2024
Exploring the Benefits of Tokenization of Discrete Acoustic Units
Avihu Dekel
Raul Fernandez
73
2
0
08 Jun 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
Zheshu Song
Jianheng Zhuo
Yifan Yang
Ziyang Ma
Shixiong Zhang
Xie Chen
72
11
0
07 Jun 2024
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
Dongchao Yang
Dingdong Wang
Haohan Guo
Xueyuan Chen
Xixin Wu
Helen M. Meng
144
29
0
04 Jun 2024
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
Shengpeng Ji
Jia-li Zuo
Wen Wang
Jialong Zuo
Minghui Fang
...
Ziyue Jiang
Hai Huang
Xize Cheng
Siqi Zheng
Zhou Zhao
109
0
0
03 Jun 2024
YODAS: Youtube-Oriented Dataset for Audio and Speech
Xinjian Li
Shinnosuke Takamichi
Takaaki Saeki
William Chen
Sayaka Shiota
Shinji Watanabe
123
27
0
02 Jun 2024
Deep Learning for Assessment of Oral Reading Fluency
Mithilesh Vaidya
Binaya Kumar Sahoo
Preeti Rao
41
0
0
29 May 2024
RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification
June-Woo Kim
Miika Toikkanen
Sangmin Bae
Minseok Kim
Ho-Young Jung
81
7
0
05 May 2024
ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
Yuzhe Gu
Enmao Diao
86
4
0
30 Apr 2024
Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices
Gwantae Kim
Bokyeung Lee
Donghyeon Kim
Hanseok Ko
OffRL
85
0
0
24 Apr 2024
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Zhen Ye
Zeqian Ju
Haohe Liu
Xu Tan
Jianyi Chen
...
Weizhen Bian
Shulin He
Qi-fei Liu
Yi-Ting Guo
Wei Xue
102
20
0
23 Apr 2024
Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training
Pavel Denisov
Ngoc Thang Vu
77
2
0
16 Apr 2024
MAD Speech: Measures of Acoustic Diversity of Speech
Matthieu Futeral
A. Agostinelli
Marco Tagliasacchi
Neil Zeghidour
Eugene Kharitonov
130
1
0
16 Apr 2024
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Detai Xin
Xu Tan
Kai Shen
Zeqian Ju
Dongchao Yang
...
Shinnosuke Takamichi
Hiroshi Saruwatari
Shujie Liu
Jinyu Li
Sheng Zhao
74
28
0
04 Apr 2024
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
Jaehyeon Kim
Keon Lee
Seungjun Chung
Jaewoong Cho
122
44
0
03 Apr 2024
Croissant: A Metadata Format for ML-Ready Datasets
Mubashara Akhtar
Omar Benjelloun
Costanza Conforti
Pieter Gijsbers
Joan Giner-Miguelez
...
Slava Tykhonov
Joaquin Vanschoren
Jos van der Velde
Steffen Vogler
Carole-Jean Wu
80
39
0
28 Mar 2024
Phonetic Segmentation of the UCLA Phonetics Lab Archive
Eleanor Chodroff
Blaz Pazon
Annie Baker
Steven Moran
108
3
0
28 Mar 2024
Encoding of lexical tone in self-supervised models of spoken language
Gaofei Shen
Michaela Watkins
Afra Alishahi
Arianna Bisazza
Grzegorz Chrupala
79
8
0
25 Mar 2024
Improving Acoustic Word Embeddings through Correspondence Training of Self-supervised Speech Representations
Amit Meghanani
Thomas Hain
SSL
67
1
0
13 Mar 2024
Speech Robust Bench: A Robustness Benchmark For Speech Recognition
Muhammad A. Shah
David Solans Noguero
Mikko A. Heikkilä
Nicolas Kourtellis
54
6
0
08 Mar 2024
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data
Takaaki Saeki
Gary Wang
Nobuyuki Morioka
Isaac Elias
Kyle Kastner
...
Andrew Rosenberg
Bhuvana Ramabhadran
Heiga Zen
Francoise Beaufays
Hadar Shemtov
90
14
0
29 Feb 2024
Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps
Giuseppe Attanasio
Beatrice Savoldi
Dennis Fucci
Dirk Hovy
85
9
0
28 Feb 2024
Direct Punjabi to English speech translation using discrete units
Prabhjot Kaur
L. A. M. Bush
Weisong Shi
60
0
0
25 Feb 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Yifan Peng
Yui Sudo
Muhammad Shakeel
Shinji Watanabe
VLM
109
25
0
20 Feb 2024
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Jun Zhan
Junqi Dai
Jiasheng Ye
Yunhua Zhou
Dong Zhang
...
Jie Fu
Tao Gui
Tianxiang Sun
Yugang Jiang
Xipeng Qiu
MLLM
95
136
0
19 Feb 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
131
18
0
19 Feb 2024
SpiRit-LM: Interleaved Spoken and Written Language Model
Tu Nguyen
Benjamin Muller
Bokai Yu
Marta R. Costa-jussá
Maha Elbayad
...
Itai Gat
Gabriel Synnaeve
Juan Pino
Benoît Sagot
Emmanuel Dupoux
AuLLM VLM
94
53
0
08 Feb 2024
REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
Liang-Hsuan Tseng
En-Pei Hu
Cheng-Han Chiang
Yuan Tseng
Hung-yi Lee
Lin-shan Lee
Shao-Hua Sun
100
1
0
06 Feb 2024