ResearchTrend.AI
MLS: A Large-Scale Multilingual Dataset for Speech Research

7 December 2020
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
    AuLLM

Papers citing "MLS: A Large-Scale Multilingual Dataset for Speech Research"

50 / 321 papers shown
Title
ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development
Yanir Marmor
Kinneret Misgav
Y. Lifshitz
VLM
95
3
0
17 Jul 2023
Towards cross-language prosody transfer for dialog
Jonathan Avila
Nigel G. Ward
67
7
0
09 Jul 2023
Enrollment-stage Backdoor Attacks on Speaker Recognition Systems via Adversarial Ultrasound
Xinfeng Li
Junning Ze
Chen Yan
Yushi Cheng
Xiaoyu Ji
Wenyuan Xu
AAML
68
12
0
28 Jun 2023
Confidence-based Ensembles of End-to-End Speech Recognition Models
Igor Gitman
Vitaly Lavrukhin
A. Laptev
Boris Ginsburg
UQCV
82
9
0
27 Jun 2023
AudioPaLM: A Large Language Model That Can Speak and Listen
Paul Kishan Rubenstein
Chulayuth Asawaroengchai
D. Nguyen
Ankur Bapna
Zalan Borsos
...
Neil Zeghidour
Yu Zhang
Zhishuai Zhang
Lukás Zilka
Christian Frank
LM&MA AuLLM VLM
138
295
0
22 Jun 2023
Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer
Kunal Dhawan
Dima Rekesh
Boris Ginsburg
48
12
0
14 Jun 2023
Label Aware Speech Representation Learning For Language Identification
Shikhar Vashishth
Shikhar Bharadwaj
Sriram Ganapathy
Ankur Bapna
Min Ma
Wei Han
Vera Axelrod
Partha P. Talukdar
SSL
61
4
0
07 Jun 2023
Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling
Ramon Sanabria
Ondřej Klejch
Hao Tang
Sharon Goldwater
51
2
0
03 Jun 2023
Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
Sameer Khurana
Nauman Dawalatabad
Antoine Laurent
Luis Vicente
Pablo Gimeno
Victoria Mingote
James R. Glass
VLM
89
1
0
01 Jun 2023
How to Estimate Model Transferability of Pre-Trained Speech Models?
Zih-Ching Chen
Chao-Han Huck Yang
Yue Liu
Yu Zhang
Nanxin Chen
Shoufeng Chang
Rohit Prabhavalkar
Hung-yi Lee
Tara N. Sainath
132
9
0
01 Jun 2023
Edit Distance based RL for RNNT decoding
DongSeon Hwang
Changwan Ryu
K. Sim
30
0
0
31 May 2023
BIG-C: a Multimodal Multi-Purpose Dataset for Bemba
Claytone Sikasote
Eunice Mukonde
Md Mahfuz Ibn Alam
Antonios Anastasopoulos
53
8
0
26 May 2023
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis
Seong-Hyun Park
Bohyung Kim
Tae-Hyun Oh
70
1
0
26 May 2023
Scaling Speech Technology to 1,000+ Languages
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
164
360
0
22 May 2023
Textually Pretrained Speech Language Models
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
...
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
VLM SyDa
127
61
0
22 May 2023
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages
Andrew Rouditchenko
Sameer Khurana
Samuel Thomas
Rogerio Feris
Leonid Karlinsky
Hilde Kuehne
David Harwath
Brian Kingsbury
James R. Glass
VLM
99
22
0
21 May 2023
Language-universal phonetic encoder for low-resource speech recognition
Siyuan Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
76
3
0
19 May 2023
Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition
Siyuan Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
65
5
0
19 May 2023
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
Jiatong Shi
Dan Berrebbi
William Chen
Ho-Lam Chung
En-Pei Hu
...
Xuankai Chang
Shang-Wen Li
Abdel-rahman Mohamed
Hung-yi Lee
Shinji Watanabe
ELM
126
70
0
18 May 2023
Understanding and Bridging the Modality Gap for Speech Translation
Qingkai Fang
Yang Feng
75
26
0
15 May 2023
Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
83
3
0
09 May 2023
Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
Dima Rekesh
Nithin Rao Koluguri
Samuel Kriman
Somshubra Majumdar
Vahid Noroozi
...
Oleksii Hrinchuk
Krishna Puvvada
Ankur Kumar
Jagadeesh Balam
Boris Ginsburg
101
92
0
08 May 2023
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Kai Shen
Zeqian Ju
Xu Tan
Yanqing Liu
Yichong Leng
Lei He
Tao Qin
Sheng Zhao
Jiang Bian
DiffM
104
247
0
18 Apr 2023
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
Hainan Xu
Fei Jia
Somshubra Majumdar
Hengguan Huang
Shinji Watanabe
Boris Ginsburg
66
26
0
13 Apr 2023
Enhancing Unsupervised Speech Recognition with Diffusion GANs
Xianchao Wu
DiffM
53
2
0
23 Mar 2023
Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture
Julien Hauret
Thomas Joubaud
V. Zimpfer
Éric Bavu
55
7
0
17 Mar 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Françoise Beaufays
Yonghui Wu
VLM
170
270
0
02 Mar 2023
Improving Massively Multilingual ASR With Auxiliary CTC Objectives
William Chen
Brian Yan
Jiatong Shi
Yifan Peng
Soumi Maiti
Shinji Watanabe
84
40
0
24 Feb 2023
Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion
Jiangyi Deng
Yanjiao Chen
Yinan Zhong
Qianhao Miao
Xueluan Gong
Wenyuan Xu
85
8
0
24 Feb 2023
Speaker and Language Change Detection using Wav2vec2 and Whisper
Tijn Berns
Nik Vaessen
David A. van Leeuwen
69
5
0
18 Feb 2023
ASR Bundestag: A Large-Scale political debate dataset in German
Johannes Wirth
René Peinl
62
1
0
12 Feb 2023
From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition
Chao-Han Huck Yang
Yue Liu
Yu Zhang
Nanxin Chen
Rohit Prabhavalkar
Tara N. Sainath
Trevor Strohman
60
30
0
19 Jan 2023
Scaling Laws for Generative Mixed-Modal Language Models
Armen Aghajanyan
L. Yu
Alexis Conneau
Wei-Ning Hsu
Karen Hambardzumyan
Susan Zhang
Stephen Roller
Naman Goyal
Omer Levy
Luke Zettlemoyer
MoE VLM
92
110
0
10 Jan 2023
Supervised Acoustic Embeddings And Their Transferability Across Languages
Sreepratha Ram
Hanan Aldarmaki
SSL
54
3
0
03 Jan 2023
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement
Wei-Ning Hsu
Tal Remez
Bowen Shi
Jacob Donley
Yossi Adi
DiffM
87
12
0
21 Dec 2022
Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models
Yong Cheng
Yu Zhang
Melvin Johnson
Wolfgang Macherey
Ankur Bapna
64
8
0
19 Dec 2022
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units
Hirofumi Inaguma
Sravya Popuri
Ilia Kulikov
Peng-Jen Chen
Changhan Wang
Yu-An Chung
Yun Tang
Ann Lee
Shinji Watanabe
J. Pino
110
61
0
15 Dec 2022
Ring That Bell: A Corpus and Method for Multimodal Metaphor Detection in Videos
Khalid Alnajjar
Mika Hämäläinen
Shuo Zhang
68
8
0
15 Dec 2022
Towards trustworthy phoneme boundary detection with autoregressive model and improved evaluation metric
Hyeongju Kim
Hyeong-Seok Choi
33
2
0
13 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
230
3,760
0
06 Dec 2022
EURO: ESPnet Unsupervised ASR Open-source Toolkit
Dongji Gao
Jiatong Shi
Shun-Po Chuang
Leibny Paola García-Perera
Hung-yi Lee
Shinji Watanabe
Sanjeev Khudanpur
104
8
0
30 Nov 2022
Dialogs Re-enacted Across Languages
Nigel G. Ward
Jonathan Avila
Emilia Rivas
Divette Marco
52
2
0
18 Nov 2022
Casual Conversations v2: Designing a large consent-driven dataset to measure algorithmic bias and robustness
C. Hazirbas
Yejin Bang
Tiezheng Yu
Parisa Assar
Bilal Porgali
...
Jacqueline Pan
Emily McReynolds
Miranda Bogen
Pascale Fung
Cristian Canton Ferrer
81
8
0
10 Nov 2022
Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities
Andros Tjandra
Nayan Singhal
David C. Zhang
Ozlem Kalinli
Abdel-rahman Mohamed
Duc Le
M. Seltzer
88
13
0
10 Nov 2022
Multi-blank Transducers for Speech Recognition
Hainan Xu
Fei Jia
Somshubra Majumdar
Shinji Watanabe
Boris Ginsburg
80
11
0
04 Nov 2022
I4U System Description for NIST SRE'20 CTS Challenge
Kong Aik Lee
Tomi Kinnunen
Daniele Colibro
C. Vair
A. Nautsch
...
Ruijie Tao
Haizhou Li
Alfonso Ortega Giménez
Longbiao Wang
L. Buera
24
0
0
02 Nov 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
124
20
0
27 Oct 2022
Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data?
Dominik Wagner
Ilja Baumann
Franziska Braun
Sebastian P. Bayerl
Elmar Nöth
Korbinian Riedhammer
Tobias Bocklet
73
13
0
27 Oct 2022
Improving Speech-to-Speech Translation Through Unlabeled Text
Xuan-Phi Nguyen
Sravya Popuri
Changhan Wang
Yun Tang
Ilia Kulikov
Hongyu Gong
63
9
0
26 Oct 2022
EBEN: Extreme bandwidth extension network applied to speech signals captured with noise-resilient body-conduction microphones
J. Hauret
Thomas Joubaud
V. Zimpfer
Éric Bavu
48
10
0
25 Oct 2022