2202.01374
Cited By
mSLAM: Massively multilingual joint pre-training for speech and text
3 February 2022
Ankur Bapna
Colin Cherry
Yu Zhang
Ye Jia
Melvin Johnson
Yong Cheng
Simran Khanuja
Jason Riesa
Alexis Conneau
VLM
Papers citing
"mSLAM: Massively multilingual joint pre-training for speech and text"
39 / 89 papers shown
Title
Improving Massively Multilingual ASR With Auxiliary CTC Objectives
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
William Chen
Brian Yan
Jiatong Shi
Yifan Peng
Soumi Maiti
Shinji Watanabe
206
49
0
24 Feb 2023
Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Biao Zhang
Barry Haddow
Rico Sennrich
229
3
0
21 Feb 2023
Pre-training for Speech Translation: CTC Meets Optimal Transport
International Conference on Machine Learning (ICML), 2023
Hang Le
Hongyu Gong
Changhan Wang
J. Pino
Benjamin Lecouteux
D. Schwab
OT
312
30
0
27 Jan 2023
The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Genta Indra Winata
Alham Fikri Aji
Zheng-Xin Yong
Thamar Solorio
277
48
0
19 Dec 2022
Mu²SLAM: Multitask, Multilingual Speech and Language Models
International Conference on Machine Learning (ICML), 2022
Yong Cheng
Yu Zhang
Melvin Johnson
Wolfgang Macherey
Ankur Bapna
151
9
0
19 Dec 2022
Speech Aware Dialog System Technology Challenge (DSTC11)
H. Soltau
Izhak Shafran
Mingqiu Wang
Abhinav Rastogi
Jeffrey Zhao
Ye Jia
Wei Han
Yuan Cao
Aramys Miranda
172
11
0
16 Dec 2022
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Mingda Chen
Paul-Ambroise Duquenne
Pierre Yves Andrews
Justine T. Kao
Alexandre Mourachko
Holger Schwenk
Marta R. Costa-jussá
230
23
0
16 Dec 2022
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Hirofumi Inaguma
Sravya Popuri
Ilia Kulikov
Peng-Jen Chen
Changhan Wang
Yu-An Chung
Yun Tang
Ann Lee
Shinji Watanabe
J. Pino
276
75
0
15 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
International Conference on Machine Learning (ICML), 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
948
5,578
0
06 Dec 2022
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition
Interspeech (Interspeech), 2022
Xiaohuan Zhou
Jiaming Wang
Zeyu Cui
Shiliang Zhang
Zhijie Yan
Jingren Zhou
Chang Zhou
208
13
0
29 Nov 2022
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
Zhuoyuan Yao
Shuo Ren
Sanyuan Chen
Ziyang Ma
Pengcheng Guo
Linfu Xie
177
5
0
24 Nov 2022
Towards continually learning new languages
Interspeech (Interspeech), 2022
Ngoc-Quan Pham
Jan Niehues
A. Waibel
CLL
311
4
0
21 Nov 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
IEEE transactions on multimedia (IEEE TMM), 2022
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
231
48
0
21 Nov 2022
Visual Programming: Compositional visual reasoning without training
Computer Vision and Pattern Recognition (CVPR), 2022
Tanmay Gupta
Aniruddha Kembhavi
ReLM
VLM
LRM
374
556
0
18 Nov 2022
Bridging Speech and Textual Pre-trained Models with Unsupervised ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Jiatong Shi
Chan-Jan Hsu
Ho-Lam Chung
Dongji Gao
Leibny Paola García-Perera
Shinji Watanabe
Ann Lee
Hung-yi Lee
153
13
0
06 Nov 2022
LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers
Interspeech (Interspeech), 2022
Peidong Wang
Eric Sun
Jian Xue
Yu-Huan Wu
Long Zhou
Yashesh Gaur
Shujie Liu
Jinyu Li
328
10
0
05 Nov 2022
Towards Zero-Shot Code-Switched Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Brian Yan
Sanjeev Khudanpur
Ondřej Klejch
Preethi Jyothi
Shinji Watanabe
200
23
0
02 Nov 2022
Textless Direct Speech-to-Speech Translation with Discrete Speech Representation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Xinjian Li
Ye Jia
Chung-Cheng Chiu
233
33
0
31 Oct 2022
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Kun Wei
Long Zhou
Zi-Hua Zhang
Liping Chen
Shujie Liu
Lei He
Jinyu Li
Furu Wei
166
17
0
31 Oct 2022
token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Xianghu Yue
Junyi Ao
Xiaoxue Gao
Haizhou Li
SSL
191
8
0
30 Oct 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
220
22
0
27 Oct 2022
Greedy Modality Selection via Approximate Submodular Maximization
Conference on Uncertainty in Artificial Intelligence (UAI), 2022
Runxiang Cheng
Gargi Balasubramaniam
Yifei He
Yifan Hao
Han Zhao
129
3
0
22 Oct 2022
Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR
Spoken Language Technology Workshop (SLT), 2022
Zhehuai Chen
Ankur Bapna
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Pedro J. Moreno
Nanxin Chen
202
17
0
18 Oct 2022
Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Chen Wang
Yuchen Liu
Boxing Chen
Jiajun Zhang
Wei Luo
Zhongqiang Huang
Chengqing Zong
187
10
0
18 Oct 2022
JOIST: A Joint Speech and Text Streaming Model For ASR
Spoken Language Technology Workshop (SLT), 2022
Tara N. Sainath
Rohit Prabhavalkar
Ankur Bapna
Yu Zhang
Zhouyuan Huo
Zhehuai Chen
Yue Liu
Weiran Wang
Trevor Strohman
RALM
AuLLM
157
36
0
13 Oct 2022
SQuId: Measuring Speech Naturalness in Many Languages
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Thibault Sellam
Ankur Bapna
Joshua Camp
Diana Mackinnon
Ankur P. Parikh
Jason Riesa
200
24
0
12 Oct 2022
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zi-Hua Zhang
Long Zhou
Junyi Ao
Shujie Liu
Lirong Dai
Jinyu Li
Furu Wei
243
61
0
07 Oct 2022
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Zi-Hua Zhang
Sanyuan Chen
Long Zhou
Yu Wu
Shuo Ren
...
Zhuoyuan Yao
Xun Gong
Lirong Dai
Jinyu Li
Furu Wei
269
65
0
30 Sep 2022
Do Current Multi-Task Optimization Methods in Deep Learning Even Help?
Neural Information Processing Systems (NeurIPS), 2022
Derrick Xin
Behrooz Ghorbani
Ankush Garg
Orhan Firat
Justin Gilmer
MoMe
185
75
0
23 Sep 2022
Improving the Cross-Lingual Generalisation in Visual Question Answering
AAAI Conference on Artificial Intelligence (AAAI), 2022
Farhad Nooralahzadeh
Rico Sennrich
221
8
0
07 Sep 2022
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech
Spoken Language Technology Workshop (SLT), 2022
Alexis Conneau
Min Ma
Simran Khanuja
Yu Zhang
Vera Axelrod
Siddharth Dalmia
Jason Riesa
Clara E. Rivera
Ankur Bapna
VLM
420
456
0
25 May 2022
T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Paul-Ambroise Duquenne
Hongyu Gong
Benoît Sagot
Holger Schwenk
174
21
0
24 May 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Sameer Khurana
Antoine Laurent
James R. Glass
145
43
0
17 May 2022
Building Machine Translation Systems for the Next Thousand Languages
Ankur Bapna
Isaac Caswell
Julia Kreutzer
Orhan Firat
D. Esch
...
Apurva Shah
Yanping Huang
Zhiwen Chen
Yonghui Wu
Macduff Hughes
262
108
0
09 May 2022
MAESTRO: Matched Speech Text Representations through Modality Matching
Interspeech (Interspeech), 2022
Zhehuai Chen
Yu Zhang
Andrew Rosenberg
Bhuvana Ramabhadran
Pedro J. Moreno
Ankur Bapna
Heiga Zen
213
119
0
07 Apr 2022
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
International Conference on Learning Representations (ICLR), 2022
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
...
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM
LRM
539
676
0
01 Apr 2022
Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation
Interspeech (Interspeech), 2022
Ye Jia
Yifan Ding
Ankur Bapna
Colin Cherry
Yu Zhang
Alexis Conneau
Nobuyuki Morioka
206
24
0
24 Mar 2022
XTREME-S: Evaluating Cross-lingual Speech Representations
Interspeech (Interspeech), 2022
Alexis Conneau
Ankur Bapna
Yu Zhang
Min Ma
Patrick von Platen
...
Orhan Firat
Michael Auli
Sebastian Ruder
Jason Riesa
Melvin Johnson
VLM
AILaw
ELM
255
23
0
21 Mar 2022
Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Yihong Dong
Ying Peng
Muqiao Yang
Songtao Lu
Qingjiang Shi
370
12
0
05 Jun 2021