mSLAM: Massively multilingual joint pre-training for speech and text

3 February 2022

Colin Cherry


Papers citing "mSLAM: Massively multilingual joint pre-training for speech and text"

39 / 89 papers shown

Improving Massively Multilingual ASR With Auxiliary CTC Objectives. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 William Chen Brian Yan Jiatong Shi Yifan Peng Soumi Maiti Shinji Watanabe 206 49 0 24 Feb 2023
Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023 Biao Zhang Barry Haddow Rico Sennrich 229 3 0 21 Feb 2023
Pre-training for Speech Translation: CTC Meets Optimal Transport. International Conference on Machine Learning (ICML), 2023 Hang Le Hongyu Gong Changhan Wang J. Pino Benjamin Lecouteux D. Schwab OT 312 30 0 27 Jan 2023
The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges. Annual Meeting of the Association for Computational Linguistics (ACL), 2022 Genta Indra Winata Alham Fikri Aji Zheng-Xin Yong Thamar Solorio 277 48 0 19 Dec 2022
Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models. International Conference on Machine Learning (ICML), 2022 Yong Cheng Yu Zhang Melvin Johnson Wolfgang Macherey Ankur Bapna 151 9 0 19 Dec 2022
Speech Aware Dialog System Technology Challenge (DSTC11) H. Soltau Izhak Shafran Mingqiu Wang Abhinav Rastogi Jeffrey Zhao Ye Jia Wei Han Yuan Cao Aramys Miranda 172 11 0 16 Dec 2022
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric. Annual Meeting of the Association for Computational Linguistics (ACL), 2022 Mingda Chen Paul-Ambroise Duquenne Pierre Yves Andrews Justine T. Kao Alexandre Mourachko Holger Schwenk Marta R. Costa-jussá 230 23 0 16 Dec 2022
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units. Annual Meeting of the Association for Computational Linguistics (ACL), 2022 Hirofumi Inaguma Sravya Popuri Ilia Kulikov Peng-Jen Chen Changhan Wang Yu-An Chung Yun Tang Ann Lee Shinji Watanabe J. Pino 276 75 0 15 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision. International Conference on Machine Learning (ICML), 2022 Alec Radford Jong Wook Kim Tao Xu Greg Brockman C. McLeavey Ilya Sutskever OffRL 948 5,578 0 06 Dec 2022
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition. Interspeech (Interspeech), 2022 Xiaohuan Zhou Jiaming Wang Zeyu Cui Shiliang Zhang Zhijie Yan Jingren Zhou Chang Zhou 208 13 0 29 Nov 2022
TESSP: Text-Enhanced Self-Supervised Speech Pre-training Zhuoyuan Yao Shuo Ren Sanyuan Chen Ziyang Ma Pengcheng Guo Linfu Xie 177 5 0 24 Nov 2022
Towards continually learning new languages. Interspeech (Interspeech), 2022 Ngoc-Quan Pham Jan Niehues A. Waibel CLL 311 4 0 21 Nov 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning. IEEE transactions on multimedia (IEEE TMM), 2022 Qiu-shi Zhu Long Zhou Zi-Hua Zhang Shujie Liu Binxing Jiao Jie Zhang Lirong Dai Daxin Jiang Jinyu Li Furu Wei 231 48 0 21 Nov 2022
Visual Programming: Compositional visual reasoning without training. Computer Vision and Pattern Recognition (CVPR), 2022 Tanmay Gupta Aniruddha Kembhavi ReLM VLM LRM 374 556 0 18 Nov 2022
Bridging Speech and Textual Pre-trained Models with Unsupervised ASR. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Jiatong Shi Chan-Jan Hsu Ho-Lam Chung Dongji Gao Leibny Paola García-Perera Shinji Watanabe Ann Lee Hung-yi Lee 153 13 0 06 Nov 2022
LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers. Interspeech (Interspeech), 2022 Peidong Wang Eric Sun Jian Xue Yu-Huan Wu Long Zhou Yashesh Gaur Shujie Liu Jinyu Li 328 10 0 05 Nov 2022
Towards Zero-Shot Code-Switched Speech Recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Brian Yan Sanjeev Khudanpur Ondřej Klejch Preethi Jyothi Shinji Watanabe 200 23 0 02 Nov 2022
Textless Direct Speech-to-Speech Translation with Discrete Speech Representation. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Xinjian Li Ye Jia Chung-Cheng Chiu 233 33 0 31 Oct 2022
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Kun Wei Long Zhou Zi-Hua Zhang Liping Chen Shujie Liu Lei He Jinyu Li Furu Wei 166 17 0 31 Oct 2022
token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Xianghu Yue Junyi Ao Xiaoxue Gao Haizhou Li SSL 191 8 0 30 Oct 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Takaaki Saeki Heiga Zen Zhehuai Chen Nobuyuki Morioka Gary Wang Yu Zhang Ankur Bapna Andrew Rosenberg Bhuvana Ramabhadran 220 22 0 27 Oct 2022
Greedy Modality Selection via Approximate Submodular Maximization. Conference on Uncertainty in Artificial Intelligence (UAI), 2022 Runxiang Cheng Gargi Balasubramaniam Yifei He Yifan Hao Han Zhao 129 3 0 22 Oct 2022
Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR. Spoken Language Technology Workshop (SLT), 2022 Zhehuai Chen Ankur Bapna Andrew Rosenberg Yu Zhang Bhuvana Ramabhadran Pedro J. Moreno Nanxin Chen 202 17 0 18 Oct 2022
Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 Chen Wang Yuchen Liu Boxing Chen Jiajun Zhang Wei Luo Zhongqiang Huang Chengqing Zong 187 10 0 18 Oct 2022
JOIST: A Joint Speech and Text Streaming Model For ASR. Spoken Language Technology Workshop (SLT), 2022 Tara N. Sainath Rohit Prabhavalkar Ankur Bapna Yu Zhang Zhouyuan Huo Zhehuai Chen Yue Liu Weiran Wang Trevor Strohman RALM AuLLM 157 36 0 13 Oct 2022
SQuId: Measuring Speech Naturalness in Many Languages. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Thibault Sellam Ankur Bapna Joshua Camp Diana Mackinnon Ankur P. Parikh Jason Riesa 200 24 0 12 Oct 2022
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 Zi-Hua Zhang Long Zhou Junyi Ao Shujie Liu Lirong Dai Jinyu Li Furu Wei 243 61 0 07 Oct 2022
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data. IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022 Zi-Hua Zhang Sanyuan Chen Long Zhou Yu Wu Shuo Ren ... Zhuoyuan Yao Xun Gong Lirong Dai Jinyu Li Furu Wei 269 65 0 30 Sep 2022
Do Current Multi-Task Optimization Methods in Deep Learning Even Help? Neural Information Processing Systems (NeurIPS), 2022 Derrick Xin Behrooz Ghorbani Ankush Garg Orhan Firat Justin Gilmer MoMe 185 75 0 23 Sep 2022
Improving the Cross-Lingual Generalisation in Visual Question Answering. AAAI Conference on Artificial Intelligence (AAAI), 2022 Farhad Nooralahzadeh Rico Sennrich 221 8 0 07 Sep 2022
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech. Spoken Language Technology Workshop (SLT), 2022 Alexis Conneau Min Ma Simran Khanuja Yu Zhang Vera Axelrod Siddharth Dalmia Jason Riesa Clara E. Rivera Ankur Bapna VLM 420 456 0 25 May 2022
T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 Paul-Ambroise Duquenne Hongyu Gong Benoît Sagot Holger Schwenk 174 21 0 24 May 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation. IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022 Sameer Khurana Antoine Laurent James R. Glass 145 43 0 17 May 2022
Building Machine Translation Systems for the Next Thousand Languages Ankur Bapna Isaac Caswell Julia Kreutzer Orhan Firat D. Esch ... Apurva Shah Yanping Huang Zhiwen Chen Yonghui Wu Macduff Hughes 262 108 0 09 May 2022
MAESTRO: Matched Speech Text Representations through Modality Matching. Interspeech (Interspeech), 2022 Zhehuai Chen Yu Zhang Andrew Rosenberg Bhuvana Ramabhadran Pedro J. Moreno Ankur Bapna Heiga Zen 213 119 0 07 Apr 2022
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language. International Conference on Learning Representations (ICLR), 2022 Andy Zeng Maria Attarian Brian Ichter K. Choromanski Adrian S. Wong ... Michael S. Ryoo Vikas Sindhwani Johnny Lee Vincent Vanhoucke Peter R. Florence ReLM LRM 539 676 0 01 Apr 2022
Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation. Interspeech (Interspeech), 2022 Ye Jia Yifan Ding Ankur Bapna Colin Cherry Yu Zhang Alexis Conneau Nobuyuki Morioka 206 24 0 24 Mar 2022
XTREME-S: Evaluating Cross-lingual Speech Representations. Interspeech (Interspeech), 2022 Alexis Conneau Ankur Bapna Yu Zhang Min Ma Patrick von Platen ... Orhan Firat Michael Auli Sebastian Ruder Jason Riesa Melvin Johnson VLM AILaw ELM 255 23 0 21 Mar 2022
Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021 Yihong Dong Ying Peng Muqiao Yang Songtao Lu Qingjiang Shi 370 12 0 05 Jun 2021
