2202.01374
mSLAM: Massively multilingual joint pre-training for speech and text
3 February 2022
Ankur Bapna
Colin Cherry
Yu Zhang
Ye Jia
Melvin Johnson
Yong Cheng
Simran Khanuja
Jason Riesa
Alexis Conneau
VLM
Papers citing
"mSLAM: Massively multilingual joint pre-training for speech and text"
50 / 89 papers shown
SENSE models: an open source solution for multilingual and multimodal semantic-based tasks
Salima Mdhaffar
Haroun Elleuch
Chaimae Chellaf
H. Nguyen
Yannick Esteve
VLM
92
0
0
15 Sep 2025
Objective Soups: Multilingual Multi-Task Modeling for Speech Processing
A. F. M. Saif
Lisha Chen
Xiaodong Cui
Songtao Lu
Brian Kingsbury
Tianyi Chen
69
0
0
12 Aug 2025
Self-Improvement for Audio Large Language Model using Unlabeled Speech
S. Wang
Xinyuan Chen
Yao Xu
AuLLM
132
4
0
27 Jul 2025
Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering
Pradeep Rangappa
Andres Carofilis
Jeena Prakash
Shashi Kumar
Sergio Burdisso
...
P. Motlícek
Kadri Hacioğlu
Shankar Venkatesan
Saurabh Vyas
Andreas Stolcke
158
2
0
04 Jun 2025
TESU-LLM: Training Speech-LLMs Without Speech via Unified Encoder Alignment
Taesoo Kim
Jong Hwan Ko
AuLLM
97
0
0
01 Jun 2025
Exploiting Vulnerabilities in Speech Translation Systems through Targeted Adversarial Attacks
Chang-rui Liu
Haolin Wu
Xi Yang
Kui Zhang
Cong Wu
Weinan Zhang
Nenghai Yu
Tianwei Zhang
Qing Guo
Jie Zhang
AAML
266
0
0
02 Mar 2025
Graph Perceiver IO: A General Architecture for Graph Structured Data
Pattern Recognition (Pattern Recogn.), 2022
Seyun Bae
Hoyoon Byun
Changdae Oh
Yoon-Sik Cho
Kyungwoo Song
GNN
350
3
0
24 Feb 2025
STORM: Strategic Orchestration of Modalities for Rare Event Classification
Asilomar Conference on Signals, Systems and Computers (ACSSC), 2024
Payal Kamboj
Ayan Banerjee
Sandeep K. S. Gupta
171
1
0
03 Dec 2024
Towards Building Large Scale Datasets and State-of-the-Art Automatic Speech Translation Systems for 14 Indian Languages
Sparsh Jain
Ashwin Sankar
Devilal Choudhary
Dhairya Suman
Nikhil Narasimhan
Mohammed Safi Ur Rahman Khan
Anoop Kunchukuttan
Mitesh M. Khapra
Mary Dabre
391
3
0
07 Nov 2024
EMMeTT: Efficient Multimodal Machine Translation Training
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Piotr Żelasko
Zhehuai Chen
Mengru Wang
Daniel Galvez
Oleksii Hrinchuk
Shuoyang Ding
Ke Hu
Jagadeesh Balam
Vitaly Lavrukhin
Boris Ginsburg
157
4
0
20 Sep 2024
LAST: Language Model Aware Speech Tokenization
A. Turetzky
Yossi Adi
239
8
0
05 Sep 2024
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
Bhavani Shankar
Preethi Jyothi
Pushpak Bhattacharyya
254
4
0
16 Jun 2024
ASTRA: Aligning Speech and Text Representations for ASR without Sampling
Neeraj Gaur
Rohan Agrawal
Gary Wang
Parisa Haghani
Andrew Rosenberg
Bhuvana Ramabhadran
258
2
0
10 Jun 2024
Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Robust and Instruction-Aware ASR and OCR
Chan-Jan Hsu
Yi-Chang Chen
Feng-Ting Liao
Pei-Chen Ho
Yu-Hsiang Wang
Po-Chun Hsu
Da-shan Shiu
414
3
0
23 May 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
368
67
0
14 May 2024
Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems
International Workshop on Spoken Language Translation (IWSLT), 2024
Frank Palma Gomez
Ramon Sanabria
Yun-hsuan Sung
Daniel Cer
Siddharth Dalmia
Gustavo Hernández Ábrego
VLM
349
8
0
02 Apr 2024
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data
Takaaki Saeki
Gary Wang
Nobuyuki Morioka
Isaac Elias
Kyle Kastner
...
Andrew Rosenberg
Bhuvana Ramabhadran
Heiga Zen
Françoise Beaufays
Hadar Shemtov
241
17
0
29 Feb 2024
Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Junwen Bai
Yue Liu
Qiujia Li
Tara N. Sainath
Trevor Strohman
317
7
0
17 Jan 2024
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
Dami Choi
Derrick Xin
Hamid Dadkhahi
Justin Gilmer
Ankush Garg
Orhan Firat
Chih-Kuan Yeh
Andrew M. Dai
Behrooz Ghorbani
218
6
0
11 Dec 2023
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors
International Conference on Natural Language and Speech Processing (ICNLSP), 2023
Shuyue Stella Li
Beining Xu
Xiangyu Zhang
Hexin Liu
Wen-Han Chao
Leibny Paola García
SSL
120
4
0
27 Nov 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Juan Pablo Zuluaga
Zhaocheng Huang
Xing Niu
Rohit Paturi
S. Srinivasan
Prashant Mathur
Brian Thompson
Marcello Federico
BDL
190
3
0
01 Nov 2023
Toward Joint Language Modeling for Speech Units and Text
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ju-Chieh Chou
Chung-Ming Chien
Wei-Ning Hsu
Karen Livescu
Arun Babu
Alexis Conneau
Alexei Baevski
Michael Auli
VLM
188
27
0
12 Oct 2023
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Automatic Speech Recognition & Understanding (ASRU), 2023
Chung-Ming Chien
Mingjiamei Zhang
Ju-Chieh Chou
Karen Livescu
190
6
0
09 Oct 2023
Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer
Interspeech (Interspeech), 2023
Paul-Ambroise Duquenne
Holger Schwenk
Benoît Sagot
239
3
0
05 Oct 2023
SLM: Bridge the thin gap between speech and text foundation models
Automatic Speech Recognition & Understanding (ASRU), 2023
Mingqiu Wang
Wei Han
Izhak Shafran
Zelin Wu
Chung-Cheng Chiu
...
Zhong Meng
Golan Pundak
Nikhil Siddhartha
J. Schalkwyk
Yonghui Wu
AuLLM
268
70
0
30 Sep 2023
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
B. Grimstad
Xuankai Chang
Antonios Anastasopoulos
Yuya Fujita
Shinji Watanabe
222
5
0
27 Sep 2023
Multimodal Modeling For Spoken Language Identification
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shikhar Bharadwaj
Min Ma
Shikhar Vashishth
Ankur Bapna
Sriram Ganapathy
...
Yu Zhang
D. Esch
Sandy Ritchie
Partha P. Talukdar
Jason Riesa
146
0
0
19 Sep 2023
Direct Text to Speech Translation System using Acoustic Units
IEEE Signal Processing Letters (IEEE SPL), 2023
Victoria Mingote
Pablo Gimeno
Luis Vicente
Sameer Khurana
Antoine Laurent
J. Duret
114
5
0
14 Sep 2023
Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Interspeech (Interspeech), 2023
Yochai Blau
Rohan Agrawal
Lior Madmony
Gary Wang
Andrew Rosenberg
Zhehuai Chen
Zorik Gekhman
Genady Beryozkin
Parisa Haghani
Bhuvana Ramabhadran
108
3
0
14 Aug 2023
Improving Joint Speech-Text Representations Without Alignment
Interspeech (Interspeech), 2023
Cal Peyser
Zhong Meng
Ke Hu
Rohit Prabhavalkar
Andrew Rosenberg
Tara N. Sainath
M. Picheny
Dong Wang
VLM
189
4
0
11 Aug 2023
AudioPaLM: A Large Language Model That Can Speak and Listen
Paul Kishan Rubenstein
Chulayuth Asawaroengchai
D. Nguyen
Ankur Bapna
Zalan Borsos
...
Neil Zeghidour
Yu Zhang
Zhishuai Zhang
Lukás Zilka
Christian Frank
LM&MA
AuLLM
VLM
237
380
0
22 Jun 2023
Recent Advances in Direct Speech-to-text Translation
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Chen Xu
Rong Ye
Qianqian Dong
Chengqi Zhao
Tom Ko
Mingxuan Wang
Tong Xiao
Jingbo Zhu
250
29
0
20 Jun 2023
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Interspeech (Interspeech), 2023
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
Yusuke Ijima
Taichi Asami
Marc Delcroix
Yukinori Honma
SSL
ELM
201
14
0
14 Jun 2023
Efficient Adapters for Giant Speech Models
Nanxin Chen
Izhak Shafran
Yu Zhang
Chung-Cheng Chiu
H. Soltau
James Qin
Yonghui Wu
170
12
0
13 Jun 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer
Interspeech (Interspeech), 2023
Lu Huang
Yangqiu Song
Jun Zhang
Lu Lu
Zejun Ma
199
4
0
07 Jun 2023
Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
Sameer Khurana
Nauman Dawalatabad
Antoine Laurent
Luis Vicente
Pablo Gimeno
Victoria Mingote
James R. Glass
VLM
311
1
0
01 Jun 2023
Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning
Shuyue Stella Li
Cihan Xiao
Tianjian Li
Bismarck Odoom
95
4
0
31 May 2023
Translatotron 3: Speech to Speech Translation with Monolingual Data
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Eliya Nachmani
Alon Levkovitch
Yi-Yang Ding
Chulayuth Asawaroengchai
Heiga Zen
Michelle Tadmor Ramanovich
221
22
0
27 May 2023
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation
Neural Information Processing Systems (NeurIPS), 2023
Chenyang Le
Yao Qian
Long Zhou
Shujie Liu
Yanmin Qian
Michael Zeng
Xuedong Huang
222
19
0
24 May 2023
Scaling Speech Technology to 1,000+ Languages
Journal of Machine Learning Research (JMLR), 2023
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
321
505
0
22 May 2023
Textually Pretrained Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
...
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
VLM
SyDa
327
90
0
22 May 2023
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages
Interspeech (Interspeech), 2023
Andrew Rouditchenko
Sameer Khurana
Samuel Thomas
Rogerio Feris
Leonid Karlinsky
Hilde Kuehne
David Harwath
Brian Kingsbury
James R. Glass
VLM
260
24
0
21 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
427
150
0
18 May 2023
Back Translation for Speech-to-text Translation Without Transcripts
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Qingkai Fang
Yang Feng
176
16
0
15 May 2023
Understanding and Bridging the Modality Gap for Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Qingkai Fang
Yang Feng
213
29
0
15 May 2023
SLTUNET: A Simple Unified Model for Sign Language Translation
International Conference on Learning Representations (ICLR), 2023
Biao Zhang
Mathias Müller
Rico Sennrich
SLR
165
44
0
02 May 2023
Understanding Shared Speech-Text Representations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Gary Wang
Kyle Kastner
Ankur Bapna
Zhehuai Chen
Andrew Rosenberg
Bhuvana Ramabhadran
Yu Zhang
AuLLM
126
7
0
27 Apr 2023
Adaptive Knowledge Distillation between Text and Speech Pre-trained Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jinjie Ni
Yukun Ma
Wen Wang
Qian Chen
Dianwen Ng
Han Lei
Trung Hieu Nguyen
Chong Zhang
B. Ma
Xiaoshi Zhong
69
2
0
07 Mar 2023
End-to-End Speech Recognition: A Survey
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schlüter
Shinji Watanabe
VLM
244
237
0
03 Mar 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Françoise Beaufays
Yonghui Wu
VLM
330
344
0
02 Mar 2023