arXiv: 2202.01374
mSLAM: Massively multilingual joint pre-training for speech and text
3 February 2022
Ankur Bapna
Colin Cherry
Yu Zhang
Ye Jia
Melvin Johnson
Yong Cheng
Simran Khanuja
Jason Riesa
Alexis Conneau
VLM
Papers citing "mSLAM: Massively multilingual joint pre-training for speech and text"
50 / 89 papers shown
SENSE models: an open source solution for multilingual and multimodal semantic-based tasks
Salima Mdhaffar
Haroun Elleuch
Chaimae Chellaf
H. Nguyen
Yannick Esteve
VLM
112
0
0
15 Sep 2025
Objective Soups: Multilingual Multi-Task Modeling for Speech Processing
A. F. M. Saif
Lisha Chen
Xiaodong Cui
Songtao Lu
Brian Kingsbury
Tianyi Chen
77
0
0
12 Aug 2025
Self-Improvement for Audio Large Language Model using Unlabeled Speech
S. Wang
Xinyuan Chen
Yao Xu
AuLLM
144
4
0
27 Jul 2025
Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering
Pradeep Rangappa
Andres Carofilis
Jeena Prakash
Shashi Kumar
Sergio Burdisso
...
P. Motlícek
Kadri Hacioğlu
Shankar Venkatesan
Saurabh Vyas
Andreas Stolcke
170
2
0
04 Jun 2025
TESU-LLM: Training Speech-LLMs Without Speech via Unified Encoder Alignment
Taesoo Kim
Jong Hwan Ko
AuLLM
121
0
0
01 Jun 2025
Exploiting Vulnerabilities in Speech Translation Systems through Targeted Adversarial Attacks
Chang-rui Liu
Haolin Wu
Xi Yang
Kui Zhang
Cong Wu
Weinan Zhang
Nenghai Yu
Tianwei Zhang
Qing Guo
Jie Zhang
AAML
290
0
0
02 Mar 2025
Graph Perceiver IO: A General Architecture for Graph Structured Data
Pattern Recognition (Pattern Recogn.), 2022
Seyun Bae
Hoyoon Byun
Changdae Oh
Yoon-Sik Cho
Kyungwoo Song
GNN
366
3
0
24 Feb 2025
STORM: Strategic Orchestration of Modalities for Rare Event Classification
Asilomar Conference on Signals, Systems and Computers (ACSSC), 2024
Payal Kamboj
Ayan Banerjee
Sandeep K. S. Gupta
175
1
0
03 Dec 2024
Towards Building Large Scale Datasets and State-of-the-Art Automatic Speech Translation Systems for 14 Indian Languages
Sparsh Jain
Ashwin Sankar
Devilal Choudhary
Dhairya Suman
Nikhil Narasimhan
Mohammed Safi Ur Rahman Khan
Anoop Kunchukuttan
Mitesh M. Khapra
Raj Dabre
415
2
0
07 Nov 2024
EMMeTT: Efficient Multimodal Machine Translation Training
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Piotr Żelasko
Zhehuai Chen
Mengru Wang
Daniel Galvez
Oleksii Hrinchuk
Shuoyang Ding
Ke Hu
Jagadeesh Balam
Vitaly Lavrukhin
Boris Ginsburg
165
4
0
20 Sep 2024
LAST: Language Model Aware Speech Tokenization
A. Turetzky
Yossi Adi
251
8
0
05 Sep 2024
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
Bhavani Shankar
Preethi Jyothi
Pushpak Bhattacharyya
270
4
0
16 Jun 2024
ASTRA: Aligning Speech and Text Representations for ASR without Sampling
Neeraj Gaur
Rohan Agrawal
Gary Wang
Parisa Haghani
Andrew Rosenberg
Bhuvana Ramabhadran
270
2
0
10 Jun 2024
Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Robust and Instruction-Aware ASR and OCR
Chan-Jan Hsu
Yi-Chang Chen
Feng-Ting Liao
Pei-Chen Ho
Yu-Hsiang Wang
Po-Chun Hsu
Da-shan Shiu
450
3
0
23 May 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
392
67
0
14 May 2024
Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems
International Workshop on Spoken Language Translation (IWSLT), 2024
Frank Palma Gomez
Ramon Sanabria
Yun-hsuan Sung
Daniel Cer
Siddharth Dalmia
Gustavo Hernández Ábrego
VLM
349
8
0
02 Apr 2024
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data
Takaaki Saeki
Gary Wang
Nobuyuki Morioka
Isaac Elias
Kyle Kastner
...
Andrew Rosenberg
Bhuvana Ramabhadran
Heiga Zen
Francoise Beaufays
Hadar Shemtov
249
17
0
29 Feb 2024
Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Junwen Bai
Yue Liu
Qiujia Li
Tara N. Sainath
Trevor Strohman
317
7
0
17 Jan 2024
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
Dami Choi
Derrick Xin
Hamid Dadkhahi
Justin Gilmer
Ankush Garg
Orhan Firat
Chih-Kuan Yeh
Andrew M. Dai
Behrooz Ghorbani
254
6
0
11 Dec 2023
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors
International Conference on Natural Language and Speech Processing (ICNLSP), 2023
Shuyue Stella Li
Beining Xu
Xiangyu Zhang
Hexin Liu
Wen-Han Chao
Leibny Paola García
SSL
124
4
0
27 Nov 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Juan Pablo Zuluaga
Zhaocheng Huang
Xing Niu
Rohit Paturi
S. Srinivasan
Prashant Mathur
Brian Thompson
Marcello Federico
BDL
206
3
0
01 Nov 2023
Toward Joint Language Modeling for Speech Units and Text
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ju-Chieh Chou
Chung-Ming Chien
Wei-Ning Hsu
Karen Livescu
Arun Babu
Alexis Conneau
Alexei Baevski
Michael Auli
VLM
192
27
0
12 Oct 2023
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Automatic Speech Recognition & Understanding (ASRU), 2023
Chung-Ming Chien
Mingjiamei Zhang
Ju-Chieh Chou
Karen Livescu
214
6
0
09 Oct 2023
Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer
Interspeech (Interspeech), 2023
Paul-Ambroise Duquenne
Holger Schwenk
Benoît Sagot
247
3
0
05 Oct 2023
SLM: Bridge the thin gap between speech and text foundation models
Automatic Speech Recognition & Understanding (ASRU), 2023
Mingqiu Wang
Wei Han
Izhak Shafran
Zelin Wu
Chung-Cheng Chiu
...
Zhong Meng
Golan Pundak
Nikhil Siddhartha
J. Schalkwyk
Yonghui Wu
AuLLM
292
71
0
30 Sep 2023
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Brian Yan
Xuankai Chang
Antonios Anastasopoulos
Yuya Fujita
Shinji Watanabe
230
5
0
27 Sep 2023
Multimodal Modeling For Spoken Language Identification
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shikhar Bharadwaj
Min Ma
Shikhar Vashishth
Ankur Bapna
Sriram Ganapathy
...
Yu Zhang
D. Esch
Sandy Ritchie
Partha P. Talukdar
Jason Riesa
154
0
0
19 Sep 2023
Direct Text to Speech Translation System using Acoustic Units
IEEE Signal Processing Letters (IEEE SPL), 2023
Victoria Mingote
Pablo Gimeno
Luis Vicente
Sameer Khurana
Antoine Laurent
J. Duret
126
5
0
14 Sep 2023
Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Interspeech (Interspeech), 2023
Yochai Blau
Rohan Agrawal
Lior Madmony
Gary Wang
Andrew Rosenberg
Zhehuai Chen
Zorik Gekhman
Genady Beryozkin
Parisa Haghani
Bhuvana Ramabhadran
108
3
0
14 Aug 2023
Improving Joint Speech-Text Representations Without Alignment
Interspeech (Interspeech), 2023
Cal Peyser
Zhong Meng
Ke Hu
Rohit Prabhavalkar
Andrew Rosenberg
Tara N. Sainath
M. Picheny
Dong Wang
VLM
201
4
0
11 Aug 2023
AudioPaLM: A Large Language Model That Can Speak and Listen
Paul Kishan Rubenstein
Chulayuth Asawaroengchai
D. Nguyen
Ankur Bapna
Zalan Borsos
...
Neil Zeghidour
Yu Zhang
Zhishuai Zhang
Lukás Zilka
Christian Frank
LM&MA
AuLLM
VLM
245
384
0
22 Jun 2023
Recent Advances in Direct Speech-to-text Translation
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Chen Xu
Rong Ye
Qianqian Dong
Chengqi Zhao
Tom Ko
Mingxuan Wang
Tong Xiao
Jingbo Zhu
258
29
0
20 Jun 2023
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Interspeech (Interspeech), 2023
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
Yusuke Ijima
Taichi Asami
Marc Delcroix
Yukinori Honma
SSL
ELM
201
14
0
14 Jun 2023
Efficient Adapters for Giant Speech Models
Nanxin Chen
Izhak Shafran
Yu Zhang
Chung-Cheng Chiu
H. Soltau
James Qin
Yonghui Wu
174
12
0
13 Jun 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer
Interspeech (Interspeech), 2023
Lu Huang
Yangqiu Song
Jun Zhang
Lu Lu
Zejun Ma
207
4
0
07 Jun 2023
Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
Sameer Khurana
Nauman Dawalatabad
Antoine Laurent
Luis Vicente
Pablo Gimeno
Victoria Mingote
James R. Glass
VLM
319
1
0
01 Jun 2023
Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning
Shuyue Stella Li
Cihan Xiao
Tianjian Li
Bismarck Odoom
95
4
0
31 May 2023
Translatotron 3: Speech to Speech Translation with Monolingual Data
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Eliya Nachmani
Alon Levkovitch
Yi-Yang Ding
Chulayuth Asawaroengchai
Heiga Zen
Michelle Tadmor Ramanovich
221
22
0
27 May 2023
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation
Neural Information Processing Systems (NeurIPS), 2023
Chenyang Le
Yao Qian
Long Zhou
Shujie Liu
Yanmin Qian
Michael Zeng
Xuedong Huang
246
19
0
24 May 2023
Scaling Speech Technology to 1,000+ Languages
Journal of machine learning research (JMLR), 2023
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
333
511
0
22 May 2023
Textually Pretrained Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
...
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
VLM
SyDa
359
92
0
22 May 2023
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages
Interspeech (Interspeech), 2023
Andrew Rouditchenko
Sameer Khurana
Samuel Thomas
Rogerio Feris
Leonid Karlinsky
Hilde Kuehne
David Harwath
Brian Kingsbury
James R. Glass
VLM
272
24
0
21 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
487
151
0
18 May 2023
Back Translation for Speech-to-text Translation Without Transcripts
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Qingkai Fang
Yang Feng
176
16
0
15 May 2023
Understanding and Bridging the Modality Gap for Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Qingkai Fang
Yang Feng
213
29
0
15 May 2023
SLTUNET: A Simple Unified Model for Sign Language Translation
International Conference on Learning Representations (ICLR), 2023
Biao Zhang
Mathias Müller
Rico Sennrich
SLR
165
44
0
02 May 2023
Understanding Shared Speech-Text Representations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Gary Wang
Kyle Kastner
Ankur Bapna
Zhehuai Chen
Andrew Rosenberg
Bhuvana Ramabhadran
Yu Zhang
AuLLM
126
7
0
27 Apr 2023
Adaptive Knowledge Distillation between Text and Speech Pre-trained Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jinjie Ni
Yukun Ma
Wen Wang
Qian Chen
Dianwen Ng
Han Lei
Trung Hieu Nguyen
Chong Zhang
B. Ma
Xiaoshi Zhong
69
2
0
07 Mar 2023
End-to-End Speech Recognition: A Survey
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schluter
Shinji Watanabe
VLM
260
239
0
03 Mar 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Françoise Beaufays
Yonghui Wu
VLM
346
345
0
02 Mar 2023