2204.03409
MAESTRO: Matched Speech Text Representations through Modality Matching
Interspeech (Interspeech), 2022
7 April 2022
Zhehuai Chen
Yu Zhang
Andrew Rosenberg
Bhuvana Ramabhadran
Pedro J. Moreno
Ankur Bapna
Heiga Zen
Papers citing
"MAESTRO: Matched Speech Text Representations through Modality Matching"
50 / 76 papers shown
Title
Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation
Chenyang Le
Yinfeng Xia
Huiyan Li
Manhong Wang
Yutao Sun
Xingyang Ma
Yanmin Qian
56
0
0
15 Aug 2025
LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting
Pai Zhu
Quan Wang
Dhruuv Agarwal
Kurt Partridge
99
1
0
29 May 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
International Conference on Learning Representations (ICLR), 2025
Alexander H. Liu
Sang-gil Lee
Chao-Han Huck Yang
Yuan Gong
Yu-Chun Wang
James Glass
Rafael Valle
Bryan Catanzaro
SSL
212
4
0
02 Mar 2025
Graph Perceiver IO: A General Architecture for Graph Structured Data
Pattern Recognition (Pattern Recogn.), 2022
Seyun Bae
Hoyoon Byun
Changdae Oh
Yoon-Sik Cho
Kyungwoo Song
GNN
338
3
0
24 Feb 2025
Towards Unsupervised Speech Recognition Without Pronunciation Models
Junrui Ni
Liming Wang
Yang Zhang
Kaizhi Qian
Heting Gao
Mark Hasegawa-Johnson
Chang D. Yoo
SSL
OffRL
322
3
0
10 Jan 2025
Effective Text Adaptation for LLM-based ASR through Soft Prompt Fine-Tuning
Spoken Language Technology Workshop (SLT), 2024
Yingyi Ma
Zhe Liu
Ozlem Kalinli
307
1
0
09 Dec 2024
AMPS: ASR with Multimodal Paraphrase Supervision
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Amruta Parulekar
Abhishek Gupta
Sameep Chattopadhyay
Preethi Jyothi
261
0
0
27 Nov 2024
CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation
Spoken Language Technology Workshop (SLT), 2024
Rui Zhao
Jinyu Li
Ruchao Fan
Matt Post
144
2
0
07 Oct 2024
Recent Advances in Speech Language Models: A Survey
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
413
61
0
01 Oct 2024
Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments
Pai Zhu
Dhruuv Agarwal
Jacob Bartel
Kurt Partridge
Hyun Jin Park
Quan Wang
148
3
0
23 Jul 2024
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
Bhavani Shankar
Preethi Jyothi
Pushpak Bhattacharyya
246
4
0
16 Jun 2024
An efficient text augmentation approach for contextualized Mandarin speech recognition
Interspeech (Interspeech), 2024
Naijun Zheng
Xucheng Wan
Kai Liu
Ziqing Du
Zhou Huan
142
2
0
14 Jun 2024
ASTRA: Aligning Speech and Text Representations for ASR without Sampling
Neeraj Gaur
Rohan Agrawal
Gary Wang
Parisa Haghani
Andrew Rosenberg
Bhuvana Ramabhadran
250
2
0
10 Jun 2024
Text Injection for Neural Contextual Biasing
Zhong Meng
Zelin Wu
Rohit Prabhavalkar
Cal Peyser
Weiran Wang
Nanxin Chen
Tara N. Sainath
Bhuvana Ramabhadran
281
6
0
05 Jun 2024
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data
Takaaki Saeki
Gary Wang
Nobuyuki Morioka
Isaac Elias
Kyle Kastner
...
Andrew Rosenberg
Bhuvana Ramabhadran
Heiga Zen
Francoise Beaufays
Hadar Shemtov
233
17
0
29 Feb 2024
Retrieval Augmented End-to-End Spoken Dialog Models
Mingqiu Wang
Izhak Shafran
H. Soltau
Wei Han
Yuan Cao
Dian Yu
Laurent El Shafey
RALM
AuLLM
144
21
0
02 Feb 2024
High-precision Voice Search Query Correction via Retrievable Speech-text Embedings
Christopher Li
Gary Wang
Kyle Kastner
Heng Su
Allen Chen
...
Zelin Wu
L. Velikovich
Pat Rondon
D. Caseiro
Petar S. Aleksic
129
2
0
08 Jan 2024
FastInject: Injecting Unpaired Text Data into CTC-based ASR training
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Keqi Deng
Phil Woodland
143
3
0
14 Dec 2023
Improving Large-scale Deep Biasing with Phoneme Features and Text-only Data in Streaming Transducer
Automatic Speech Recognition & Understanding (ASRU), 2023
Jin Qiu
Lu Huang
Boyu Li
Jun Zhang
Lu Lu
Zejun Ma
259
7
0
15 Nov 2023
Towards a Deep Understanding of Multilingual End-to-End Speech Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Haoran Sun
Xiaohu Zhao
Yikun Lei
Shaolin Zhu
Deyi Xiong
163
8
0
31 Oct 2023
SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation
Zhehuai Chen
He Huang
A. Andrusenko
Oleksii Hrinchuk
Krishna C. Puvvada
Jason Chun Lok Li
Subhankar Ghosh
Jagadeesh Balam
Boris Ginsburg
LRM
210
80
0
13 Oct 2023
Toward Joint Language Modeling for Speech Units and Text
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ju-Chieh Chou
Chung-Ming Chien
Wei-Ning Hsu
Karen Livescu
Arun Babu
Alexis Conneau
Alexei Baevski
Michael Auli
VLM
188
26
0
12 Oct 2023
Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Piyush Singh Pasi
Karthikeya Battepati
Preethi Jyothi
Ganesh Ramakrishnan
T. Mahapatra
Manoj Singh
156
0
0
10 Oct 2023
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Automatic Speech Recognition & Understanding (ASRU), 2023
Chung-Ming Chien
Mingjiamei Zhang
Ju-Chieh Chou
Karen Livescu
186
5
0
09 Oct 2023
Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jianqiao Lu
Wenyong Huang
Nianzu Zheng
Xingshan Zeng
Y. Yeung
Xiao Chen
SyDa
213
1
0
09 Oct 2023
Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer
Interspeech (Interspeech), 2023
Paul-Ambroise Duquenne
Holger Schwenk
Benoît Sagot
239
3
0
05 Oct 2023
SLM: Bridge the thin gap between speech and text foundation models
Automatic Speech Recognition & Understanding (ASRU), 2023
Mingqiu Wang
Wei Han
Izhak Shafran
Zelin Wu
Chung-Cheng Chiu
...
Zhong Meng
Golan Pundak
Nikhil Siddhartha
J. Schalkwyk
Yonghui Wu
AuLLM
248
70
0
30 Sep 2023
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
B. Grimstad
Xuankai Chang
Antonios Anastasopoulos
Yuya Fujita
Shinji Watanabe
222
5
0
27 Sep 2023
Multimodal Modeling For Spoken Language Identification
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shikhar Bharadwaj
Min Ma
Shikhar Vashishth
Ankur Bapna
Sriram Ganapathy
...
Yu Zhang
D. Esch
Sandy Ritchie
Partha P. Talukdar
Jason Riesa
142
0
0
19 Sep 2023
Augmenting text for spoken language understanding with Large Language Models
Roshan Sharma
Suyoun Kim
Daniel Lazar
Trang Le
Akshat Shrivastava
Kwanghoon Ahn
Piyush Kansal
Leda Sari
Ozlem Kalinli
Michael Seltzer
225
2
0
17 Sep 2023
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation
E. Tsunoo
Hayato Futami
Yosuke Kashiwagi
Siddhant Arora
Shinji Watanabe
VLM
AuLLM
RALM
178
11
0
16 Sep 2023
VoxtLM: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLM
AuLLM
273
86
0
14 Sep 2023
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation
Interspeech (Interspeech), 2023
Jiaxu Zhu
Weinan Tong
Yaoxun Xu
Chang Song
Zhiyong Wu
Zhao You
Jane Polak Scowcroft
Dong Yu
Helen M. Meng
128
0
0
04 Sep 2023
End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Bolaji Yusuf
J. Černocký
Murat Saraclar
158
2
0
15 Aug 2023
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
Interspeech (Interspeech), 2023
Shaan Bijwadia
Shuo-yiin Chang
Weiran Wang
Zhong Meng
Hao Zhang
Tara N. Sainath
120
3
0
14 Aug 2023
Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Interspeech (Interspeech), 2023
Yochai Blau
Rohan Agrawal
Lior Madmony
Gary Wang
Andrew Rosenberg
Zhehuai Chen
Zorik Gekhman
Genady Beryozkin
Parisa Haghani
Bhuvana Ramabhadran
104
3
0
14 Aug 2023
Improving Joint Speech-Text Representations Without Alignment
Interspeech (Interspeech), 2023
Cal Peyser
Zhong Meng
Ke Hu
Rohit Prabhavalkar
Andrew Rosenberg
Tara N. Sainath
M. Picheny
Dong Wang
VLM
185
4
0
11 Aug 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
J. Marescaux
Pietro Mascagni
Nassir Navab
N. Padoy
551
44
0
27 Jul 2023
AudioPaLM: A Large Language Model That Can Speak and Listen
Paul Kishan Rubenstein
Chulayuth Asawaroengchai
D. Nguyen
Ankur Bapna
Zalan Borsos
...
Neil Zeghidour
Yu Zhang
Zhishuai Zhang
Lukás Zilka
Christian Frank
LM&MA
AuLLM
VLM
225
379
0
22 Jun 2023
Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding
Mingqiu Wang
Izhak Shafran
H. Soltau
Wei Han
Yuan Cao
Dian Yu
Laurent El Shafey
RALM
AuLLM
170
9
0
08 Jun 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer
Interspeech (Interspeech), 2023
Lu Huang
Yangqiu Song
Jun Zhang
Lu Lu
Zejun Ma
195
4
0
07 Jun 2023
DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Matías P. Pizarro
D. Kolossa
Asja Fischer
AAML
308
1
0
26 May 2023
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation
Neural Information Processing Systems (NeurIPS), 2023
Chenyang Le
Yao Qian
Long Zhou
Shujie Liu
Yanmin Qian
Michael Zeng
Xuedong Huang
210
18
0
24 May 2023
CMOT: Cross-modal Mixup via Optimal Transport for Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yan Zhou
Qingkai Fang
Yang Feng
OT
292
40
0
24 May 2023
Scaling Speech Technology to 1,000+ Languages
Journal of machine learning research (JMLR), 2023
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
296
499
0
22 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
371
149
0
18 May 2023
Towards Speech Dialogue Translation Mediating Speakers of Different Languages
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Shuichiro Shimizu
Chenhui Chu
Sheng Li
Sadao Kurohashi
84
3
0
16 May 2023
Back Translation for Speech-to-text Translation Without Transcripts
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Qingkai Fang
Yang Feng
156
16
0
15 May 2023
Understanding and Bridging the Modality Gap for Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Qingkai Fang
Yang Feng
209
29
0
15 May 2023
Masked Audio Text Encoders are Effective Multi-Modal Rescorers
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jason (Jinglun) Cai
Monica Sunkara
Xilai Li
Anshu Bhatia
Xiao Pan
S. Bodapati
284
5
0
11 May 2023