
MAESTRO: Matched Speech Text Representations through Modality Matching

Interspeech (Interspeech), 2022

7 April 2022

Zhehuai Chen

Yu Zhang

Andrew Rosenberg

Bhuvana Ramabhadran

Papers citing "MAESTRO: Matched Speech Text Representations through Modality Matching"

50 / 76 papers shown

Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation Chenyang Le Yinfeng Xia Huiyan Li Manhong Wang Yutao Sun Xingyang Ma Yanmin Qian 56 0 0 15 Aug 2025
LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting Pai Zhu Quan Wang Dhruuv Agarwal Kurt Partridge 99 1 0 29 May 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation. International Conference on Learning Representations (ICLR), 2025 Alexander H. Liu Sang-gil Lee Chao-Han Huck Yang Yuan Gong Yu-Chun Wang James Glass Rafael Valle Bryan Catanzaro SSL 212 4 0 02 Mar 2025
Graph Perceiver IO: A General Architecture for Graph Structured Data. Pattern Recognition (Pattern Recogn.), 2022 Seyun Bae Hoyoon Byun Changdae Oh Yoon-Sik Cho Kyungwoo Song GNN 338 3 0 24 Feb 2025
Towards Unsupervised Speech Recognition Without Pronunciation Models Junrui Ni Liming Wang Yang Zhang Kaizhi Qian Heting Gao Mark Hasegawa-Johnson Chang D. Yoo SSL OffRL 322 3 0 10 Jan 2025
Effective Text Adaptation for LLM-based ASR through Soft Prompt Fine-Tuning. Spoken Language Technology Workshop (SLT), 2024 Yingyi Ma Zhe Liu Ozlem Kalinli 307 1 0 09 Dec 2024
AMPS: ASR with Multimodal Paraphrase Supervision. North American Chapter of the Association for Computational Linguistics (NAACL), 2024 Amruta Parulekar Abhishek Gupta Sameep Chattopadhyay Preethi Jyothi 261 0 0 27 Nov 2024
CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation. Spoken Language Technology Workshop (SLT), 2024 Rui Zhao Jinyu Li Ruchao Fan Matt Post 144 2 0 07 Oct 2024
Recent Advances in Speech Language Models: A Survey. Annual Meeting of the Association for Computational Linguistics (ACL), 2024 Wenqian Cui Dianzhi Yu Xiaoqi Jiao Ziqiao Meng Guangyan Zhang Qichao Wang Yiwen Guo Irwin King AuLLM 413 61 0 01 Oct 2024
Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments Pai Zhu Dhruuv Agarwal Jacob Bartel Kurt Partridge Hyun Jin Park Quan Wang 148 3 0 23 Jul 2024
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving Bhavani Shankar Preethi Jyothi Pushpak Bhattacharyya 246 4 0 16 Jun 2024
An efficient text augmentation approach for contextualized Mandarin speech recognition. Interspeech (Interspeech), 2024 Naijun Zheng Xucheng Wan Kai Liu Ziqing Du Zhou Huan 142 2 0 14 Jun 2024
ASTRA: Aligning Speech and Text Representations for Asr without Sampling Neeraj Gaur Rohan Agrawal Gary Wang Parisa Haghani Andrew Rosenberg Bhuvana Ramabhadran 250 2 0 10 Jun 2024
Text Injection for Neural Contextual Biasing Zhong Meng Zelin Wu Rohit Prabhavalkar Cal Peyser Weiran Wang Nanxin Chen Tara N. Sainath Bhuvana Ramabhadran 281 6 0 05 Jun 2024
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data Takaaki Saeki Gary Wang Nobuyuki Morioka Isaac Elias Kyle Kastner ... Andrew Rosenberg Bhuvana Ramabhadran Heiga Zen Francoise Beaufays Hadar Shemtov 233 17 0 29 Feb 2024
Retrieval Augmented End-to-End Spoken Dialog Models Mingqiu Wang Izhak Shafran H. Soltau Wei Han Yuan Cao Dian Yu Laurent El Shafey RALM AuLLM 144 21 0 02 Feb 2024
High-precision Voice Search Query Correction via Retrievable Speech-text Embedings Christopher Li Gary Wang Kyle Kastner Heng Su Allen Chen ... Zelin Wu L. Velikovich Pat Rondon D. Caseiro Petar S. Aleksic 129 2 0 08 Jan 2024
FastInject: Injecting Unpaired Text Data into CTC-based ASR training. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 Keqi Deng Phil Woodland 143 3 0 14 Dec 2023
Improving Large-scale Deep Biasing with Phoneme Features and Text-only Data in Streaming Transducer. Automatic Speech Recognition & Understanding (ASRU), 2023 Jin Qiu Lu Huang Boyu Li Jun Zhang Lu Lu Zejun Ma 259 7 0 15 Nov 2023
Towards a Deep Understanding of Multilingual End-to-End Speech Translation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Haoran Sun Xiaohu Zhao Yikun Lei Shaolin Zhu Deyi Xiong 163 8 0 31 Oct 2023
SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation Zhehuai Chen He Huang A. Andrusenko Oleksii Hrinchuk Krishna C. Puvvada Jason Chun Lok Li Subhankar Ghosh Jagadeesh Balam Boris Ginsburg LRM 210 80 0 13 Oct 2023
Toward Joint Language Modeling for Speech Units and Text. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Ju-Chieh Chou Chung-Ming Chien Wei-Ning Hsu Karen Livescu Arun Babu Alexis Conneau Alexei Baevski Michael Auli VLM 188 26 0 12 Oct 2023
Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration. International Joint Conference on Artificial Intelligence (IJCAI), 2023 Piyush Singh Pasi Karthikeya Battepati Preethi Jyothi Ganesh Ramakrishnan T. Mahapatra Manoj Singh 156 0 0 10 Oct 2023
Few-Shot Spoken Language Understanding via Joint Speech-Text Models. Automatic Speech Recognition & Understanding (ASRU), 2023 Chung-Ming Chien Mingjiamei Zhang Ju-Chieh Chou Karen Livescu 186 5 0 09 Oct 2023
Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Jianqiao Lu Wenyong Huang Nianzu Zheng Xingshan Zeng Y. Yeung Xiao Chen SyDa 213 1 0 09 Oct 2023
Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer. Interspeech (Interspeech), 2023 Paul-Ambroise Duquenne Holger Schwenk Benoît Sagot 239 3 0 05 Oct 2023
SLM: Bridge the thin gap between speech and text foundation models. Automatic Speech Recognition & Understanding (ASRU), 2023 Mingqiu Wang Wei Han Izhak Shafran Zelin Wu Chung-Cheng Chiu ... Zhong Meng Golan Pundak Nikhil Siddhartha J. Schalkwyk Yonghui Wu AuLLM 248 70 0 30 Sep 2023
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 B. Grimstad Xuankai Chang Antonios Anastasopoulos Yuya Fujita Shinji Watanabe 222 5 0 27 Sep 2023
Multimodal Modeling For Spoken Language Identification. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 Shikhar Bharadwaj Min Ma Shikhar Vashishth Ankur Bapna Sriram Ganapathy ... Yu Zhang D. Esch Sandy Ritchie Partha P. Talukdar Jason Riesa 142 0 0 19 Sep 2023
Augmenting text for spoken language understanding with Large Language Models Roshan Sharma Suyoun Kim Daniel Lazar Trang Le Akshat Shrivastava Kwanghoon Ahn Piyush Kansal Leda Sari Ozlem Kalinli Michael Seltzer 225 2 0 17 Sep 2023
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation E. Tsunoo Hayato Futami Yosuke Kashiwagi Siddhant Arora Shinji Watanabe VLM AuLLM RALM 178 11 0 16 Sep 2023
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 Soumi Maiti Yifan Peng Shukjae Choi Jee-weon Jung Xuankai Chang Shinji Watanabe VLM AuLLM 273 86 0 14 Sep 2023
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation. Interspeech (Interspeech), 2023 Jiaxu Zhu Weinan Tong Yaoxun Xu Chang Song Zhiyong Wu Zhao You Jane Polak Scowcroft Dong Yu Helen M. Meng 128 0 0 04 Sep 2023
End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations. IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023 Bolaji Yusuf J. Černocký Murat Saraclar 158 2 0 15 Aug 2023
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models. Interspeech (Interspeech), 2023 Shaan Bijwadia Shuo-yiin Chang Weiran Wang Zhong Meng Hao Zhang Tara N. Sainath 120 3 0 14 Aug 2023
Using Text Injection to Improve Recognition of Personal Identifiers in Speech. Interspeech (Interspeech), 2023 Yochai Blau Rohan Agrawal Lior Madmony Gary Wang Andrew Rosenberg Zhehuai Chen Zorik Gekhman Genady Beryozkin Parisa Haghani Bhuvana Ramabhadran 104 3 0 14 Aug 2023
Improving Joint Speech-Text Representations Without Alignment. Interspeech (Interspeech), 2023 Cal Peyser Zhong Meng Ke Hu Rohit Prabhavalkar Andrew Rosenberg Tara N. Sainath M. Picheny Dong Wang VLM 185 4 0 11 Aug 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures Kun Yuan V. Srivastav Tong Yu Joël L. Lavanchy J. Marescaux Pietro Mascagni Nassir Navab N. Padoy 551 44 0 27 Jul 2023
AudioPaLM: A Large Language Model That Can Speak and Listen Paul Kishan Rubenstein Chulayuth Asawaroengchai D. Nguyen Ankur Bapna Zalan Borsos ... Neil Zeghidour Yu Zhang Zhishuai Zhang Lukás Zilka Christian Frank LM&MA AuLLM VLM 225 379 0 22 Jun 2023
Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding Mingqiu Wang Izhak Shafran H. Soltau Wei Han Yuan Cao Dian Yu Laurent El Shafey RALM AuLLM 170 9 0 08 Jun 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer. Interspeech (Interspeech), 2023 Lu Huang Yangqiu Song Jun Zhang Lu Lu Zejun Ma 195 4 0 07 Jun 2023
DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution. Conference on Uncertainty in Artificial Intelligence (UAI), 2023 Matías P. Pizarro D. Kolossa Asja Fischer AAML 308 1 0 26 May 2023
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation. Neural Information Processing Systems (NeurIPS), 2023 Chenyang Le Yao Qian Long Zhou Shujie Liu Yanmin Qian Michael Zeng Xuedong Huang 210 18 0 24 May 2023
CMOT: Cross-modal Mixup via Optimal Transport for Speech Translation. Annual Meeting of the Association for Computational Linguistics (ACL), 2023 Yan Zhou Qingkai Fang Yang Feng OT 292 40 0 24 May 2023
Scaling Speech Technology to 1,000+ Languages. Journal of machine learning research (JMLR), 2023 Vineel Pratap Andros Tjandra Bowen Shi Paden Tomasello Arun Babu ... Yossi Adi Xiaohui Zhang Wei-Ning Hsu Alexis Conneau Michael Auli VLM 296 499 0 22 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities Peng Wang Shijie Wang Junyang Lin Shuai Bai Xiaohuan Zhou Jingren Zhou Xinggang Wang Chang Zhou VLM MLLM ObjD 371 149 0 18 May 2023
Towards Speech Dialogue Translation Mediating Speakers of Different Languages. Annual Meeting of the Association for Computational Linguistics (ACL), 2023 Shuichiro Shimizu Chenhui Chu Sheng Li Sadao Kurohashi Kyoto University 84 3 0 16 May 2023
Back Translation for Speech-to-text Translation Without Transcripts. Annual Meeting of the Association for Computational Linguistics (ACL), 2023 Qingkai Fang Yang Feng 156 16 0 15 May 2023
Understanding and Bridging the Modality Gap for Speech Translation. Annual Meeting of the Association for Computational Linguistics (ACL), 2023 Qingkai Fang Yang Feng 209 29 0 15 May 2023
Masked Audio Text Encoders are Effective Multi-Modal Rescorers. Annual Meeting of the Association for Computational Linguistics (ACL), 2023 Jason (Jinglun) Cai Monica Sunkara Xilai Li Anshu Bhatia Xiao Pan S. Bodapati 284 5 0 11 May 2023