2204.03409
Cited By
v1
v2 (latest)
MAESTRO: Matched Speech Text Representations through Modality Matching
Interspeech (Interspeech), 2022
7 April 2022
Zhehuai Chen
Yu Zhang
Andrew Rosenberg
Bhuvana Ramabhadran
Pedro J. Moreno
Ankur Bapna
Heiga Zen
Papers citing
"MAESTRO: Matched Speech Text Representations through Modality Matching"
26 / 76 papers shown
Title
Understanding Shared Speech-Text Representations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Gary Wang
Kyle Kastner
Ankur Bapna
Zhehuai Chen
Andrew Rosenberg
Bhuvana Ramabhadran
Yu Zhang
AuLLM
122
7
0
27 Apr 2023
Code-Switching Text Generation and Injection in Mandarin-English ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Haibin Yu
Yuxuan Hu
Yao Qian
Ma Jin
Linquan Liu
Shujie Liu
Yu Shi
Y. Qian
Ed Lin
Michael Zeng
147
17
0
20 Mar 2023
End-to-End Speech Recognition: A Survey
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schlüter
Shinji Watanabe
VLM
240
233
0
03 Mar 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Françoise Beaufays
Yonghui Wu
VLM
318
342
0
02 Mar 2023
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
Interspeech (Interspeech), 2023
Vladimir Bataev
Roman Korostik
Evgeny Shabalin
Vitaly Lavrukhin
Boris Ginsburg
VLM
204
17
0
27 Feb 2023
Massively Multilingual Shallow Fusion with Large Language Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ke Hu
Tara N. Sainath
Yue Liu
Nan Du
Yanping Huang
Andrew M. Dai
Yu Zhang
Rodrigo Cabrera
Zhiwen Chen
Trevor Strohman
151
16
0
17 Feb 2023
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zhong Meng
Weiran Wang
Rohit Prabhavalkar
Tara N. Sainath
Tongzhou Chen
Ehsan Variani
Yu Zhang
Yue Liu
Andrew Rosenberg
Bhuvana Ramabhadran
AuLLM
VLM
195
12
0
16 Feb 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Automatic Speech Recognition & Understanding (ASRU), 2023
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
297
43
0
10 Feb 2023
Efficient Domain Adaptation for Speech Foundation Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yue Liu
DongSeon Hwang
Zhouyuan Huo
Junwen Bai
Guru Prakash
...
K. Sim
Yu Zhang
Wei Han
Trevor Strohman
F. Beaufays
AI4CE
224
30
0
03 Feb 2023
Pre-training for Speech Translation: CTC Meets Optimal Transport
International Conference on Machine Learning (ICML), 2023
Hang Le
Hongyu Gong
Changhan Wang
J. Pino
Benjamin Lecouteux
D. Schwab
OT
292
30
0
27 Jan 2023
Speech Aware Dialog System Technology Challenge (DSTC11)
H. Soltau
Izhak Shafran
Mingqiu Wang
Abhinav Rastogi
Jeffrey Zhao
Ye Jia
Wei Han
Yuan Cao
Aramys Miranda
139
11
0
16 Dec 2022
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Mingda Chen
Paul-Ambroise Duquenne
Pierre Yves Andrews
Justine T. Kao
Alexandre Mourachko
Holger Schwenk
Marta R. Costa-jussá
222
23
0
16 Dec 2022
Learning the joint distribution of two sequences using little or no paired data
Soroosh Mariooryad
Matt Shannon
Siyuan Ma
Tom Bagby
David Kao
Daisy Stanton
Eric Battenberg
RJ Skerry-Ryan
245
3
0
06 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
International Conference on Machine Learning (ICML), 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
876
5,475
0
06 Dec 2022
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
Zhuoyuan Yao
Shuo Ren
Sanyuan Chen
Ziyang Ma
Pengcheng Guo
Linfu Xie
161
5
0
24 Nov 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
IEEE Transactions on Multimedia (IEEE TMM), 2022
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
211
48
0
21 Nov 2022
LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers
Interspeech (Interspeech), 2022
Peidong Wang
Eric Sun
Jian Xue
Yu-Huan Wu
Long Zhou
Yashesh Gaur
Shujie Liu
Jinyu Li
312
10
0
05 Nov 2022
Speech-text based multi-modal training with bidirectional attention for improved speech recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yuhang Yang
Haihua Xu
Hao-Ming Huang
Eng Siong Chng
Sheng Li
164
7
0
01 Nov 2022
token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Xianghu Yue
Junyi Ao
Xiaoxue Gao
Haizhou Li
SSL
167
8
0
30 Oct 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
208
22
0
27 Oct 2022
Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR
Spoken Language Technology Workshop (SLT), 2022
Zhehuai Chen
Ankur Bapna
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Pedro J. Moreno
Nanxin Chen
202
17
0
18 Oct 2022
JOIST: A Joint Speech and Text Streaming Model For ASR
Spoken Language Technology Workshop (SLT), 2022
Tara N. Sainath
Rohit Prabhavalkar
Ankur Bapna
Yu Zhang
Zhouyuan Huo
Zhehuai Chen
Yue Liu
Weiran Wang
Trevor Strohman
RALM
AuLLM
157
36
0
13 Oct 2022
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zi-Hua Zhang
Long Zhou
Junyi Ao
Shujie Liu
Lirong Dai
Jinyu Li
Furu Wei
227
61
0
07 Oct 2022
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Zi-Hua Zhang
Sanyuan Chen
Long Zhou
Yu Wu
Shuo Ren
...
Zhuoyuan Yao
Xun Gong
Lirong Dai
Jinyu Li
Furu Wei
241
64
0
30 Sep 2022
Building Machine Translation Systems for the Next Thousand Languages
Ankur Bapna
Isaac Caswell
Julia Kreutzer
Orhan Firat
D. Esch
...
Apurva Shah
Yanping Huang
Zhiwen Chen
Yonghui Wu
Macduff Hughes
262
108
0
09 May 2022
Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models
J. Yoon
H. Kim
Hyeon Seung Lee
Sunghwan Ahn
N. Kim
391
1
0
05 Nov 2021