Improving the fusion of acoustic and text representations in RNN-T

25 January 2022

Chao Zhang

Tara N. Sainath

Shuo-yiin Chang

Papers citing "Improving the fusion of acoustic and text representations in RNN-T"

Title | Authors | Tag | Counts | Date
CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition | Tian-Hao Zhang, Dinghao Zhou, Guiping Zhong, Jiaming Zhou, Baoxiang Li | - | 10 3 0 | 26 Jul 2023
Improving RNN-Transducers with Acoustic LookAhead | Vinit Unni, Ashish R. Mittal, P. Jyothi, Sunita Sarawagi | - | 11 2 0 | 11 Jul 2023
Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition | Belen Alastruey, Lukas Drude, Jahn Heymann, Simon Wiesler | - | 12 1 0 | 12 Jun 2023
Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator | Guangzhi Sun, C. Zhang, P. Woodland | - | 11 4 0 | 30 May 2023
Diagonal State Space Augmented Transformers for Speech Recognition | G. Saon, Ankit Gupta, Xiaodong Cui | AI4TS | 22 26 0 | 27 Feb 2023
UML: A Universal Monolingual Output Layer for Multilingual ASR | Chaoyang Zhang, Bo-wen Li, Tara N. Sainath, Trevor Strohman, Shuo-yiin Chang | - | 13 7 0 | 22 Feb 2023
Contextual-Utterance Training for Automatic Speech Recognition | Alejandro Gomez-Alanis, Lukas Drude, A. Schwarz, R. Swaminathan, Simon Wiesler | - | 11 1 0 | 27 Oct 2022
Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator | Guangzhi Sun, C. Zhang, P. Woodland | - | 14 14 0 | 18 May 2022
Large-Scale Streaming End-to-End Speech Translation with Neural Transducers | Jian Xue, Peidong Wang, Jinyu Li, Matt Post, Yashesh Gaur | AI4TS | 17 26 0 | 11 Apr 2022
Adaptive Discounting of Implicit Language Models in RNN-Transducers | Vinit Unni, Shreya Khare, Ashish R. Mittal, P. Jyothi, Sunita Sarawagi, Samarth Bharadwaj | - | 12 3 0 | 21 Feb 2022
Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition | Zhong Meng, Naoyuki Kanda, Yashesh Gaur, S. Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Y. Gong | AuLLM | 19 52 0 | 02 Feb 2021
Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data | Thibault Doutre, Wei Han, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, A. Narayanan, Ananya Misra, Yu Zhang, Liangliang Cao | - | 52 22 0 | 22 Oct 2020
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition | Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Chan, Frank Zhang, Duc Le, M. Seltzer | - | 49 168 0 | 21 Oct 2020