2005.03271
Cited By
v1
v2
v3 (latest)
RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions
7 May 2020
Chung-Cheng Chiu
A. Narayanan
Wei Han
Rohit Prabhavalkar
Yu Zhang
Navdeep Jaitly
Ruoming Pang
Tara N. Sainath
Patrick Nguyen
Liangliang Cao
Yonghui Wu
Papers citing
"RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions"
32 / 32 papers shown
MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement
Nikolai Lund Kühne
Jesper Jensen
Jan Østergaard
Zheng-Hua Tan
Mamba
427
2
0
01 Jul 2025
SegAug: CTC-Aligned Segmented Augmentation For Robust RNN-Transducer Based Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Khanh Le
Tuan Vu Ho
Dung Tran
Duc Thanh Chau
275
2
0
20 Feb 2025
Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers
Neural Information Processing Systems (NeurIPS), 2025
Adam Stooke
Rohit Prabhavalkar
K. Sim
P. M. Mengibar
438
2
0
06 Feb 2025
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Yash Jain
David M. Chan
Pranav Dheram
Aparna Khare
Olabanji Shonibare
Venkatesh Ravichandran
Shalini Ghosh
360
2
0
28 Mar 2024
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Wenjie Huang
Cyril Allauzen
Tongzhou Chen
Kilol Gupta
Ke Hu
James Qin
Yu Zhang
Yongqiang Wang
Shuo-yiin Chang
Tara N. Sainath
MoMe
307
19
0
23 Jan 2024
Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers
Guru Prakash Arumugam
Shuo-yiin Chang
Tara N. Sainath
Rohit Prabhavalkar
Quan Wang
Shaan Bijwadia
259
5
0
18 Dec 2023
DSS: Synthesizing long Digital Ink using Data augmentation, Style encoding and Split generation
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
A. Timofeev
Anastasiia Fadeeva
A. Afonin
C. Musat
Andrii Maksai
413
3
0
29 Nov 2023
Long-form Simultaneous Speech Translation: Thesis Proposal
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Peter Polák
3DV
284
3
0
17 Oct 2023
Updated Corpora and Benchmarks for Long-Form Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jennifer Drexler Fox
Desh Raj
Natalie Delworth
Quinn Mcnamara
Corey Miller
Miguel Jetté
AuLLM
227
12
0
26 Sep 2023
Improving RNN-Transducers with Acoustic LookAhead
Interspeech (Interspeech), 2023
Vinit Unni
Ashish R. Mittal
Preethi Jyothi
Sunita Sarawagi
314
4
0
11 Jul 2023
Efficient Domain Adaptation for Speech Foundation Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yue Liu
DongSeon Hwang
Zhouyuan Huo
Junwen Bai
Guru Prakash
...
K. Sim
Yu Zhang
Wei Han
Trevor Strohman
F. Beaufays
AI4CE
320
30
0
03 Feb 2023
E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Wenjie Huang
Shuo-yiin Chang
Tara N. Sainath
Yanzhang He
David Rybach
R. David
Rohit Prabhavalkar
Cyril Allauzen
Cal Peyser
Trevor Strohman
301
6
0
28 Nov 2022
Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition
Interspeech (Interspeech), 2022
Yist Y. Lin
Tao Han
Haihua Xu
Van Tung Pham
Yerbolat Khassanov
Tze Yuang Chong
Yi He
Lu Lu
Zejun Ma
215
4
0
28 Oct 2022
Smart Speech Segmentation using Acousto-Linguistic Features with look-ahead
Piyush Behre
N. Parihar
S.S. Tan
A. Shah
Eva Sharma
Geoffrey Liu
Shuangyu Chang
H. Khalil
C. Basoglu
S. Pathak
VLM
293
5
0
26 Oct 2022
Distribution Aware Metrics for Conditional Natural Language Generation
International Conference on Language Resources and Evaluation (LREC), 2022
David M. Chan
Yiming Ni
David A. Ross
Sudheendra Vijayanarasimhan
Austin Myers
John F. Canny
419
5
0
15 Sep 2022
Investigating data partitioning strategies for crosslinguistic low-resource ASR evaluation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Zoey Liu
J. Spence
Emily Tucker Prudhommeaux
140
10
0
26 Aug 2022
E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR
Interspeech (Interspeech), 2022
Wenjie Huang
Shuo-yiin Chang
David Rybach
Rohit Prabhavalkar
Tara N. Sainath
Cyril Allauzen
Cal Peyser
Zhiyun Lu
VLM
277
30
0
22 Apr 2022
VADOI: Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Jinhan Wang
Xiaosu Tong
Jinxi Guo
Di He
Roland Maas
260
5
0
22 Feb 2022
Scaling ASR Improves Zero and Few Shot Learning
Interspeech (Interspeech), 2021
Alex Xiao
Weiyi Zheng
Gil Keren
Duc Le
Frank Zhang
Christian Fuegen
Ozlem Kalinli
Yatharth Saraf
Abdel-rahman Mohamed
256
26
0
10 Nov 2021
Pseudo-Labeling for Massively Multilingual Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Loren Lugosch
Tatiana Likhomanenko
Gabriel Synnaeve
R. Collobert
VLM
360
35
0
30 Oct 2021
Multi-Modal Pre-Training for Automated Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
David M. Chan
Shalini Ghosh
D. Chakrabarty
Björn Hoffmeister
SSL
264
16
0
12 Oct 2021
Input Length Matters: Improving RNN-T and MWER Training for Long-form Telephony Speech Recognition
Zhiyun Lu
Yanwei Pan
Thibault Doutre
Parisa Haghani
Liangliang Cao
Rohit Prabhavalkar
Chuxu Zhang
Trevor Strohman
AuLLM
334
16
0
08 Oct 2021
BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition
IEEE Journal on Selected Topics in Signal Processing (JSTSP), 2021
Yu Zhang
Daniel S. Park
Wei Han
James Qin
Anmol Gulati
...
Zhifeng Chen
Quoc V. Le
Chung-Cheng Chiu
Ruoming Pang
Yonghui Wu
SSL
265
201
0
27 Sep 2021
Generalizing RNN-Transducer to Out-Domain Audio via Sparse Self-Attention Layers
Interspeech (Interspeech), 2021
Juntae Kim
Jee-Hye Lee
240
8
0
22 Aug 2021
Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models
Interspeech (Interspeech), 2021
Thibault Doutre
Wei Han
Chung-Cheng Chiu
Ruoming Pang
Olivier Siohan
Liangliang Cao
207
6
0
25 Apr 2021
Advancing RNN Transducer Technology for Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
G. Saon
Zoltan Tueske
Daniel Bolaños
Brian Kingsbury
291
103
0
17 Mar 2021
Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Xuankai Chang
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka
RALM
181
16
0
06 Jan 2021
Improving RNN-T ASR Accuracy Using Context Audio
Interspeech (Interspeech), 2020
A. Schwarz
Ilya Sklyar
Simon Wiesler
279
12
0
20 Nov 2020
Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Suyoun Kim
Shangguan Yuan
Jay Mahadeokar
A. Bruguier
Christian Fuegen
M. Seltzer
Duc Le
237
34
0
26 Oct 2020
Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data
Thibault Doutre
Wei Han
Min Ma
Zhiyun Lu
Chung-Cheng Chiu
Ruoming Pang
A. Narayanan
Ananya Misra
Yu Zhang
Liangliang Cao
373
25
0
22 Oct 2020
A New Training Pipeline for an Improved Neural Transducer
Albert Zeyer
André Merboldt
Ralf Schlüter
Hermann Ney
AI4TS
MedIm
262
53
0
19 May 2020
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context
Wei Han
Zhengdong Zhang
Yu Zhang
Jiahui Yu
Chung-Cheng Chiu
James Qin
Anmol Gulati
Ruoming Pang
Yonghui Wu
435
303
0
07 May 2020