RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions

7 May 2020

Papers citing "RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions"

32 papers

MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement

01 Jul 2025

SegAug: CTC-Aligned Segmented Augmentation For Robust RNN-Transducer Based Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

20 Feb 2025

Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers
Neural Information Processing Systems (NeurIPS), 2025

06 Feb 2025

Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition

Venkatesh Ravichandran

Shalini Ghosh

28 Mar 2024

Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

23 Jan 2024

Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers

Guru Prakash Arumugam

18 Dec 2023

DSS: Synthesizing long Digital Ink using Data augmentation, Style encoding and Split generation
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

29 Nov 2023

Long-form Simultaneous Speech Translation: Thesis Proposal
International Joint Conference on Natural Language Processing (IJCNLP), 2023

Peter Polák

17 Oct 2023

Updated Corpora and Benchmarks for Long-Form Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

26 Sep 2023

Improving RNN-Transducers with Acoustic LookAhead
Interspeech, 2023

11 Jul 2023

Efficient Domain Adaptation for Speech Foundation Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

...

03 Feb 2023

E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

28 Nov 2022

Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition
Interspeech, 2022

Yerbolat Khassanov

28 Oct 2022

Smart Speech Segmentation using Acousto-Linguistic Features with look-ahead

26 Oct 2022

Distribution Aware Metrics for Conditional Natural Language Generation
International Conference on Language Resources and Evaluation (LREC), 2022

David M. Chan

Yiming Ni

David A. Ross

Sudheendra Vijayanarasimhan

Austin Myers

John F. Canny

15 Sep 2022

Investigating data partitioning strategies for crosslinguistic low-resource ASR evaluation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022

Zoey Liu

J. Spence

Emily Tucker Prudhommeaux

26 Aug 2022

E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR
Interspeech, 2022

22 Apr 2022

VADOI: Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

22 Feb 2022

Scaling ASR Improves Zero and Few Shot Learning
Interspeech, 2021

Ozlem Kalinli

10 Nov 2021

Pseudo-Labeling for Massively Multilingual Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

30 Oct 2021

Multi-Modal Pre-Training for Automated Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

12 Oct 2021

Input Length Matters: Improving RNN-T and MWER Training for Long-form Telephony Speech Recognition

08 Oct 2021

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition
IEEE Journal on Selected Topics in Signal Processing (JSTSP), 2021

...

27 Sep 2021

Generalizing RNN-Transducer to Out-Domain Audio via Sparse Self-Attention Layers
Interspeech, 2021

Juntae Kim

Jee-Hye Lee

22 Aug 2021

Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models
Interspeech, 2021

25 Apr 2021

Advancing RNN Transducer Technology for Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

17 Mar 2021

Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

06 Jan 2021

Improving RNN-T ASR Accuracy Using Context Audio
Interspeech, 2020

A. Schwarz

Ilya Sklyar

Simon Wiesler

20 Nov 2020

Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

26 Oct 2020

Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data

22 Oct 2020

A New Training Pipeline for an Improved Neural Transducer

19 May 2020

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

07 May 2020