End-to-End Speech Recognition: A Survey

3 March 2023

Papers citing "End-to-End Speech Recognition: A Survey"

29 papers

M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper. Jiaming Zhou, S. Zhao, Jiabei He, Hui Wang, Wenjia Zeng, Yong Chen, Haoqin Sun, Aobo Kong, Yong Qin. 13 Mar 2025.
ValSub: Subsampling Validation Data to Mitigate Forgetting during ASR Personalization. Haaris Mehmood, Karthikeyan P. Saravanan, Pablo Peso Parada, David Tuckey, Mete Ozay, Gil Ho Lee, Jungin Lee, Seokyeong Jung. 12 Mar 2025.
CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking. Yiming Li, Kaiying Yan, Shuo Shao, Tongqing Zhai, Shu-Tao Xia, Z. Qin, D. Tao. 02 Mar 2025.
Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation. Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng. 24 Feb 2025.
Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers. Adam Stooke, Rohit Prabhavalkar, K. Sim, P. M. Mengibar. 06 Feb 2025.
Classification Error Bound for Low Bayes Error Conditions in Machine Learning. Zijian Yang, Vahe Eminyan, Ralf Schlüter, Hermann Ney. 28 Jan 2025.
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration. Kai-Tuo Xu, Feng-Long Xie, Xu Tang, Yao Hu. 24 Jan 2025.
Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores. Jiaming Zhou, S. Zhao, Hui Wang, Tian-Hao Zhang, Haoqin Sun, Xuechen Wang, Yong Qin. 20 Jan 2025.
GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems. Amin Robatian, Mohammad Hajipour, Mohammad Reza Peyghan, Fatemeh Rajabi, Sajjad Amini, Shahrokh Ghaemmaghami, Iman Gholampour. 18 Jan 2025.
Audio-Language Datasets of Scenes and Events: A Survey. Gijs Wijngaard, Elia Formisano, Michele Esposito, M. Dumontier. 10 Jan 2025.
Investigating the Effect of Label Topology and Training Criterion on ASR Performance and Alignment Quality. Tina Raissi, Christoph Lüscher, Simon Berger, Ralf Schlüter, Hermann Ney. 16 Jul 2024.
Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech. Dena F. Mujtaba, N. Mahapatra, Megan Arney, J. Scott Yaruss, Hope Gerlach-Houck, Caryn Herring, Jia Bin. 10 May 2024.
Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition. O. Kundacina, V. Vincan, D. Mišković. 03 May 2024.
Semi-Autoregressive Streaming ASR With Label Context. Siddhant Arora, G. Saon, Shinji Watanabe, Brian Kingsbury. 19 Sep 2023.
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction. Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe. 19 Aug 2023.
DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution. Matías P. Pizarro, D. Kolossa, Asja Fischer. 26 May 2023.
E-Branchformer: Branchformer with Enhanced merging for speech recognition. Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe. 30 Sep 2022.
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing. Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, ..., Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei. 14 Oct 2021.
CTC Variations Through New WFST Topologies. A. Laptev, Somshubra Majumdar, Boris Ginsburg. 06 Oct 2021.
MixSpeech: Data Augmentation for Low-resource Automatic Speech Recognition. Linghui Meng, Jin Xu, Xu Tan, Jindong Wang, Tao Qin, Bo Xu. 25 Feb 2021.
Intermediate Loss Regularization for CTC-based Speech Recognition. Jaesong Lee, Shinji Watanabe. 05 Feb 2021.
Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging. Rohit Prabhavalkar, Yanzhang He, David Rybach, S. Campbell, A. Narayanan, Trevor Strohman, Tara N. Sainath. 12 Dec 2020.
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition. Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Chan, Frank Zhang, Duc Le, M. Seltzer. 21 Oct 2020.
Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition. Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu. 28 Nov 2019.
Listening while Speaking: Speech Chain by Deep Learning. Andros Tjandra, S. Sakti, Satoshi Nakamura. 16 Jul 2017.
Six Challenges for Neural Machine Translation. Philipp Koehn, Rebecca Knowles. 12 Jun 2017.
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean. 26 Sep 2016.
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. Y. Gal, Zoubin Ghahramani. 06 Jun 2015.
Improving neural networks by preventing co-adaptation of feature detectors. Geoffrey E. Hinton, Nitish Srivastava, A. Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov. 03 Jul 2012.