Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM

Interspeech, 2017

8 June 2017

Papers citing "Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM"

Showing 50 of 124 citing papers.

Unified Learnable 2D Convolutional Feature Extraction for ASR. Peter Vieting, Benedikt Hilmes, Ralf Schluter, Hermann Ney. 12 Sep 2025.
Audio-3DVG: Unified Audio -- Point Cloud Fusion for 3D Visual Grounding. Duc Cao-Dinh, Khai Le-Duc, Anh Dao, Bach Phan Tat, Chris Ngo, Duy M. H. Nguyen, Nguyen X. Khanh, Thanh Nguyen-Tang. 01 Jul 2025.
Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition. Takaaki Hori, Martin Kocour, Adnan Haider, Erik McDermott, Xiaodan Zhuang. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025. 17 Jan 2025.
The Conformer Encoder May Reverse the Time Dimension. Robin Schmitt, Albert Zeyer, Mohammad Zeineldeen, Ralf Schluter, Hermann Ney. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024. 01 Oct 2024.
Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models. Xiaoxue Gao, Nancy F. Chen. Spoken Language Technology Workshop (SLT), 2024. 27 Sep 2024.
Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction. Yuka Ko, Sheng Li, Chao-Han Huck Yang, Tatsuya Kawahara. 29 Aug 2024.
Contextualized Automatic Speech Recognition with Dynamic Vocabulary. Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Shinji Watanabe. 22 May 2024.
Speaker Characterization by means of Attention Pooling. Federico Costa, Miquel India, Javier Hernando. 07 May 2024.
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition. A. Ogawa, Naohiro Tawara, Takatomo Kano, Marc Delcroix. 22 Dec 2023.
Iterative Shallow Fusion of Backward Language Model for End-to-End Speech Recognition. A. Ogawa, Takafumi Moriya, Naoyuki Kamo, Naohiro Tawara, Marc Delcroix. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023. 17 Oct 2023.
Dementia Assessment Using Mandarin Speech with an Attention-based Speech Recognition Encoder. Zih-Jyun Lin, Yi-Ju Chen, P. Kuo, Likai Huang, Chaur-Jong Hu, Cheng-Yu Chen. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023. 06 Oct 2023.
Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition. Mohammad Zeineldeen, Albert Zeyer, Ralf Schluter, Hermann Ney. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023. 15 Sep 2023.
Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition. Wenxuan Wang, Guodong Ma, Yuke Li, Binbin Du. Interspeech, 2023. 12 Jul 2023.
End-to-End Speech Recognition: A Survey. Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schluter, Shinji Watanabe. IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023. 03 Mar 2023.
BayesSpeech: A Bayesian Transformer Network for Automatic Speech Recognition. Will Rieger. 16 Jan 2023.
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition. Lester Phillip Violeta, D. Ma, Wen-Chin Huang, Tomoki Toda. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022. 02 Nov 2022.
Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition. Xulong Zhang, Jianzong Wang, Ning Cheng, Mengyuan Zhao, Zhiyong Zhang, Jing Xiao. International Conference on Mobile Ad-hoc and Sensor Networks (MSN), 2022. 25 Oct 2022.
On Compressing Sequences for Self-Supervised Speech Models. Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola García, Hung-yi Lee, Hao Tang. Spoken Language Technology Workshop (SLT), 2022. 13 Oct 2022.
A Universally-Deployable ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation. Tom O'Malley, A. Narayanan, Quan Wang. Interspeech, 2022. 14 Sep 2022.
FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning. Yeonghyeon Lee, Kangwook Jang, Jahyun Goo, Youngmoon Jung, Hoi-Rim Kim. 01 Jul 2022.
On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode. Raviraj Joshi, Subodh Kumar. International Conference on Signal Processing and Communications (ICSPC), 2022. 26 Jun 2022.
Streaming Noise Context Aware Enhancement For Automatic Speech Recognition in Multi-Talker Environments. Joseph Peter Caroselli, A. Narayanan, Yiteng Huang. International Workshop on Acoustic Signal Enhancement (IWAENC), 2022. 17 May 2022.
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning. Xuenan Xu, Zeyu Xie, Mengyue Wu, K. Yu. IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022. 11 May 2022.
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition. Zhao You, Shulin Feng, Jane Polak Scowcroft, Dong Yu. International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022. 07 Apr 2022.
Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition. Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda. Interspeech, 2022. 29 Mar 2022.
Joint Speech Recognition and Audio Captioning. Chaitanya Narisetty, E. Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, Shinji Watanabe. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022. 03 Feb 2022.
Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition. Changxu Cheng, Bohan Li, Qi Zheng, Yongpan Wang, Wenyu Liu. 24 Nov 2021.
A comparison of streaming models and data augmentation methods for robust speech recognition. Jiyeon Kim, Mehul Kumar, Dhananjaya N. Gowda, Abhinav Garg, Chanwoo Kim. Automatic Speech Recognition & Understanding (ASRU), 2021. 19 Nov 2021.
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation. Tom O'Malley, A. Narayanan, Quan Wang, Alex Park, James Walker, N. Howard. 18 Nov 2021.
Recent Advances in End-to-End Automatic Speech Recognition. Jinyu Li. APSIPA Transactions on Signal and Information Processing (TASIP), 2021. 02 Nov 2021.
Sequence Transduction with Graph-based Supervision. Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021. 01 Nov 2021.
SNRi Target Training for Joint Speech Enhancement and Recognition. Yuma Koizumi, Shigeki Karita, A. Narayanan, S. Panchapagesan, M. Bacchiani. Interspeech, 2021. 01 Nov 2021.
Cross-attention conformer for context modeling in speech enhancement for ASR. A. Narayanan, Chung-Cheng Chiu, Tom O'Malley, Quan Wang, Yanzhang He. Automatic Speech Recognition & Understanding (ASRU), 2021. 30 Oct 2021.
Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition. Rong Gong, Carl Quillen, D. Sharma, Andrew Goderre, José Laínez, Ljubomir Milanović. Interspeech, 2021. 10 Sep 2021.
Investigations on Speech Recognition Systems for Low-Resource Dialectal Arabic-English Code-Switching Speech. Injy Hamed, Pavel Denisov, C. Li, Mohamed S. Elmahdy, Slim Abdennadher, Ngoc Thang Vu. Computer Speech and Language (CSL), 2021. 29 Aug 2021.
Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation. Samuel Cahyawijaya. 24 Aug 2021.
Modality Fusion Network and Personalized Attention in Momentary Stress Detection in the Wild. Han Yu, T. Vaessen, I. Myin‐Germeys, Akane Sano. Affective Computing and Intelligent Interaction (ACII), 2021. 19 Jul 2021.
A Comparative Study on Neural Architectures and Training Methods for Japanese Speech Recognition. Shigeki Karita, Yotaro Kubo, M. Bacchiani, Llion Jones. Interspeech, 2021. 09 Jun 2021.
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks. Siddharth Dalmia, Brian Yan, Vikas Raunak, Florian Metze, Shinji Watanabe. North American Chapter of the Association for Computational Linguistics (NAACL), 2021. 02 May 2021.
Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers. Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux. Interspeech, 2021. 19 Apr 2021.
WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition. Zhichao Wang, Wenwen Yang, Pan Zhou, Wei Chen. 08 Apr 2021.
Capturing Multi-Resolution Context by Dilated Self-Attention. Niko Moritz, Takaaki Hori, Jonathan Le Roux. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021. 07 Apr 2021.
Attention, please! A survey of Neural Attention Models in Deep Learning. Alana de Santana Correia, Esther Luna Colombini. Artificial Intelligence Review (AIR), 2021. 31 Mar 2021.
Pre-training for low resource speech-to-intent applications. Pu Wang, Hugo Van hamme. 30 Mar 2021.
SubSpectral Normalization for Neural Audio Data Processing. Simyung Chang, Hyoungwoo Park, Janghoon Cho, Hyunsin Park, Sungrack Yun, Kyuwoong Hwang. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021. 25 Mar 2021.
Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation. Md. Akmal Haidar, Chao Xing, Mehdi Rezagholizadeh. Interspeech, 2021. 17 Mar 2021.
Hierarchical Transformer-based Large-Context End-to-end ASR with Large-Context Knowledge Distillation. Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021. 16 Feb 2021.
Thank you for Attention: A survey on Attention-based Artificial Neural Networks for Automatic Speech Recognition. Priyabrata Karmakar, S. Teng, Guojun Lu. Intelligent Systems with Applications (ISA), 2021. 14 Feb 2021.
Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream End-to-End ASR. Ruizhi Li, Gregory Sell, H. Hermansky. Spoken Language Technology Workshop (SLT), 2021. 05 Feb 2021.
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans. Shinji Watanabe, Florian Boyer, Xuankai Chang, Pengcheng Guo, Tomoki Hayashi, ..., Shigeki Karita, Chenda Li, Jing Shi, Aswin Shanmugam Subramanian, Wangyou Zhang. 23 Dec 2020.