arXiv:1706.02737
Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM
Interspeech (Interspeech), 2017
8 June 2017
Takaaki Hori
Shinji Watanabe
Yu Zhang
William Chan
Papers citing
"Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM"
50 / 124 papers shown
Unified Learnable 2D Convolutional Feature Extraction for ASR
Peter Vieting
Benedikt Hilmes
Ralf Schluter
Hermann Ney
SSL
129
0
0
12 Sep 2025
Audio-3DVG: Unified Audio -- Point Cloud Fusion for 3D Visual Grounding
Duc Cao-Dinh
Khai Le-Duc
Anh Dao
Bach Phan Tat
Chris Ngo
Duy M. H. Nguyen
Nguyen X. Khanh
Thanh Nguyen-Tang
177
0
0
01 Jul 2025
Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Takaaki Hori
Martin Kocour
Adnan Haider
Erik McDermott
Xiaodan Zhuang
AuLLM
132
5
0
17 Jan 2025
The Conformer Encoder May Reverse the Time Dimension
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Robin Schmitt
Albert Zeyer
Mohammad Zeineldeen
Ralf Schluter
Hermann Ney
245
1
0
01 Oct 2024
Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models
Spoken Language Technology Workshop (SLT), 2024
Xiaoxue Gao
Nancy F. Chen
Mamba
177
10
0
27 Sep 2024
Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction
Yuka Ko
Sheng Li
Chao-Han Huck Yang
Tatsuya Kawahara
AuLLM
141
5
0
29 Aug 2024
Contextualized Automatic Speech Recognition with Dynamic Vocabulary
Yui Sudo
Yosuke Fukumoto
Muhammad Shakeel
Yifan Peng
Shinji Watanabe
251
7
0
22 May 2024
Speaker Characterization by means of Attention Pooling
Federico Costa
Miquel India
Javier Hernando
175
2
0
07 May 2024
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition
A. Ogawa
Naohiro Tawara
Takatomo Kano
Marc Delcroix
271
6
0
22 Dec 2023
Iterative Shallow Fusion of Backward Language Model for End-to-End Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
A. Ogawa
Takafumi Moriya
Naoyuki Kamo
Naohiro Tawara
Marc Delcroix
128
3
0
17 Oct 2023
Dementia Assessment Using Mandarin Speech with an Attention-based Speech Recognition Encoder
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zih-Jyun Lin
Yi-Ju Chen
P. Kuo
Likai Huang
Chaur-Jong Hu
Cheng-Yu Chen
103
2
0
06 Oct 2023
Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Mohammad Zeineldeen
Albert Zeyer
Ralf Schluter
Hermann Ney
AuLLM
273
9
0
15 Sep 2023
Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition
Interspeech (Interspeech), 2023
Wenxuan Wang
Guodong Ma
Yuke Li
Binbin Du
MoE
219
39
0
12 Jul 2023
End-to-End Speech Recognition: A Survey
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schluter
Shinji Watanabe
VLM
256
239
0
03 Mar 2023
BayesSpeech: A Bayesian Transformer Network for Automatic Speech Recognition
Will Rieger
BDL
UQCV
112
0
0
16 Jan 2023
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Lester Phillip Violeta
D. Ma
Wen-Chin Huang
Tomoki Toda
165
10
0
02 Nov 2022
Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition
International Conference on Mobile Ad-hoc and Sensor Networks (MSN), 2022
Xulong Zhang
Jianzong Wang
Ning Cheng
Mengyuan Zhao
Zhiyong Zhang
Jing Xiao
90
1
0
25 Oct 2022
On Compressing Sequences for Self-Supervised Speech Models
Spoken Language Technology Workshop (SLT), 2022
Yen Meng
Hsuan-Jui Chen
Jiatong Shi
Shinji Watanabe
Paola García
Hung-yi Lee
Hao Tang
SSL
140
15
0
13 Oct 2022
A Universally-Deployable ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation
Interspeech (Interspeech), 2022
Tom O'Malley
A. Narayanan
Quan Wang
139
5
0
14 Sep 2022
FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning
Yeonghyeon Lee
Kangwook Jang
Jahyun Goo
Youngmoon Jung
Hoi-Rim Kim
228
39
0
01 Jul 2022
On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode
International Conference on Signal Processing and Communications (ICSPC), 2022
Raviraj Joshi
Subodh Kumar
107
2
0
26 Jun 2022
Streaming Noise Context Aware Enhancement For Automatic Speech Recognition in Multi-Talker Environments
International Workshop on Acoustic Signal Enhancement (IWAENC), 2022
Joseph Peter Caroselli
A. Narayanan
Yiteng Huang
73
1
0
17 May 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
245
19
0
11 May 2022
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Zhao You
Shulin Feng
Jane Polak Scowcroft
Dong Yu
145
10
0
07 Apr 2022
Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition
Interspeech (Interspeech), 2022
Lester Phillip Violeta
Wen-Chin Huang
Tomoki Toda
221
44
0
29 Mar 2022
Joint Speech Recognition and Audio Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Chaitanya Narisetty
E. Tsunoo
Xuankai Chang
Yosuke Kashiwagi
Michael Hentschel
Shinji Watanabe
106
10
0
03 Feb 2022
Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition
Changxu Cheng
Bohan Li
Qi Zheng
Yongpan Wang
Wenyu Liu
87
2
0
24 Nov 2021
A comparison of streaming models and data augmentation methods for robust speech recognition
Automatic Speech Recognition & Understanding (ASRU), 2021
Jiyeon Kim
Mehul Kumar
Dhananjaya N. Gowda
Abhinav Garg
Chanwoo Kim
119
6
0
19 Nov 2021
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation
Tom O'Malley
A. Narayanan
Quan Wang
Alex Park
James Walker
N. Howard
99
34
0
18 Nov 2021
Recent Advances in End-to-End Automatic Speech Recognition
APSIPA Transactions on Signal and Information Processing (TASIP), 2021
Jinyu Li
VLM
382
424
0
02 Nov 2021
Sequence Transduction with Graph-based Supervision
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Niko Moritz
Takaaki Hori
Shinji Watanabe
Jonathan Le Roux
210
7
0
01 Nov 2021
SNRi Target Training for Joint Speech Enhancement and Recognition
Interspeech (Interspeech), 2021
Yuma Koizumi
Shigeki Karita
A. Narayanan
S. Panchapagesan
M. Bacchiani
222
16
0
01 Nov 2021
Cross-attention conformer for context modeling in speech enhancement for ASR
Automatic Speech Recognition & Understanding (ASRU), 2021
A. Narayanan
Chung-Cheng Chiu
Tom O'Malley
Quan Wang
Yanzhang He
176
16
0
30 Oct 2021
Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition
Interspeech (Interspeech), 2021
Rong Gong
Carl Quillen
D. Sharma
Andrew Goderre
José Laínez
Ljubomir Milanović
175
15
0
10 Sep 2021
Investigations on Speech Recognition Systems for Low-Resource Dialectal Arabic-English Code-Switching Speech
Computer Speech and Language (CSL), 2021
Injy Hamed
Pavel Denisov
C. Li
Mohamed S. Elmahdy
Slim Abdennadher
Ngoc Thang Vu
175
37
0
29 Aug 2021
Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation
Samuel Cahyawijaya
176
12
0
24 Aug 2021
Modality Fusion Network and Personalized Attention in Momentary Stress Detection in the Wild
Affective Computing and Intelligent Interaction (ACII), 2021
Han Yu
T. Vaessen
I. Myin‐Germeys
Akane Sano
169
15
0
19 Jul 2021
A Comparative Study on Neural Architectures and Training Methods for Japanese Speech Recognition
Interspeech (Interspeech), 2021
Shigeki Karita
Yotaro Kubo
M. Bacchiani
Llion Jones
93
13
0
09 Jun 2021
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Siddharth Dalmia
Brian Yan
Vikas Raunak
Florian Metze
Shinji Watanabe
171
35
0
02 May 2021
Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers
Interspeech (Interspeech), 2021
Takaaki Hori
Niko Moritz
Chiori Hori
Jonathan Le Roux
136
37
0
19 Apr 2021
WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition
Zhichao Wang
Wenwen Yang
Pan Zhou
Wei Chen
RALM
122
18
0
08 Apr 2021
Capturing Multi-Resolution Context by Dilated Self-Attention
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Niko Moritz
Takaaki Hori
Jonathan Le Roux
130
8
0
07 Apr 2021
Attention, please! A survey of Neural Attention Models in Deep Learning
Artificial Intelligence Review (AIR), 2021
Alana de Santana Correia
Esther Luna Colombini
HAI
308
249
0
31 Mar 2021
Pre-training for low resource speech-to-intent applications
Pu Wang
Hugo Van hamme
101
4
0
30 Mar 2021
SubSpectral Normalization for Neural Audio Data Processing
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Simyung Chang
Hyoungwoo Park
Janghoon Cho
Hyunsin Park
Sungrack Yun
Kyuwoong Hwang
99
35
0
25 Mar 2021
Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation
Interspeech (Interspeech), 2021
Md. Akmal Haidar
Chao Xing
Mehdi Rezagholizadeh
166
6
0
17 Mar 2021
Hierarchical Transformer-based Large-Context End-to-end ASR with Large-Context Knowledge Distillation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Ryo Masumura
Naoki Makishima
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Shota Orihashi
160
32
0
16 Feb 2021
Thank you for Attention: A survey on Attention-based Artificial Neural Networks for Automatic Speech Recognition
Intelligent Systems with Applications (ISA), 2021
Priyabrata Karmakar
S. Teng
Guojun Lu
118
33
0
14 Feb 2021
Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream End-to-End ASR
Spoken Language Technology Workshop (SLT), 2021
Ruizhi Li
Gregory Sell
H. Hermansky
119
2
0
05 Feb 2021
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
Shinji Watanabe
Florian Boyer
Xuankai Chang
Pengcheng Guo
Tomoki Hayashi
...
Shigeki Karita
Chenda Li
Jing Shi
Aswin Shanmugam Subramanian
Wangyou Zhang
VLM
181
39
0
23 Dec 2020