Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM

Interspeech (Interspeech), 2017
8 June 2017
Takaaki Hori
Shinji Watanabe
Yu Zhang
William Chan

Papers citing "Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM"

50 / 124 papers shown
Unified Learnable 2D Convolutional Feature Extraction for ASR
Peter Vieting
Benedikt Hilmes
Ralf Schluter
Hermann Ney
SSL
129
0
0
12 Sep 2025
Audio-3DVG: Unified Audio -- Point Cloud Fusion for 3D Visual Grounding
Duc Cao-Dinh
Khai Le-Duc
Anh Dao
Bach Phan Tat
Chris Ngo
Duy M. H. Nguyen
Nguyen X. Khanh
Thanh Nguyen-Tang
177
0
0
01 Jul 2025
Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Takaaki Hori
Martin Kocour
Adnan Haider
Erik McDermott
Xiaodan Zhuang
AuLLM
132
5
0
17 Jan 2025
The Conformer Encoder May Reverse the Time Dimension
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Robin Schmitt
Albert Zeyer
Mohammad Zeineldeen
Ralf Schluter
Hermann Ney
245
1
0
01 Oct 2024
Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models
Spoken Language Technology Workshop (SLT), 2024
Xiaoxue Gao
Nancy F. Chen
Mamba
177
10
0
27 Sep 2024
Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction
Yuka Ko
Sheng Li
Chao-Han Huck Yang
Tatsuya Kawahara
AuLLM
141
5
0
29 Aug 2024
Contextualized Automatic Speech Recognition with Dynamic Vocabulary
Yui Sudo
Yosuke Fukumoto
Muhammad Shakeel
Yifan Peng
Shinji Watanabe
251
7
0
22 May 2024
Speaker Characterization by means of Attention Pooling
Federico Costa
Miquel India
Javier Hernando
175
2
0
07 May 2024
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition
A. Ogawa
Naohiro Tawara
Takatomo Kano
Marc Delcroix
271
6
0
22 Dec 2023
Iterative Shallow Fusion of Backward Language Model for End-to-End Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
A. Ogawa
Takafumi Moriya
Naoyuki Kamo
Naohiro Tawara
Marc Delcroix
128
3
0
17 Oct 2023
Dementia Assessment Using Mandarin Speech with an Attention-based Speech Recognition Encoder
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zih-Jyun Lin
Yi-Ju Chen
P. Kuo
Likai Huang
Chaur-Jong Hu
Cheng-Yu Chen
103
2
0
06 Oct 2023
Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Mohammad Zeineldeen
Albert Zeyer
Ralf Schluter
Hermann Ney
AuLLM
273
9
0
15 Sep 2023
Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition
Interspeech (Interspeech), 2023
Wenxuan Wang
Guodong Ma
Yuke Li
Binbin Du
MoE
219
39
0
12 Jul 2023
End-to-End Speech Recognition: A Survey
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schluter
Shinji Watanabe
VLM
256
239
0
03 Mar 2023
BayesSpeech: A Bayesian Transformer Network for Automatic Speech Recognition
Will Rieger
BDL
UQCV
112
0
0
16 Jan 2023
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Lester Phillip Violeta
D. Ma
Wen-Chin Huang
Tomoki Toda
165
10
0
02 Nov 2022
Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition
International Conference on Mobile Ad-hoc and Sensor Networks (MSN), 2022
Xulong Zhang
Jianzong Wang
Ning Cheng
Mengyuan Zhao
Zhiyong Zhang
Jing Xiao
90
1
0
25 Oct 2022
On Compressing Sequences for Self-Supervised Speech Models
Spoken Language Technology Workshop (SLT), 2022
Yen Meng
Hsuan-Jui Chen
Jiatong Shi
Shinji Watanabe
Paola García
Hung-yi Lee
Hao Tang
SSL
140
15
0
13 Oct 2022
A Universally-Deployable ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation
Interspeech (Interspeech), 2022
Tom O'Malley
A. Narayanan
Quan Wang
139
5
0
14 Sep 2022
FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning
Yeonghyeon Lee
Kangwook Jang
Jahyun Goo
Youngmoon Jung
Hoi-Rim Kim
228
39
0
01 Jul 2022
On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode
International Conference on Signal Processing and Communications (ICSPC), 2022
Raviraj Joshi
Subodh Kumar
107
2
0
26 Jun 2022
Streaming Noise Context Aware Enhancement For Automatic Speech Recognition in Multi-Talker Environments
International Workshop on Acoustic Signal Enhancement (IWAENC), 2022
Joseph Peter Caroselli
A. Narayanan
Yiteng Huang
73
1
0
17 May 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
245
19
0
11 May 2022
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Zhao You
Shulin Feng
Jane Polak Scowcroft
Dong Yu
145
10
0
07 Apr 2022
Investigating Self-supervised Pretraining Frameworks for Pathological Speech Recognition
Interspeech (Interspeech), 2022
Lester Phillip Violeta
Wen-Chin Huang
Tomoki Toda
221
44
0
29 Mar 2022
Joint Speech Recognition and Audio Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Chaitanya Narisetty
E. Tsunoo
Xuankai Chang
Yosuke Kashiwagi
Michael Hentschel
Shinji Watanabe
106
10
0
03 Feb 2022
Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition
Changxu Cheng
Bohan Li
Qi Zheng
Yongpan Wang
Wenyu Liu
87
2
0
24 Nov 2021
A comparison of streaming models and data augmentation methods for robust speech recognition
Automatic Speech Recognition & Understanding (ASRU), 2021
Jiyeon Kim
Mehul Kumar
Dhananjaya N. Gowda
Abhinav Garg
Chanwoo Kim
119
6
0
19 Nov 2021
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation
Tom O'Malley
A. Narayanan
Quan Wang
Alex Park
James Walker
N. Howard
99
34
0
18 Nov 2021
Recent Advances in End-to-End Automatic Speech Recognition
APSIPA Transactions on Signal and Information Processing (TASIP), 2021
Jinyu Li
VLM
382
424
0
02 Nov 2021
Sequence Transduction with Graph-based Supervision
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Niko Moritz
Takaaki Hori
Shinji Watanabe
Jonathan Le Roux
210
7
0
01 Nov 2021
SNRi Target Training for Joint Speech Enhancement and Recognition
Interspeech (Interspeech), 2021
Yuma Koizumi
Shigeki Karita
A. Narayanan
S. Panchapagesan
M. Bacchiani
222
16
0
01 Nov 2021
Cross-attention conformer for context modeling in speech enhancement for ASR
Automatic Speech Recognition & Understanding (ASRU), 2021
A. Narayanan
Chung-Cheng Chiu
Tom O'Malley
Quan Wang
Yanzhang He
176
16
0
30 Oct 2021
Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition
Interspeech (Interspeech), 2021
Rong Gong
Carl Quillen
D. Sharma
Andrew Goderre
José Laínez
Ljubomir Milanović
175
15
0
10 Sep 2021
Investigations on Speech Recognition Systems for Low-Resource Dialectal Arabic-English Code-Switching Speech
Computer Speech and Language (CSL), 2021
Injy Hamed
Pavel Denisov
C. Li
Mohamed S. Elmahdy
Slim Abdennadher
Ngoc Thang Vu
175
37
0
29 Aug 2021
Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation
Samuel Cahyawijaya
176
12
0
24 Aug 2021
Modality Fusion Network and Personalized Attention in Momentary Stress Detection in the Wild
Affective Computing and Intelligent Interaction (ACII), 2021
Han Yu
T. Vaessen
I. Myin‐Germeys
Akane Sano
169
15
0
19 Jul 2021
A Comparative Study on Neural Architectures and Training Methods for Japanese Speech Recognition
Interspeech (Interspeech), 2021
Shigeki Karita
Yotaro Kubo
M. Bacchiani
Llion Jones
93
13
0
09 Jun 2021
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Siddharth Dalmia
Brian Yan
Vikas Raunak
Florian Metze
Shinji Watanabe
171
35
0
02 May 2021
Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers
Interspeech (Interspeech), 2021
Takaaki Hori
Niko Moritz
Chiori Hori
Jonathan Le Roux
136
37
0
19 Apr 2021
WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition
Zhichao Wang
Wenwen Yang
Pan Zhou
Wei Chen
RALM
122
18
0
08 Apr 2021
Capturing Multi-Resolution Context by Dilated Self-Attention
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Niko Moritz
Takaaki Hori
Jonathan Le Roux
130
8
0
07 Apr 2021
Attention, please! A survey of Neural Attention Models in Deep Learning
Artificial Intelligence Review (AIR), 2021
Alana de Santana Correia
Esther Luna Colombini
HAI
308
249
0
31 Mar 2021
Pre-training for low resource speech-to-intent applications
Pu Wang
Hugo Van hamme
101
4
0
30 Mar 2021
SubSpectral Normalization for Neural Audio Data Processing
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Simyung Chang
Hyoungwoo Park
Janghoon Cho
Hyunsin Park
Sungrack Yun
Kyuwoong Hwang
99
35
0
25 Mar 2021
Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation
Interspeech (Interspeech), 2021
Md. Akmal Haidar
Chao Xing
Mehdi Rezagholizadeh
166
6
0
17 Mar 2021
Hierarchical Transformer-based Large-Context End-to-end ASR with Large-Context Knowledge Distillation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Ryo Masumura
Naoki Makishima
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Shota Orihashi
160
32
0
16 Feb 2021
Thank you for Attention: A survey on Attention-based Artificial Neural Networks for Automatic Speech Recognition
Intelligent Systems with Applications (ISA), 2021
Priyabrata Karmakar
S. Teng
Guojun Lu
118
33
0
14 Feb 2021
Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream End-to-End ASR
Spoken Language Technology Workshop (SLT), 2021
Ruizhi Li
Gregory Sell
H. Hermansky
119
2
0
05 Feb 2021
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
Shinji Watanabe
Florian Boyer
Xuankai Chang
Pengcheng Guo
Tomoki Hayashi
...
Shigeki Karita
Chenda Li
Jing Shi
Aswin Shanmugam Subramanian
Wangyou Zhang
VLM
181
39
0
23 Dec 2020