Listen, Attend and Spell

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015
5 August 2015
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
    RALM

Papers citing "Listen, Attend and Spell"

50 / 1,064 papers shown
Hystoc: Obtaining word confidences for fusion of end-to-end ASR systems
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Karel Beneš
M. Kocour
L. Burget
129
2
0
21 May 2023
VAKTA-SETU: A Speech-to-Speech Machine Translation Service in Select Indic Languages
Shivam Mhaskar
Vineet Bhat
Akshay Batheja
S. Deoghare
Paramveer Choudhary
P. Bhattacharyya
152
7
0
21 May 2023
Multi-Head State Space Model for Speech Recognition
Interspeech, 2023
Yassir Fathullah
Chunyang Wu
Yuan Shangguan
Junteng Jia
Wenhan Xiong
...
Chunxi Liu
Yangyang Shi
Ozlem Kalinli
M. Seltzer
Mark Gales
160
19
0
21 May 2023
Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network
Interspeech, 2023
Kaixun Huang
Aoting Zhang
Zhanheng Yang
Pengcheng Guo
Bingshen Mu
Tianyi Xu
Linfu Xie
446
38
0
21 May 2023
Language-universal phonetic encoder for low-resource speech recognition
Interspeech, 2023
Siyuan Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
211
3
0
19 May 2023
Blank-regularized CTC for Frame Skipping in Neural Transducer
Interspeech, 2023
Yifan Yang
Xiaoyu Yang
Liyong Guo
Zengwei Yao
Wei Kang
Fangjun Kuang
Long Lin
Xie Chen
Daniel Povey
142
11
0
19 May 2023
AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation
Interspeech, 2023
Sara Papi
Marco Turchi
Matteo Negri
185
31
0
19 May 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Interspeech, 2023
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
236
22
0
18 May 2023
FunASR: A Fundamental End-to-End Speech Recognition Toolkit
Interspeech, 2023
Zhifu Gao
Zerui Li
Jiaming Wang
Haoneng Luo
Xian Shi
...
Yabin Li
Lingyun Zuo
Zhihao Du
Zhangyu Xiao
Shiliang Zhang
246
110
0
18 May 2023
A Lexical-aware Non-autoregressive Transformer-based ASR Model
Interspeech, 2023
Chong Lin
Kuan-Yu Chen
AI4TS
122
3
0
18 May 2023
Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System
Interspeech, 2023
Xian Shi
Haoneng Luo
Zhifu Gao
Shiliang Zhang
Zhijie Yan
151
3
0
18 May 2023
Masked Audio Text Encoders are Effective Multi-Modal Rescorers
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jason (Jinglun) Cai
Monica Sunkara
Xilai Li
Anshu Bhatia
Xiao Pan
S. Bodapati
345
5
0
11 May 2023
Robust Acoustic and Semantic Contextual Biasing in Neural Transducers for Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xuandi Fu
Kanthashree Mysore Sathyendra
Ankur Gandhe
Jing Liu
Grant P. Strimel
Ross McGowan
Athanasios Mouchtaris
292
20
0
09 May 2023
End-to-end spoken language understanding using joint CTC loss and self-supervised, pretrained acoustic encoders
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jixuan Wang
Martin H. Radfar
Kailin Wei
Clement Chung
169
3
0
04 May 2023
Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization
Knowledge-Based Systems (KBS), 2023
Hamza Kheddar
Yassine Himeur
S. Al-Maadeed
Abbes Amira
F. Bensaali
298
117
0
27 Apr 2023
Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition
Interspeech, 2022
Mohan Li
R. Doddipatla
Catalin Zorila
228
0
0
24 Apr 2023
Non-autoregressive End-to-end Approaches for Joint Automatic Speech Recognition and Spoken Language Understanding
Spoken Language Technology Workshop (SLT), 2023
Mohan Li
R. Doddipatla
169
8
0
21 Apr 2023
DropDim: A Regularization Method for Transformer Networks
IEEE Signal Processing Letters (IEEE SPL), 2023
Hao Zhang
Dan Qu
Kejia Shao
Xu Yang
190
14
0
20 Apr 2023
A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Cal Peyser
M. Picheny
Dong Wang
Rohit Prabhavalkar
Ronny Huang
Tara N. Sainath
AI4TS
127
2
0
19 Apr 2023
Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xilai Li
Goeric Huybrechts
S. Ronanki
Jeffrey J. Farris
S. Bodapati
180
13
0
18 Apr 2023
A CTC Alignment-based Non-autoregressive Transformer for End-to-end Automatic Speech Recognition
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Ruchao Fan
Wei Chu
Peng Chang
Abeer Alwan
173
18
0
15 Apr 2023
Robust and Context-Aware Real-Time Collaborative Robot Handling via Dynamic Gesture Commands
IEEE Robotics and Automation Letters (RA-L), 2023
Rui Chen
Alvin C M Shek
Changliu Liu
100
6
0
12 Apr 2023
Online Spatio-Temporal Learning with Target Projection
International Conference on Artificial Intelligence Circuits and Systems (ICAICS), 2023
Thomas Ortner
Lorenzo Pes
Joris Gentinetta
Charlotte Frenkel
A. Pantazi
191
11
0
11 Apr 2023
Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition
Guangyong Wei
Zhikui Duan
Shiren Li
Guangguang Yang
Xinmei Yu
Junhua Li
194
5
0
11 Apr 2023
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Yuchen Hu
Cheng Chen
Qiu-shi Zhu
Eng Siong Chng
301
18
0
11 Apr 2023
Dual-Attention Neural Transducers for Efficient Wake Word Spotting in Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Saumya Yashmohini Sahai
Jing Liu
Thejaswi Muniyappa
Kanthashree Mysore Sathyendra
Anastasios Alexandridis
...
Ross McGowan
Ariya Rastrow
Feng-Ju Chang
Athanasios Mouchtaris
Siegfried Kunzmann
204
5
0
03 Apr 2023
Dialog act guided contextual adapter for personalized speech recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Feng-Ju Chang
Thejaswi Muniyappa
Kanthashree Mysore Sathyendra
Kailin Wei
Grant P. Strimel
Ross McGowan
124
7
0
31 Mar 2023
PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
R. Pandey
Roger Ren
Qi Luo
Jing Liu
Ariya Rastrow
Ankur Gandhe
Denis Filimonov
Grant P. Strimel
A. Stolcke
I. Bulyko
147
15
0
30 Mar 2023
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Computer Vision and Pattern Recognition (CVPR), 2023
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
199
24
0
29 Mar 2023
Cross-utterance ASR Rescoring with Graph-based Label Propagation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Srinath Tankasala
Long Chen
A. Stolcke
A. Raju
Qianli Deng
Chander Chandak
Aparna Khare
Roland Maas
Venkatesh Ravichandran
117
2
0
27 Mar 2023
Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition
Kai Liu
Hailiang Xiong
Gangqiang Yang
Zhengfeng Du
Yewen Cao
D. Shah
210
0
0
23 Mar 2023
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yifan Peng
Jaesong Lee
Shinji Watanabe
215
39
0
14 Mar 2023
Probing neural representations of scene perception in a hippocampally dependent task using artificial neural networks
Computer Vision and Pattern Recognition (CVPR), 2023
Markus Frey
Christian F. Doeller
Caswell Barry
169
4
0
11 Mar 2023
An Overview on Language Models: Recent Developments and Outlook
APSIPA Transactions on Signal and Information Processing (TASIP), 2023
Chengwei Wei
Yun Cheng Wang
Bin Wang
C.-C. Jay Kuo
276
53
0
10 Mar 2023
MIXPGD: Hybrid Adversarial Training for Speech Recognition Systems
A. Huq
Weiyi Zhang
Xiaolin Hu
AAML
174
3
0
10 Mar 2023
End-to-End Speech Recognition: A Survey
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schlüter
Shinji Watanabe
VLM
288
245
0
03 Mar 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Françoise Beaufays
Yonghui Wu
VLM
398
347
0
02 Mar 2023
Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training
Automatic Speech Recognition & Understanding (ASRU), 2023
Eric Sun
Jinyu Li
Yuxuan Hu
Yilun Zhu
Long Zhou
...
Peidong Wang
Linquan Liu
Shujie Liu
Ed Lin
Yifan Gong
276
8
0
01 Mar 2023
N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space
Interspeech, 2023
Rao Ma
Mark Gales
Kate Knill
Mengjie Qian
251
49
0
01 Mar 2023
MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yoohwan Kwon
Soo-Whan Chung
MoE
176
28
0
27 Feb 2023
Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Biao Zhang
Barry Haddow
Rico Sennrich
267
3
0
21 Feb 2023
A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Lingwei Meng
Jiawen Kang
Mingyu Cui
Yuejiao Wang
Xixin Wu
Helen M. Meng
185
21
0
20 Feb 2023
Speaker and Language Change Detection using Wav2vec2 and Whisper
Tijn Berns
Nik Vaessen
David A. van Leeuwen
163
6
0
18 Feb 2023
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Jiajun Deng
Xurong Xie
Tianzi Wang
Mingyu Cui
Boyang Xue
Zengrui Jin
Guinan Li
Shujie Hu
Xunying Liu
158
7
0
15 Feb 2023
Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation
Interspeech, 2023
Minglun Han
Feilong Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
216
17
0
30 Jan 2023
Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model
Xian Shi
Yanni Chen
Shiliang Zhang
Zhijie Yan
159
8
0
29 Jan 2023
Regeneration Learning: A Learning Paradigm for Data Generation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Xu Tan
Tao Qin
Jiang Bian
Tie-Yan Liu
Yoshua Bengio
GAN
143
19
0
21 Jan 2023
Neural Architecture Search: Insights from 1000 Papers
Colin White
Mahmoud Safari
R. Sukthanker
Binxin Ru
T. Elsken
Arber Zela
Debadeepta Dey
Katharina Eggensperger
3DV
AI4CE
409
192
0
20 Jan 2023
Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-streaming Transducer
Interspeech, 2023
Zhanheng Yang
Sining Sun
Xiong Wang
Yike Zhang
Long Ma
Linfu Xie
214
16
0
17 Jan 2023
BayesSpeech: A Bayesian Transformer Network for Automatic Speech Recognition
Will Rieger
BDL
UQCV
125
0
0
16 Jan 2023