ResearchTrend.AI
Listen, Attend and Spell
arXiv:1508.01211 (v1; v2 is the latest version)

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015
5 August 2015
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
    RALM

Papers citing "Listen, Attend and Spell"

50 / 1,064 papers shown
Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction
Yangui Fang
Baixu Cheng
Jing Peng
Xu Li
Yu Xi
Chengwei Zhang
Guohui Zhong
320
5
0
24 Dec 2025
WST: Weakly Supervised Transducer for Automatic Speech Recognition
Dongji Gao
Chenda Liao
Changliang Liu
Matthew Wiesner
Leibny Paola García
Daniel Povey
Sanjeev Khudanpur
Jian Wu
160
0
0
06 Nov 2025
A Neural Model for Contextual Biasing Score Learning and Filtering
Wanting Huang
Weiran Wang
106
0
0
27 Oct 2025
StutterZero and StutterFormer: End-to-End Speech Conversion for Stuttering Transcription and Correction
Qianheng Xu
142
0
0
21 Oct 2025
Proprioceptive Image: An Image Representation of Proprioceptive Data from Quadruped Robots for Contact Estimation Learning
G. Abati
J. C. V. Soares
Giulio Turrisi
Victor Barasuol
Claudio Semini
121
0
0
16 Oct 2025
End-to-end Speech Recognition with similar length speech and text
Peng Fan
Wenping Wang
Fei Deng
100
0
0
12 Oct 2025
Automatic Speech Recognition in the Modern Era: Architectures, Training, and Evaluation
Md. Nayeem
Md Shamse Tabrej
Kabbojit Jit Deb
Shaonti Goswami
Md. Azizul Hakim
AI4TSVLM
125
3
0
11 Oct 2025
Adapting Diarization-Conditioned Whisper for End-to-End Multi-Talker Speech Recognition
M. Kocour
Martin Karafiát
Alexander Polok
Dominik Klement
L. Burget
Jan "Honza" Černocký
117
0
0
04 Oct 2025
Building Tailored Speech Recognizers for Japanese Speaking Assessment
Yotaro Kubo
R. Sproat
Chihiro Taguchi
Llion Jones
97
0
0
25 Sep 2025
WolBanking77: Wolof Banking Speech Intent Classification Dataset
Abdou Karim Kandji
Frédéric Precioso
Cheikh Ba
Samba Ndiaye
Augustin Ndione
215
0
0
23 Sep 2025
UMA-Split: unimodal aggregation for both English and Mandarin non-autoregressive speech recognition
Ying Fang
Xiaofei Li
115
0
0
18 Sep 2025
Whisper Has an Internal Word Aligner
Sung-Lin Yeh
Yen Meng
Hao Tang
120
0
0
12 Sep 2025
Streaming Sequence-to-Sequence Learning with Delayed Streams Modeling
Neil Zeghidour
Eugene Kharitonov
Manu Orsini
Václav Volhejn
Gabriel de Marmiesse
Edouard Grave
P. Pérez
Laurent Mazaré
Alexandre Défossez
OffRL
227
6
0
10 Sep 2025
Enhancing the Robustness of Contextual ASR to Varying Biasing Information Volumes Through Purified Semantic Correlation Joint Modeling
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Yue Gu
Zhihao Du
Ying Shi
Shiliang Zhang
Qian Chen
Jiqing Han
110
0
0
07 Sep 2025
Serialized Output Prompting for Large Language Model-based Multi-Talker Speech Recognition
Hao Shi
Yusuke Fujita
Tomoya Mizumoto
Lianbo Liu
Atsushi Kojima
Yui Sudo
102
1
0
01 Sep 2025
H-PRM: A Pluggable Hotword Pre-Retrieval Module for Various Speech Recognition Systems
Huangyu Dai
Lingtao Mao
Ben Chen
Zihan Wang
Zihan Liang
Ying Han
Chenyi Lei
Han Li
KELM
121
0
0
22 Aug 2025
A Comparative Analysis on ASR System Combination for Attention, CTC, Factored Hybrid, and Transducer Models
Noureldin Bayoumi
Robin Schmitt
Tina Raissi
Albert Zeyer
Ralf Schlüter
Hermann Ney
84
0
0
13 Aug 2025
TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree
A. Andrusenko
Vladimir Bataev
Lilit Grigoryan
Vitaly Lavrukhin
Boris Ginsburg
177
1
0
09 Aug 2025
Efficient Scaling for LLM-based ASR
Bingshen Mu
Yiwen Shao
Kun Wei
Dong Yu
Lei Xie
AuLLM
193
5
0
06 Aug 2025
Triple X: A LLM-Based Multilingual Speech Recognition System for the INTERSPEECH2025 MLC-SLM Challenge
Miaomiao Gao
Xiaoxiao Xiang
Yiwen Guo
AILaw
162
1
0
23 Jul 2025
Supporting SENĆOŦEN Language Documentation Efforts with Automatic Speech Recognition
Mengzhe Geng
Patrick Littell
Aidan Pine
PENÁĆ
Marc Tessier
Roland Kuhn
143
0
0
14 Jul 2025
Mixture of LoRA Experts with Multi-Modal and Multi-Granularity LLM Generative Error Correction for Accented Speech Recognition
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Bingshen Mu
Kun Wei
Pengcheng Guo
Lei Xie
297
7
0
12 Jul 2025
Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition
Asahi Sakuma
Hiroaki Sato
Ryuga Sugano
Tadashi Kumano
Yoshihiko Kawai
Tetsuji Ogawa
119
2
0
09 Jun 2025
WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing
Yu Nakagome
Michael Hentschel
232
0
0
02 Jun 2025
PMF-CEC: Phoneme-augmented Multimodal Fusion for Context-aware ASR Error Correction with Error-specific Selective Decoding
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Jiajun He
Tomoki Toda
148
3
0
31 May 2025
Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation
Zhennan Lin
Kaixun Huang
Wei Ren
Linju Yang
Lei Xie
AI4CE
214
0
0
29 May 2025
Cross-modal Knowledge Transfer Learning as Graph Matching Based on Optimal Transport for ASR
Xugang Lu
Peng Shen
Yu Tsao
Hisashi Kawai
OT
317
0
0
19 May 2025
RNN-Transducer-based Losses for Speech Recognition on Noisy Targets
Vladimir Bataev
382
0
0
09 Apr 2025
A 71.2-μW Speech Recognition Accelerator with Recurrent Spiking Neural Network
IEEE Transactions on Circuits and Systems Part 1: Regular Papers (TCAS-I), 2024
Chih-Chyau Yang
Tian-Sheuan Chang
358
2
0
27 Mar 2025
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
Yangyang Meng
Jinpeng Li
Guodong Lin
Yu Pu
G. Wang
Hu Du
Zhiming Shao
Yukai Huang
Ke Li
Wei-Qiang Zhang
ObjD
306
7
0
26 Mar 2025
Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit
Aniket Abhishek Soni
145
2
0
26 Mar 2025
Evaluating ASR Confidence Scores for Automated Error Detection in User-Assisted Correction Interfaces
Korbinian Kuhn
Verena Kersken
Gottfried Zimmermann
274
1
0
19 Mar 2025
Automatic Speech Recognition for Non-Native English: Accuracy and Disfluency Handling
Michael McGuire
212
1
0
10 Mar 2025
Training and Inference Efficiency of Encoder-Decoder Speech Models
Piotr Żelasko
Kunal Dhawan
Daniel Galvez
Krishna Puvvada
Ankita Pasad
Nithin Rao Koluguri
Ke Hu
Vitaly Lavrukhin
Jagadeesh Balam
Boris Ginsburg
325
8
0
07 Mar 2025
Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning
Interspeech, 2024
Lucas Block Medin
Thomas Pellegrini
Lucile Gelin
SSL
233
7
0
06 Mar 2025
Improving Streaming Speech Recognition With Time-Shifted Contextual Attention And Dynamic Right Context Masking
Interspeech, 2024
Khanh Le
Duc Thanh Chau
AI4TS
287
2
0
24 Feb 2025
Retrieval-Augmented Speech Recognition Approach for Domain Challenges
Peng Shen
Xugang Lu
Hisashi Kawai
RALM
262
3
0
24 Feb 2025
Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation
Qiuming Zhao
Guangzhi Sun
Chao Zhang
MoMeVLM
1.0K
4
0
24 Feb 2025
Note-Level Singing Melody Transcription for Time-Aligned Musical Score Generation
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Leekyung Kim
Sungwook Jeon
Wan Heo
Jonghun Park
308
2
0
18 Feb 2025
A Differentiable Alignment Framework for Sequence-to-Sequence Modeling via Optimal Transport
Yacouba Kaloga
Shashi Kumar
P. Motlíček
Ina Kodrasi
OT
420
0
0
03 Feb 2025
HadamRNN: Binary and Sparse Ternary Orthogonal RNNs
International Conference on Learning Representations (ICLR), 2025
Armand Foucault
Franck Mamalet
François Malgouyres
MQ
914
1
0
28 Jan 2025
Variational Bayesian Adaptive Learning of Deep Latent Variables for Acoustic Knowledge Transfer
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Hu Hu
Sabato Marco Siniscalchi
Chao-Han Huck Yang
Chin-Hui Lee
270
0
0
28 Jan 2025
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
Kai-Tuo Xu
Feng-Long Xie
Xu Tang
411
40
0
24 Jan 2025
Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition
Yosuke Higuchi
Tetsuji Ogawa
Tetsunori Kobayashi
AuLLM
243
4
0
08 Jan 2025
Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Wei Zhang
Tian-Hao Zhang
Chao Luo
Hui Zhou
Chao Yang
Xinyuan Qian
Xu-cheng Yin
124
0
0
08 Jan 2025
Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Tsz Kin Lam
Marco Gaido
Sara Papi
L. Bentivogli
Barry Haddow
433
3
0
04 Jan 2025
Automatic Text Pronunciation Correlation Generation and Application for Contextual Biasing
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Gaofeng Cheng
Haitian Lu
Chengxu Yang
Xuyang Wang
Ta Li
Yonghong Yan
74
0
0
03 Jan 2025
Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Amir Harati
Elizabeth Shriberg
Tomasz Rutowski
Piotr Chlebek
Yang Lu
Ricardo Oliveira
342
23
0
22 Dec 2024
LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration
AAAI Conference on Artificial Intelligence (AAAI), 2024
Sangmin Lee
Woo-Jin Chung
Hong-Goo Kang
471
1
0
19 Dec 2024
Complexity boosted adaptive training for better low resource ASR performance
Hongxuan Lu
Shenjian Wang
Biao Li
273
0
0
01 Dec 2024