ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1508.01211
  4. Cited By
Listen, Attend and Spell
v1v2 (latest)

Listen, Attend and Spell

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015
5 August 2015
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
    RALM
ArXiv (abs)PDFHTML

Papers citing "Listen, Attend and Spell"

50 / 1,064 papers shown
Advanced Long-Content Speech Recognition With Factorized Neural
  Transducer
Advanced Long-Content Speech Recognition With Factorized Neural Transducer
Xun Gong
Yu Wu
Jinyu Li
Shujie Liu
Rui Zhao
Xie Chen
Yanmin Qian
228
14
0
20 Mar 2024
Skipformer: A Skip-and-Recover Strategy for Efficient Speech Recognition
Skipformer: A Skip-and-Recover Strategy for Efficient Speech RecognitionIEEE International Conference on Multimedia and Expo (ICME), 2024
Wenjing Zhu
Sining Sun
Changhao Shan
Peng Fan
Qing Yang
246
3
0
13 Mar 2024
The evaluation of a code-switched Sepedi-English automatic speech
  recognition system
The evaluation of a code-switched Sepedi-English automatic speech recognition system
Amanda Phaladi
T. Modipa
163
0
0
11 Mar 2024
A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
Ruichen Ma
G. Qiao
Yián Liu
L. Meng
N. Ning
Yang Liu
Shaogang Hu
AAMLMQ
251
5
0
06 Mar 2024
Towards Accurate Lip-to-Speech Synthesis in-the-Wild
Towards Accurate Lip-to-Speech Synthesis in-the-Wild
Sindhu B. Hegde
Rudrabha Mukhopadhyay
C. V. Jawahar
Vinay P. Namboodiri
192
12
0
02 Mar 2024
Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn
  Medical Interview
Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview
Heyang Liu
Yu Wang
Yanfeng Wang
278
0
0
01 Mar 2024
Extreme Encoder Output Frame Rate Reduction: Improving Computational
  Latencies of Large End-to-End Models
Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models
Rohit Prabhavalkar
Zhong Meng
Weiran Wang
Adam Stooke
Xingyu Cai
Yanzhang He
Arun Narayanan
Dongseong Hwang
Tara N. Sainath
Pedro J. Moreno
201
11
0
27 Feb 2024
Alternating Weak Triphone/BPE Alignment Supervision from Hybrid Model
  Improves End-to-End ASR
Alternating Weak Triphone/BPE Alignment Supervision from Hybrid Model Improves End-to-End ASR
Jintao Jiang
Yingbo Gao
Mohammad Zeineldeen
Zoltán Tüske
287
0
0
23 Feb 2024
How do Hyenas deal with Human Speech? Speech Recognition and Translation
  with ConfHyena
How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
231
1
0
20 Feb 2024
Comparison of Conventional Hybrid and CTC/Attention Decoders for
  Continuous Visual Speech Recognition
Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition
David Gimeno-Gómez
Carlos David Martínez Hinarejos
201
2
0
20 Feb 2024
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
Ziyang Ma
Guanrou Yang
Yifan Yang
Zhifu Gao
Jiaming Wang
...
Fan Yu
Qian Chen
Siqi Zheng
Shiliang Zhang
Xie Chen
AuLLM
171
102
0
13 Feb 2024
Self-consistent context aware conformer transducer for speech
  recognition
Self-consistent context aware conformer transducer for speech recognition
Konstantin Kolokolov
Pavel Pekichev
Karthik Raghunathan
171
0
0
09 Feb 2024
Shortcuts Everywhere and Nowhere: Exploring Multi-Trigger Backdoor Attacks
Shortcuts Everywhere and Nowhere: Exploring Multi-Trigger Backdoor AttacksIEEE Transactions on Dependable and Secure Computing (IEEE TDSC), 2024
Yige Li
Jiabo He
Jiabo He
Hanxun Huang
Xingjun Ma
Yu-Gang Jiang
AAML
308
3
0
27 Jan 2024
Contextualized Automatic Speech Recognition with Attention-Based Bias
  Phrase Boosted Beam Search
Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search
Yui Sudo
Muhammad Shakeel
Yosuke Fukumoto
Yifan Peng
Shinji Watanabe
221
17
0
19 Jan 2024
Improving ASR Contextual Biasing with Guided Attention
Improving ASR Contextual Biasing with Guided AttentionIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Jiyang Tang
Kwangyoun Kim
Suwon Shon
Felix Wu
Prashant Sridhar
Shinji Watanabe
186
22
0
16 Jan 2024
LCB-net: Long-Context Biasing for Audio-Visual Speech Recognition
LCB-net: Long-Context Biasing for Audio-Visual Speech RecognitionIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Fan Yu
Haoxu Wang
Xian Shi
Shiliang Zhang
213
5
0
12 Jan 2024
Cross-Speaker Encoding Network for Multi-Talker Speech Recognition
Cross-Speaker Encoding Network for Multi-Talker Speech RecognitionIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Jiawen Kang
Lingwei Meng
Mingyu Cui
Haohan Guo
Xixin Wu
Xunying Liu
Helen M. Meng
163
9
0
08 Jan 2024
A unified multichannel far-field speech recognition system: combining
  neural beamforming with attention based end-to-end model
A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model
Dongdi Zhao
Jianbo Ma
Lu Lu
Jinke Li
Xuan Ji
Lei Zhu
Fuming Fang
Ming-Yuan Liu
Feijun Jiang
104
1
0
05 Jan 2024
CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based
  Speech Recognition
CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech RecognitionAutomatic Speech Recognition & Understanding (ASRU), 2023
Junfeng Hou
Peiyao Wang
Jincheng Zhang
Meng Yang
Minwei Feng
Jingcheng Yin
160
2
0
04 Jan 2024
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition
A. Ogawa
Naohiro Tawara
Takatomo Kano
Marc Delcroix
330
6
0
22 Dec 2023
Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition
Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition
Peng Shen
Xugang Lu
Hisashi Kawai
181
2
0
18 Dec 2023
Conformer-Based Speech Recognition On Extreme Edge-Computing Devices
Conformer-Based Speech Recognition On Extreme Edge-Computing DevicesNorth American Chapter of the Association for Computational Linguistics (NAACL), 2023
Mingbin Xu
Alex Jin
Sicheng Wang
Mu Su
Tim Ng
...
Shiyi Han
Zhihong Lei
Yaqiao Deng
Zhen Huang
Mahesh Krishnamoorthy
238
9
0
16 Dec 2023
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech
  Recognition with Universal Speech Models
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech ModelsIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shaojin Ding
David Qiu
David Rim
Yanzhang He
Oleg Rybakov
...
Tara N. Sainath
Zhonglin Han
Jian Li
Amir Yazdanbakhsh
Shivani Agrawal
MQ
472
13
0
13 Dec 2023
D4AM: A General Denoising Framework for Downstream Acoustic Models
D4AM: A General Denoising Framework for Downstream Acoustic ModelsInternational Conference on Learning Representations (ICLR), 2023
H. Wang
Yu Tsao
Hsin-Min Wang
Chu-Song Chen
175
5
0
28 Nov 2023
Weak Alignment Supervision from Hybrid Model Improves End-to-end ASR
Weak Alignment Supervision from Hybrid Model Improves End-to-end ASR
Jintao Jiang
Yingbo Gao
Zoltán Tüske
335
1
0
24 Nov 2023
Analysis of Visual Features for Continuous Lipreading in Spanish
Analysis of Visual Features for Continuous Lipreading in SpanishIberSPEECH Conference (IberSPEECH), 2021
David Gimeno-Gómez
Carlos David Martínez Hinarejos
218
3
0
21 Nov 2023
LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild
LIP-RTVE: An Audiovisual Database for Continuous Spanish in the WildInternational Conference on Language Resources and Evaluation (LREC), 2023
David Gimeno-Gómez
Carlos David Martínez Hinarejos
232
8
0
21 Nov 2023
Phonological Level wav2vec2-based Mispronunciation Detection and
  Diagnosis Method
Phonological Level wav2vec2-based Mispronunciation Detection and Diagnosis Method
M. Shahin
Julien Epps
Beena Ahmed
122
3
0
13 Nov 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech
  Translation
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech TranslationConference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Juan Pablo Zuluaga
Zhaocheng Huang
Xing Niu
Rohit Paturi
S. Srinivasan
Prashant Mathur
Brian Thompson
Marcello Federico
BDL
235
3
0
01 Nov 2023
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo
  Labelling
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
Sanchit Gandhi
Patrick von Platen
Alexander M. Rush
VLM
340
104
0
01 Nov 2023
MixRep: Hidden Representation Mixup for Low-Resource Speech Recognition
MixRep: Hidden Representation Mixup for Low-Resource Speech RecognitionInterspeech (Interspeech), 2023
Jiamin Xie
John H. L. Hansen
144
5
0
27 Oct 2023
Key Frame Mechanism For Efficient Conformer Based End-to-end Speech
  Recognition
Key Frame Mechanism For Efficient Conformer Based End-to-end Speech RecognitionIEEE Signal Processing Letters (IEEE SPL), 2023
Peng Fan
Changhao Shan
Sining Sun
Qing Yang
Jianwei Zhang
239
4
0
23 Oct 2023
Tailoring Adversarial Attacks on Deep Neural Networks for Targeted Class Manipulation Using DeepFool Algorithm
Tailoring Adversarial Attacks on Deep Neural Networks for Targeted Class Manipulation Using DeepFool AlgorithmScientific Reports (Sci Rep), 2023
S. M. Fazle
J. Mondal
Meem Arafat Manab
Xi Xiao
Sarfaraz Newaz
AAML
455
2
0
18 Oct 2023
End-to-End real time tracking of children's reading with pointer network
End-to-End real time tracking of children's reading with pointer networkIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Vishal Sunder
Beulah Karrolla
Eric Fosler-Lussier
77
0
0
17 Oct 2023
Correction Focused Language Model Training for Speech Recognition
Correction Focused Language Model Training for Speech RecognitionIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yingyi Ma
Zhe Liu
Ozlem Kalinli
KELM
283
6
0
17 Oct 2023
Personalization of CTC-based End-to-End Speech Recognition Using
  Pronunciation-Driven Subword Tokenization
Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization
Zhihong Lei
Ernest Pusateri
Shiyi Han
Leo Liu
Mingbin Xu
...
R. Travadi
Youyuan Zhang
Mirko Hannemann
Man-Hung Siu
Zhen Huang
211
10
0
16 Oct 2023
Improved Contextual Recognition In Automatic Speech Recognition Systems
  By Semantic Lattice Rescoring
Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring
Ankitha Sudarshan
Vinay Samuel
Parth Patwa
Ibtihel Amara
Vasu Sharma
359
3
0
14 Oct 2023
On the Relevance of Phoneme Duration Variability of Synthesized Training
  Data for Automatic Speech Recognition
On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech RecognitionAutomatic Speech Recognition & Understanding (ASRU), 2023
Nick Rossenbach
Benedikt Hilmes
Ralf Schluter
148
5
0
12 Oct 2023
Investigating the Effect of Language Models in Sequence Discriminative
  Training for Neural Transducers
Investigating the Effect of Language Models in Sequence Discriminative Training for Neural TransducersAutomatic Speech Recognition & Understanding (ASRU), 2023
Zijian Yang
Wei Zhou
Ralf Schluter
Hermann Ney
158
0
0
11 Oct 2023
Whispering LLaMA: A Cross-Modal Generative Error Correction Framework
  for Speech Recognition
Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech RecognitionConference on Empirical Methods in Natural Language Processing (EMNLP), 2023
S. Radhakrishnan
Chao-Han Huck Yang
S. Khan
Rohit Kumar
N. Kiani
D. Gómez-Cabrero
Jesper N. Tegnér
357
77
0
10 Oct 2023
ed-cec: improving rare word recognition using asr postprocessing based
  on error detection and context-aware error correction
ed-cec: improving rare word recognition using asr postprocessing based on error detection and context-aware error correctionAutomatic Speech Recognition & Understanding (ASRU), 2023
Jiajun He
Zekun Yang
Tomoki Toda
190
10
0
08 Oct 2023
Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech
  Recognition
Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech RecognitionAutomatic Speech Recognition & Understanding (ASRU), 2023
Kaixun Huang
Aoting Zhang
Binbin Zhang
Tianyi Xu
Xingchen Song
Lei Xie
167
5
0
07 Oct 2023
Dementia Assessment Using Mandarin Speech with an Attention-based Speech
  Recognition Encoder
Dementia Assessment Using Mandarin Speech with an Attention-based Speech Recognition EncoderIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zih-Jyun Lin
Yi-Ju Chen
P. Kuo
Likai Huang
Chaur-Jong Hu
Cheng-Yu Chen
116
2
0
06 Oct 2023
Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm
Contextual Biasing with the Knuth-Morris-Pratt Matching AlgorithmInterspeech (Interspeech), 2023
Weiran Wang
Zelin Wu
D. Caseiro
Tsendsuren Munkhdalai
K. Sim
...
Rohit Prabhavalkar
Zhong Meng
Ding Zhao
Tara N. Sainath
P. M. Mengibar
242
10
0
29 Sep 2023
LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation
  Auxiliary Task for E2E Code-switching ASR
LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASRAutomatic Speech Recognition & Understanding (ASRU), 2023
Guodong Ma
Wenxuan Wang
Yuke Li
Yuting Yang
Binbin Du
Haoran Fu
222
10
0
28 Sep 2023
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard
  Parameter Sharing
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter SharingIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
B. Grimstad
Xuankai Chang
Antonios Anastasopoulos
Yuya Fujita
Shinji Watanabe
288
5
0
27 Sep 2023
HyPoradise: An Open Baseline for Generative Speech Recognition with
  Large Language Models
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language ModelsNeural Information Processing Systems (NeurIPS), 2023
Cheng Chen
Yuchen Hu
Chao-Han Huck Yang
Sabato Marco Siniscalchi
Pin-Yu Chen
Eng Siong Chng
219
61
0
27 Sep 2023
Developing automatic verbatim transcripts for international multilingual
  meetings: an end-to-end solution
Developing automatic verbatim transcripts for international multilingual meetings: an end-to-end solutionMachine Translation Summit (MT Summit), 2023
Akshat Dewan
Michal Ziemski
Henri Meylan
Lorenzo Concina
Bruno Pouliquen
133
1
0
27 Sep 2023
Segment-Level Vectorized Beam Search Based on Partially Autoregressive
  Inference
Segment-Level Vectorized Beam Search Based on Partially Autoregressive InferenceAutomatic Speech Recognition & Understanding (ASRU), 2023
Masao Someki
N. Eng
Yosuke Higuchi
Shinji Watanabe
286
1
0
26 Sep 2023
On the Relation between Internal Language Model and Sequence
  Discriminative Training for Neural Transducers
On the Relation between Internal Language Model and Sequence Discriminative Training for Neural TransducersIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zijian Yang
Wei Zhou
Ralf Schluter
Hermann Ney
260
1
0
25 Sep 2023
Previous
123456...202122
Next