Listen, Attend and Spell

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015
5 August 2015
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
    RALM

Papers citing "Listen, Attend and Spell"

50 / 1,064 papers shown
CTC Alignments Improve Autoregressive Translation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Brian Yan
Siddharth Dalmia
Yosuke Higuchi
Graham Neubig
Florian Metze
A. Black
Shinji Watanabe
182
36
0
11 Oct 2022
DeepPerform: An Efficient Approach for Performance Testing of Resource-Constrained Neural Networks
International Conference on Automated Software Engineering (ASE), 2022
Simin Chen
Mirazul Haque
Cong Liu
Wei Yang
209
24
0
10 Oct 2022
JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Mayumi Ohta
Julia Kreutzer
Stefan Riezler
166
0
0
05 Oct 2022
Relaxed Attention for Transformer Models
IEEE International Joint Conference on Neural Networks (IJCNN), 2022
Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
KELM
173
13
0
20 Sep 2022
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models
R. Olivier
H. Abdullah
Bhiksha Raj
AAML
267
1
0
17 Sep 2022
Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition
Interspeech (Interspeech), 2022
Ye Bai
Jie Li
W. Han
Hao Ni
Kaituo Xu
Zhuo Zhang
Cheng Yi
Xiaorui Wang
MoE
140
3
0
17 Sep 2022
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition
Interspeech (Interspeech), 2022
Kartik Audhkhasi
Yinghui Huang
Bhuvana Ramabhadran
Pedro J. Moreno
128
5
0
13 Sep 2022
Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM
Interspeech (Interspeech), 2022
Hayato Futami
Hirofumi Inaguma
Sei Ueno
Masato Mimura
S. Sakai
Tatsuya Kawahara
KELM
248
13
0
08 Sep 2022
Distilling the Knowledge of BERT for CTC-based ASR
Hayato Futami
Hirofumi Inaguma
Masato Mimura
S. Sakai
Tatsuya Kawahara
189
11
0
05 Sep 2022
Vision-Language Adaptive Mutual Decoder for OOV-STR
Jinshui Hu
Chenyu Liu
Qiandong Yan
Xuyang Zhu
Jiajia Wu
Feng Yu
Bing Yin
VLM
273
1
0
02 Sep 2022
Bayesian Neural Network Language Modeling for Speech Recognition
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Boyang Xue
Shoukang Hu
Junhao Xu
Mengzhe Geng
Xunying Liu
Helen M. Meng
UQCV
BDL
264
23
0
28 Aug 2022
Interpretable Multimodal Emotion Recognition using Hybrid Fusion of Speech and Image Data
Puneet Kumar
Sarthak Malik
Balasubramanian Raman
CVBM
200
35
0
25 Aug 2022
Comparison and Analysis of New Curriculum Criteria for End-to-End ASR
Interspeech (Interspeech), 2022
Georgios Karakasidis
Tamás Grósz
M. Kurimo
133
3
0
10 Aug 2022
ASR Error Correction with Constrained Decoding on Operation Prediction
Interspeech (Interspeech), 2022
J. Yang
Rong-Zhi Li
Wei Peng
192
12
0
09 Aug 2022
Adversarial Attacks on ASR Systems: An Overview
International Conference on Data Science in Cyberspace (ICDSC), 2022
Xiao Zhang
Hao Tan
Xuan Huang
Denghui Zhang
Keke Tang
Zhaoquan Gu
AAML
132
3
0
03 Aug 2022
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States
Interspeech (Interspeech), 2022
Jiatong Shi
G. Saon
David Haws
Shinji Watanabe
Brian Kingsbury
153
3
0
03 Aug 2022
Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition
Spoken Language Technology Workshop (SLT), 2022
Peng Shen
Xugang Lu
Hisashi Kawai
106
2
0
29 Jul 2022
Improving Mandarin Speech Recogntion with Block-augmented Transformer
Xiaoming Ren
Huifeng Zhu
Liuwei Wei
Minghui Wu
Jie Hao
230
12
0
24 Jul 2022
Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation
Interspeech (Interspeech), 2022
V. Trinh
Pegah Ghahremani
Brian King
J. Droppo
A. Stolcke
Roland Maas
MoMe
107
7
0
16 Jul 2022
PoLyScriber: Integrated Fine-tuning of Extractor and Lyrics Transcriber for Polyphonic Music
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Xiaoxue Gao
Chitralekha Gupta
Haizhou Li
273
9
0
15 Jul 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
International Conference on Machine Learning (ICML), 2022
Yifan Peng
Siddharth Dalmia
Ian Lane
Shinji Watanabe
271
193
0
06 Jul 2022
DEFORMER: Coupling Deformed Localized Patterns with Global Context for Robust End-to-end Speech Recognition
Interspeech (Interspeech), 2022
Jiamin Xie
John H. L. Hansen
172
1
0
04 Jul 2022
Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition
Interspeech (Interspeech), 2022
Guangzhi Sun
Chuxu Zhang
P. Woodland
154
18
0
02 Jul 2022
FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning
Yeonghyeon Lee
Kangwook Jang
Jahyun Goo
Youngmoon Jung
Hoi-Rim Kim
248
39
0
01 Jul 2022
Language-specific Characteristic Assistance for Code-switching Speech Recognition
Interspeech (Interspeech), 2022
Tongtong Song
Qiang Xu
Meng Ge
Longbiao Wang
Hao Shi
Yongjie Lv
Yuqin Lin
Jianwu Dang
195
35
0
29 Jun 2022
Contextual Density Ratio for Language Model Biasing of Sequence to Sequence ASR Systems
Interspeech (Interspeech), 2021
Jesús Andrés-Ferrer
Dario Albesano
P. Zhan
Paul Vozila
106
6
0
29 Jun 2022
On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode
International Conference on Signal Processing and Communications (ICSPC), 2022
Raviraj Joshi
Subodh Kumar
111
2
0
26 Jun 2022
Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus
Interspeech (Interspeech), 2022
Junhao Xu
Shoukang Hu
Xunying Liu
Helen M. Meng
MQ
214
5
0
23 Jun 2022
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems
Interspeech (Interspeech), 2022
Mingyu Cui
Jiajun Deng
Shoukang Hu
Xurong Xie
Tianzi Wang
Shujie Hu
Mengzhe Geng
Boyang Xue
Xunying Liu
Helen M. Meng
145
10
0
23 Jun 2022
Boosting Cross-Domain Speech Recognition with Self-Supervision
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Hanjing Zhu
Gaofeng Cheng
Yongfeng Zhang
Wenxin Hou
Pengyuan Zhang
Yonghong Yan
344
22
0
20 Jun 2022
Avoid Overfitting User Specific Information in Federated Keyword Spotting
Interspeech (Interspeech), 2022
Xin-Chun Li
Jin-Lin Tang
Shaoming Song
Bingshuai Li
Yinchuan Li
Yunfeng Shao
Le Gan
De-Chuan Zhan
FedML
AAML
143
9
0
17 Jun 2022
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
Interspeech (Interspeech), 2022
Zhifu Gao
Shiliang Zhang
Ian Mcloughlin
Zhijie Yan
237
181
0
16 Jun 2022
Residual Language Model for End-to-end Speech Recognition
Interspeech (Interspeech), 2022
E. Tsunoo
Yosuke Kashiwagi
Chaitanya Narisetty
Shinji Watanabe
148
11
0
15 Jun 2022
LegoNN: Building Modular Encoder-Decoder Models
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Siddharth Dalmia
Dmytro Okhonko
M. Lewis
Sergey Edunov
Shinji Watanabe
Florian Metze
Luke Zettlemoyer
Abdel-rahman Mohamed
AuLLM
MoE
176
16
0
07 Jun 2022
Contextual Adapters for Personalized Speech Recognition in Neural Transducers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Kanthashree Mysore Sathyendra
Thejaswi Muniyappa
Feng-Ju Chang
Jing Liu
Jinru Su
Grant P. Strimel
Athanasios Mouchtaris
Siegfried Kunzmann
188
87
0
26 May 2022
Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling
Neural Information Processing Systems (NeurIPS), 2022
Kaitao Song
Yichong Leng
Xu Tan
Yicheng Zou
Tao Qin
Dongsheng Li
235
11
0
25 May 2022
Adaptive multilingual speech recognition with pretrained models
Interspeech (Interspeech), 2022
Ngoc-Quan Pham
A. Waibel
Jan Niehues
VLM
210
25
0
24 May 2022
Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Yuting Yang
Binbin Du
Yuke Li
329
2
0
24 May 2022
Deep Learning for Visual Speech Analysis: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Changchong Sheng
Gangyao Kuang
L. Bai
Chen Hou
Yike Guo
Xin Xu
M. Pietikäinen
Tianpeng Liu
VLM
314
53
0
22 May 2022
Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Guangzhi Sun
Chuxu Zhang
P. Woodland
202
16
0
18 May 2022
Evaluating Membership Inference Through Adversarial Robustness
Computer/law journal (JITPL), 2022
Zhaoxi Zhang
L. Zhang
Xufei Zheng
Bilal Hussain Abbasi
Shengshan Hu
AAML
202
19
0
14 May 2022
Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing
Interspeech (Interspeech), 2022
Heli Qi
Sashi Novitasari
S. Sakti
Satoshi Nakamura
AI4TS
252
2
0
14 May 2022
Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Zengrui Jin
Mengzhe Geng
Jiajun Deng
Tianzi Wang
Shujie Hu
Guinan Li
Xunying Liu
226
38
0
13 May 2022
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Felix Wu
Kwangyoun Kim
Shinji Watanabe
Kyu Jeong Han
Ryan T. McDonald
Kilian Q. Weinberger
Yoav Artzi
SyDa
216
46
0
02 May 2022
How does a spontaneously speaking conversational agent affect user behavior?
IEEE Access (IEEE Access), 2022
Takahisa Iizuka
H. Mori
43
4
0
02 May 2022
Bilingual End-to-End ASR with Byte-Level Subwords
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Liuhui Deng
Roger Hsiao
Arnab Ghoshal
149
6
0
01 May 2022
Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
3DV
279
290
0
27 Apr 2022
Supervised Attention in Sequence-to-Sequence Models for Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Gene-Ping Yang
Hao Tang
121
6
0
25 Apr 2022
Efficient Training of Neural Transducer for Speech Recognition
Interspeech (Interspeech), 2022
Wei Zhou
Wilfried Michel
Ralf Schluter
Hermann Ney
AI4TS
188
28
0
22 Apr 2022
Cross-stitched Multi-modal Encoders
Karan Singla
Daniel Pressel
Ryan Price
Bhargav Srinivas Chinnari
Yeon-Jun Kim
S. Bangalore
161
0
0
20 Apr 2022