ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Listen, Attend and Spell

5 August 2015
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
RALM
arXiv: 1508.01211

Papers citing "Listen, Attend and Spell"

50 / 492 papers shown
Title
A 71.2-μW Speech Recognition Accelerator with Recurrent Spiking Neural Network
Chih-Chyau Yang
Tian-Sheuan Chang
60
1
0
27 Mar 2025
Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit
Aniket Abhishek Soni
49
0
0
26 Mar 2025
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
Yangyang Meng
Jinpeng Li
Guodong Lin
Yu Pu
G. Wang
Hu Du
Zhiming Shao
Yukai Huang
Ke Li
Wei-Qiang Zhang
ObjD
93
0
0
26 Mar 2025
Evaluating ASR Confidence Scores for Automated Error Detection in User-Assisted Correction Interfaces
Korbinian Kuhn
Verena Kersken
Gottfried Zimmermann
55
0
0
19 Mar 2025
Automatic Speech Recognition for Non-Native English: Accuracy and Disfluency Handling
Michael McGuire
47
0
0
10 Mar 2025
Training and Inference Efficiency of Encoder-Decoder Speech Models
Piotr Żelasko
Kunal Dhawan
Daniel Galvez
Krishna C. Puvvada
Ankita Pasad
Nithin Rao Koluguri
Ke Hu
Vitaly Lavrukhin
Jagadeesh Balam
Boris Ginsburg
41
0
0
07 Mar 2025
Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning
Lucas Block Medin
Thomas Pellegrini
Lucile Gelin
SSL
61
1
0
06 Mar 2025
Improving Streaming Speech Recognition With Time-Shifted Contextual Attention And Dynamic Right Context Masking
Khanh Le
Duc Thanh Chau
AI4TS
66
0
0
24 Feb 2025
Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation
Qiuming Zhao
Guangzhi Sun
Chao Zhang
Mingxing Xu
Thomas Fang Zheng
MoMe
VLM
143
0
0
24 Feb 2025
Retrieval-Augmented Speech Recognition Approach for Domain Challenges
Peng Shen
Xugang Lu
Hisashi Kawai
RALM
60
0
0
24 Feb 2025
Note-Level Singing Melody Transcription for Time-Aligned Musical Score Generation
Leekyung Kim
Sungwook Jeon
Wan Heo
Jonghun Park
85
0
0
18 Feb 2025
A Differentiable Alignment Framework for Sequence-to-Sequence Modeling via Optimal Transport
Yacouba Kaloga
Shashi Kumar
P. Motlícek
Ina Kodrasi
OT
74
0
0
03 Feb 2025
Variational Bayesian Adaptive Learning of Deep Latent Variables for Acoustic Knowledge Transfer
Hu Hu
Sabato Marco Siniscalchi
Chao-Han Huck Yang
Chin-Hui Lee
65
0
0
28 Jan 2025
HadamRNN: Binary and Sparse Ternary Orthogonal RNNs
Armand Foucault
Franck Mamalet
François Malgouyres
MQ
74
0
0
28 Jan 2025
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
Kai-Tuo Xu
Feng-Long Xie
Xu Tang
Yao Hu
69
4
0
24 Jan 2025
Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition
Yosuke Higuchi
Tetsuji Ogawa
Tetsunori Kobayashi
AuLLM
40
0
0
08 Jan 2025
Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
Tsz Kin Lam
Marco Gaido
Sara Papi
L. Bentivogli
Barry Haddow
31
0
0
04 Jan 2025
LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration
Sangmin Lee
Woo-Jin Chung
Hong-Goo Kang
70
0
0
19 Dec 2024
Complexity boosted adaptive training for better low resource ASR performance
Hongxuan Lu
Shenjian Wang
Biao Li
62
0
0
01 Dec 2024
Towards Maximum Likelihood Training for Transducer-based Streaming Speech Recognition
Hyeonseung Lee
J. Yoon
Sungsoo Kim
N. Kim
61
0
0
26 Nov 2024
On the Cost of Model-Serving Frameworks: An Experimental Evaluation
Pasquale De Rosa
Yérom-David Bromberg
Pascal Felber
Djob Mvondo
V. Schiavoni
25
0
0
15 Nov 2024
emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography
Viswanath Sivakumar
Jeffrey Seely
Alan Du
Sean R Bittner
Adam Berenzweig
Anuoluwapo Bolarinwa
Alexandre Gramfort
Michael I Mandel
13
3
0
26 Oct 2024
A two-stage transliteration approach to improve performance of a multilingual ASR
Rohit Kumar
13
0
0
09 Oct 2024
The USTC-NERCSLIP Systems for the CHiME-8 MMCSG Challenge
Ya Jiang
Hongbo Lan
Jun Du
Qing Wang
Shutong Niu
40
1
0
08 Oct 2024
Multi-Dialect Vietnamese: Task, Dataset, Baseline Models and Challenges
Nguyen Van Dinh
Thanh Chi Dang
Luan Thanh Nguyen
Kiet Van Nguyen
21
2
0
04 Oct 2024
The Conformer Encoder May Reverse the Time Dimension
Robin Schmitt
Albert Zeyer
Mohammad Zeineldeen
Ralf Schlüter
Hermann Ney
31
0
0
01 Oct 2024
Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems
Oswald Zink
Yosuke Higuchi
Carlos Mullov
Alexander Waibel
Tetsunori Kobayashi
29
0
0
30 Sep 2024
Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models
Xiaoxue Gao
Nancy F. Chen
Mamba
35
1
0
27 Sep 2024
Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking
Brian Yan
Vineel Pratap
Shinji Watanabe
Michael Auli
28
0
0
27 Sep 2024
Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training
Kun Song
Zhiquan Tan
Bochao Zou
Jiansheng Chen
Huimin Ma
Weiran Huang
37
0
0
25 Sep 2024
Target word activity detector: An approach to obtain ASR word boundaries without lexicon
S. Sivasankaran
Eric Sun
Jinyu Li
Yan-ping Huang
Jing Pan
30
0
0
20 Sep 2024
EMMeTT: Efficient Multimodal Machine Translation Training
Piotr Żelasko
Zhehuai Chen
Mengru Wang
Daniel Galvez
Oleksii Hrinchuk
Shuoyang Ding
Ke Hu
Jagadeesh Balam
Vitaly Lavrukhin
Boris Ginsburg
28
1
0
20 Sep 2024
AutoMode-ASR: Learning to Select ASR Systems for Better Quality and Cost
Ahmet Gündüz
Yunsu Kim
Kamer Ali Yuksel
Mohamed Al-Badrashiny
Thiago Castro Ferreira
Hassan Sawaf
33
0
0
19 Sep 2024
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
Jiawen Kang
Lingwei Meng
Mingyu Cui
Yuejiao Wang
Xixin Wu
Xunying Liu
Helen Meng
41
1
0
19 Sep 2024
A Joint Spectro-Temporal Relational Thinking Based Acoustic Modeling Framework
Zheng Nan
T. Dang
V. Sethu
Beena Ahmed
16
0
0
17 Sep 2024
ASR Error Correction using Large Language Models
Rao Ma
Mengjie Qian
Mark J. F. Gales
Kate Knill
KELM
46
1
0
14 Sep 2024
Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge
Hongfei Xue
Rong Gong
Mingchen Shao
Xin Xu
L. xilinx Wang
...
Yong Qin
Jun Du
Ming Li
Binbin Zhang
Bin Jia
23
1
0
09 Sep 2024
Lightweight Transducer Based on Frame-Level Criterion
Genshun Wan
Mengzhi Wang
Tingzhi Mao
Hang Chen
Z. Ye
36
1
0
05 Sep 2024
Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model
Hukai Huang
Jiayan Lin
K. Wang
Yishuang Li
Wenhao Guan
Lin Li
Q. Hong
MoE
29
0
0
03 Sep 2024
Reassessing Noise Augmentation Methods in the Context of Adversarial Speech
Karla Pizzi
Matías Pizarro
Asja Fischer
28
0
0
03 Sep 2024
What does it take to get state of the art in simultaneous speech-to-speech translation?
Vincent Wilmet
Johnson Du
20
0
0
02 Sep 2024
Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition
Hao Shi
Yuan Gao
Zhaoheng Ni
Tatsuya Kawahara
30
1
0
01 Sep 2024
The State of Commercial Automatic French Legal Speech Recognition Systems and their Impact on Court Reporters et al
Nicolas Garneau
Olivier Bolduc
ELM
AILaw
45
0
0
21 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
33
0
0
08 Aug 2024
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
Nick Rossenbach
Ralf Schlüter
S. Sakti
24
2
0
31 Jul 2024
On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures
Nick Rossenbach
Benedikt Hilmes
Ralf Schlüter
25
1
0
25 Jul 2024
CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR
Wenbo Zhao
Ziwei Li
Chuan Yu
Zhijian Ou
AI4TS
21
0
0
14 Jul 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Ye Bai
Jingping Chen
Jitong Chen
Wei Chen
Zhuo Chen
...
Wanyi Zhang
Yang Zhang
Yawei Zhang
Yijie Zheng
Ming Zou
AuLLM
44
19
0
05 Jul 2024
Serialized Output Training by Learned Dominance
Ying Shi
Lantian Li
Shi Yin
D. Wang
Jiqing Han
19
3
0
04 Jul 2024
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5
Zhehuai Chen
He Huang
Oleksii Hrinchuk
Krishna C. Puvvada
Nithin Rao Koluguri
Piotr Żelasko
Jagadeesh Balam
Boris Ginsburg
AuLLM
RALM
34
10
0
28 Jun 2024