Listen, Attend and Spell

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015

5 August 2015

Papers citing "Listen, Attend and Spell"

Showing 50 of 1,064 citing papers

Empowering the Deaf and Hard of Hearing Community: Enhancing Video Captions Using Large Language Models

Nadeen Fathallah

Monika Bhole

Steffen Staab

367

30 Nov 2024

Towards Maximum Likelihood Training for Transducer-based Streaming Speech Recognition

IEEE Signal Processing Letters (SPL), 2024

306

26 Nov 2024

On the Cost of Model-Serving Frameworks: An Experimental Evaluation

208

15 Nov 2024

emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography

Neural Information Processing Systems (NeurIPS), 2024

374

26 Oct 2024

A two-stage transliteration approach to improve performance of a multilingual ASR

Rohit Kumar

165

09 Oct 2024

The USTC-NERCSLIP Systems for the CHiME-8 MMCSG Challenge

309

08 Oct 2024

Multi-Dialect Vietnamese: Task, Dataset, Baseline Models and Challenges

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

222

04 Oct 2024

The Conformer Encoder May Reverse the Time Dimension

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

293

01 Oct 2024

Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems

Alexander Waibel

133

30 Sep 2024

Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models

Spoken Language Technology Workshop (SLT), 2024

Xiaoxue Gao

Nancy F. Chen

Mamba

212

27 Sep 2024

Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Brian Yan

Vineel Pratap

Shinji Watanabe

Michael Auli

255

27 Sep 2024

Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training

Bochao Zou

Huimin Ma

388

25 Sep 2024

Target word activity detector: An approach to obtain ASR word boundaries without lexicon

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Jinyu Li

152

20 Sep 2024

EMMeTT: Efficient Multimodal Machine Translation Training

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Piotr Żelasko

Zhehuai Chen

Jagadeesh Balam

Boris Ginsburg

179

20 Sep 2024

AutoMode-ASR: Learning to Select ASR Systems for Better Quality and Cost

International Conference on Speech and Computer (SPECOM), 2024

Ahmet Gündüz

Yunsu Kim

Kamer Ali Yuksel

Mohamed Al-Badrashiny

Thiago Castro Ferreira

Hassan Sawaf

194

19 Sep 2024

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC

Jiawen Kang

Lingwei Meng

Mingyu Cui

Yuejiao Wang

Xixin Wu

Xunying Liu

Helen Meng

275

19 Sep 2024

A Joint Spectro-Temporal Relational Thinking Based Acoustic Modeling Framework

150

17 Sep 2024

ASR Error Correction using Large Language Models

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024

313

14 Sep 2024

Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge

Spoken Language Technology Workshop (SLT), 2024

Hongfei Xue

Rong Gong

Mingchen Shao

Xin Xu

L. xilinx Wang

...

Yong Qin

Jun Du

Ming Li

Binbin Zhang

Bin Jia

186

09 Sep 2024

Lightweight Transducer Based on Frame-Level Criterion

Interspeech (Interspeech), 2024

241

05 Sep 2024

Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model

Spoken Language Technology Workshop (SLT), 2024

Lin Li

232

03 Sep 2024

Comparative Study on Noise-Augmented Training and its Effect on Adversarial Robustness in ASR Systems

Computer Speech and Language (CSL), 2024

Karla Pizzi

Matías P. Pizarro

Asja Fischer

332

03 Sep 2024

What does it take to get state of the art in simultaneous speech-to-speech translation?

Vincent Wilmet

Johnson Du

171

02 Sep 2024

Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition

Spoken Language Technology Workshop (SLT), 2024

Hao Shi

Yuan Gao

Zhaoheng Ni

Tatsuya Kawahara

439

01 Sep 2024

The State of Commercial Automatic French Legal Speech Recognition Systems and their Impact on Court Reporters et al

Nicolas Garneau

Olivier Bolduc

ELM AILaw

165

21 Aug 2024

Survey: Transformer-based Models in Data Modality Conversion

225

08 Aug 2024

On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition

Nick Rossenbach

Ralf Schlüter

S. Sakti

182

31 Jul 2024

On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures

Nick Rossenbach

Benedikt Hilmes

Ralf Schlüter

223

25 Jul 2024

CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR

257

14 Jul 2024

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

Ye Bai

Jingping Chen

Jitong Chen

Wei Chen

Zhuo Chen

...

Yang Zhang

Yijie Zheng

364

05 Jul 2024

Serialized Output Training by Learned Dominance

Ying Shi

145

04 Jul 2024

BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5

Zhehuai Chen

Piotr Żelasko

Jagadeesh Balam

Boris Ginsburg

AuLLM RALM

249

28 Jun 2024

MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research

Ke Ding

Guanglu Wan

203

26 Jun 2024

Token-Weighted RNN-T for Learning from Flawed Data

Gil Keren

Wei Zhou

Ozlem Kalinli

264

26 Jun 2024

Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet

190

25 Jun 2024

InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions

Yu Nakagome

Michael Hentschel

209

21 Jun 2024

Instruction Data Generation and Unsupervised Adaptation for Speech Language Models

Vahid Noroozi

Zhehuai Chen

Somshubra Majumdar

Steve Huang

Jagadeesh Balam

Boris Ginsburg

SyDa

341

18 Jun 2024

Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

Eungbeom Kim

Hantae Kim

Kyogu Lee

186

12 Jun 2024

Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR

Yerbolat Khassanov

Zhipeng Chen

Tianfeng Chen

Tze Yuang Chong

Wei Li

Jun Zhang

Lu Lu

Yuxuan Wang

AI4CE

177

12 Jun 2024

StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

386

10 Jun 2024

LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR

Interspeech (Interspeech), 2024

Zheshu Song

Jianheng Zhuo

Yifan Yang

Ziyang Ma

Shixiong Zhang

Xie Chen

211

07 Jun 2024

Unveiling the Dynamics of Information Interplay in Supervised Learning

Huimin Ma

227

06 Jun 2024

Joint Beam Search Integrating CTC, Attention, and Transducer Decoders

Yui Sudo

Muhammad Shakeel

Yosuke Fukumoto

Brian Yan

Jiatong Shi

Yifan Peng

Shinji Watanabe

243

05 Jun 2024

Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation

Muhammad Shakeel

Yui Sudo

Yifan Peng

Shinji Watanabe

233

22 May 2024

Contextualized Automatic Speech Recognition with Dynamic Vocabulary

Shinji Watanabe

290

22 May 2024

Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices

186

24 Apr 2024

Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Hainan Xu

Zhehuai Chen

Fei Jia

Boris Ginsburg

167

04 Apr 2024

Effective internal language model training and fusion for factorized transducer model

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Ozlem Kalinli

195

02 Apr 2024

Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights

...

Ehsan Khodapanah Aghdam

Amirhossein Kazerouni

Ilker Hacihaliloglu

Dorit Merhof

304

28 Mar 2024

M³AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset

Zhe Chen

Heyang Liu

Wenyi Yu

Guangzhi Sun

Chao Zhang

175

21 Mar 2024