
Streaming automatic speech recognition with the transformer model

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

8 January 2020

Papers citing "Streaming automatic speech recognition with the transformer model"

50 / 115 papers shown

SD-MVSum: Script-Driven Multimodal Video Summarization Method and Datasets

Manolis Mylonas

Charalampia Zerva

Evlampios Apostolidis

Vasileios Mezaris

131

07 Oct 2025

Spiralformer: Low Latency Encoder for Streaming Speech Recognition with Circular Layer Skipping and Early Exiting

108

01 Oct 2025

CarelessWhisper: Turning Whisper into a Causal Streaming Model

Tomer Krichli

Bhiksha Raj

Joseph Keshet

17 Aug 2025

Conformer-based Ultrasound-to-Speech Conversion

177

04 Jun 2025

A 71.2-μW Speech Recognition Accelerator with Recurrent Spiking Neural Network

IEEE Transactions on Circuits and Systems Part 1: Regular Papers (TCAS-I), 2024

Chih-Chyau Yang

Tian-Sheuan Chang

376

27 Mar 2025

ZipEnhancer: Dual-Path Down-Up Sampling-based Zipformer for Monaural Speech Enhancement

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

Haoxu Wang

Biao Tian

134

10 Jan 2025

Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation

AAAI Conference on Artificial Intelligence (AAAI), 2025

223

03 Jan 2025

The USTC-NERCSLIP Systems for the CHiME-8 MMCSG Challenge

311

08 Oct 2024

Mamba for Streaming ASR Combined with Unimodal Aggregation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Ying Fang

Xiaofei Li

Mamba

228

30 Sep 2024

Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems

Alexander Waibel

145

30 Sep 2024

Learning from Demonstration with Implicit Nonlinear Dynamics Models

Peter David Fagan

Subramanian Ramamoorthy

934

27 Sep 2024

SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition

Ming-Hao Hsu

Kuan Po Huang

371

16 Sep 2024

Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation

Muhammad Shakeel

Yui Sudo

Yifan Peng

Shinji Watanabe

250

22 May 2024

Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers

IEEE Signal Processing Letters (SPL), 2024

Changsheng Quan

Xiaofei Li

347

12 Mar 2024

Streaming Sequence Transduction through Dynamic Compression

523

02 Feb 2024

Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition

Vahid Noroozi

Somshubra Majumdar

Ankur Kumar

Jagadeesh Balam

Boris Ginsburg

454

27 Dec 2023

Revisiting the Entropy Semiring for Neural Speech Recognition

International Conference on Learning Representations (ICLR), 2023

Oscar Chang

DongSeon Hwang

Olivier Siohan

372

13 Dec 2023

Unified Segment-to-Segment Framework for Simultaneous Sequence Generation

Neural Information Processing Systems (NeurIPS), 2023

Shaolei Zhang

Yang Feng

278

27 Oct 2023

Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

Interspeech (Interspeech), 2023

Shinji Watanabe

159

20 Sep 2023

Semi-Autoregressive Streaming ASR With Label Context

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Shinji Watanabe

225

19 Sep 2023

Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

181

14 Sep 2023

SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

265

11 Sep 2023

Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition

European Signal Processing Conference (EUSIPCO), 2023

205

09 Sep 2023

Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals

Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2023

248

16 Aug 2023

SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

277

07 Aug 2023

ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging

161

05 Aug 2023

TST: Time-Sparse Transducer for Automatic Speech Recognition

CAAI International Conference on Artificial Intelligence (ICCAI), 2023

Jiangyan Yi

122

17 Jul 2023

BASS: Block-wise Adaptation for Speech Summarization

Interspeech (Interspeech), 2023

Bhiksha Raj

174

17 Jul 2023

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

Automatic Speech Recognition & Understanding (ASRU), 2023

340

07 Jul 2023

Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion

Interspeech (Interspeech), 2023

153

28 Jun 2023

Advancing Adversarial Training by Injecting Booster Signal

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023

289

27 Jun 2023

Streaming Speech-to-Confusion Network Speech Recognition

Interspeech (Interspeech), 2023

206

02 Jun 2023

Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation

Interspeech (Interspeech), 2023

119

02 Jun 2023

Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding

Interspeech (Interspeech), 2023

282

01 Jun 2023

Enhancing the Unified Streaming and Non-streaming Model with Contrastive Learning

Interspeech (Interspeech), 2023

168

01 Jun 2023

Streaming Audio Transformers for Online Audio Tagging

Interspeech (Interspeech), 2023

Yujun Wang

Bin Wang

302

29 May 2023

A Survey on Time-Series Pre-Trained Models

IEEE Transactions on Knowledge and Data Engineering (TKDE), 2023

285

18 May 2023

Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition

Interspeech (Interspeech), 2022

Mohan Li

R. Doddipatla

Catalin Zorila

295

24 Apr 2023

A CTC Alignment-based Non-autoregressive Transformer for End-to-end Automatic Speech Recognition

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

178

15 Apr 2023

End-to-End Speech Recognition: A Survey

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

302

248

03 Mar 2023

A low latency attention module for streaming self-supervised speech representation learning

256

27 Feb 2023

SSCFormer: Push the Limit of Chunk-wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution

IEEE Signal Processing Letters (SPL), 2022

Fangyuan Wang

Bo Xu

326

21 Nov 2022

Streaming Audio-Visual Speech Recognition with Alignment Regularization

Interspeech (Interspeech), 2022

229

03 Nov 2022

Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

P. Swietojanski

Stefan Braun

Dogan Can

Thiago Fraga da Silva

...

246

02 Nov 2022

Conversation-oriented ASR with multi-look-ahead CBS architecture

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

247

02 Nov 2022

Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding

199

16 Oct 2022

E-Branchformer: Branchformer with Enhanced merging for speech recognition

Spoken Language Technology Workshop (SLT), 2022

Kwangyoun Kim

408

160

30 Sep 2022

ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition

Interspeech (Interspeech), 2022

Martin H. Radfar

Rohit Barnwal

Rupak Vignesh Swaminathan

Feng-Ju Chang

Grant P. Strimel

Nathan Susanj

Athanasios Mouchtaris

232

29 Sep 2022

Improving Speech Emotion Recognition Through Focus and Calibration Attention Mechanisms

Interspeech (Interspeech), 2022

Junghun Kim

Yoojin An

Jihie Kim

178

21 Aug 2022

Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies

Interspeech (Interspeech), 2022

198

06 Jul 2022