Streaming automatic speech recognition with the transformer model

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
8 January 2020
Niko Moritz
Takaaki Hori
Jonathan Le Roux

Papers citing "Streaming automatic speech recognition with the transformer model"

50 / 115 papers shown
SD-MVSum: Script-Driven Multimodal Video Summarization Method and Datasets
Manolis Mylonas
Charalampia Zerva
Evlampios Apostolidis
Vasileios Mezaris
131
3
0
07 Oct 2025
Spiralformer: Low Latency Encoder for Streaming Speech Recognition with Circular Layer Skipping and Early Exiting
E. Tsunoo
Hayato Futami
Yosuke Kashiwagi
Siddhant Arora
Shinji Watanabe
108
0
0
01 Oct 2025
CarelessWhisper: Turning Whisper into a Causal Streaming Model
Tomer Krichli
Bhiksha Raj
Joseph Keshet
75
0
0
17 Aug 2025
Conformer-based Ultrasound-to-Speech Conversion
Ibrahim Ibrahimov
Zainkó Csaba
Gábor Gosztolya
MedIm
177
0
0
04 Jun 2025
A 71.2-μW Speech Recognition Accelerator with Recurrent Spiking Neural Network
IEEE Transactions on Circuits and Systems Part 1: Regular Papers (TCAS-I), 2024
Chih-Chyau Yang
Tian-Sheuan Chang
376
2
0
27 Mar 2025
ZipEnhancer: Dual-Path Down-Up Sampling-based Zipformer for Monaural Speech Enhancement
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Haoxu Wang
Biao Tian
134
5
0
10 Jan 2025
Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation
AAAI Conference on Artificial Intelligence (AAAI), 2025
Shoutao Guo
Shaolei Zhang
Zhengrui Ma
Yang Feng
223
3
0
03 Jan 2025
The USTC-NERCSLIP Systems for the CHiME-8 MMCSG Challenge
Ya Jiang
Hongbo Lan
Jun Du
Qing Wang
Shutong Niu
311
1
0
08 Oct 2024
Mamba for Streaming ASR Combined with Unimodal Aggregation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Ying Fang
Xiaofei Li
Mamba
228
9
0
30 Sep 2024
Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems
Oswald Zink
Yosuke Higuchi
Carlos Mullov
Alexander Waibel
Tetsunori Kobayashi
145
3
0
30 Sep 2024
Learning from Demonstration with Implicit Nonlinear Dynamics Models
Peter David Fagan
Subramanian Ramamoorthy
934
0
0
27 Sep 2024
SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition
Ming-Hao Hsu
Kuan Po Huang
371
3
0
16 Sep 2024
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation
Muhammad Shakeel
Yui Sudo
Yifan Peng
Shinji Watanabe
250
0
0
22 May 2024
Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers
IEEE Signal Processing Letters (SPL), 2024
Changsheng Quan
Xiaofei Li
347
44
0
12 Mar 2024
Streaming Sequence Transduction through Dynamic Compression
Weiting Tan
Yunmo Chen
Tongfei Chen
Guanghui Qin
Haoran Xu
Heidi C. Zhang
Benjamin Van Durme
Philipp Koehn
523
2
0
02 Feb 2024
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
Vahid Noroozi
Somshubra Majumdar
Ankur Kumar
Jagadeesh Balam
Boris Ginsburg
454
22
0
27 Dec 2023
Revisiting the Entropy Semiring for Neural Speech Recognition
International Conference on Learning Representations (ICLR), 2023
Oscar Chang
DongSeon Hwang
Olivier Siohan
372
3
0
13 Dec 2023
Unified Segment-to-Segment Framework for Simultaneous Sequence Generation
Neural Information Processing Systems (NeurIPS), 2023
Shaolei Zhang
Yang Feng
278
9
0
27 Oct 2023
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff
Interspeech, 2023
Peter Polák
Brian Yan
Shinji Watanabe
A. Waibel
Ondrej Bojar
159
10
0
20 Sep 2023
Semi-Autoregressive Streaming ASR With Label Context
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Siddhant Arora
G. Saon
Shinji Watanabe
Brian Kingsbury
AI4TS
225
10
0
19 Sep 2023
Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yang Li
Liangzhen Lai
Shangguan Yuan
Forrest N. Iandola
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
181
9
0
14 Sep 2023
SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Haoxu Wang
Fan Yu
Xian Shi
Yuezhang Wang
Shiliang Zhang
Ming Li
265
20
0
11 Sep 2023
Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition
European Signal Processing Conference (EUSIPCO), 2023
Huaibo Zhao
Yosuke Higuchi
Yusuke Kida
Tetsuji Ogawa
Tetsunori Kobayashi
205
1
0
09 Sep 2023
Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2023
Running Zhao
Jiang-Tao Luca Yu
Haiying Zhao
Edith C.H. Ngai
248
10
0
16 Aug 2023
SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xian Shi
Yexin Yang
Zerui Li
Yanni Chen
Zhifu Gao
Shiliang Zhang
277
21
0
07 Aug 2023
ApproBiVT: Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff Guided Early Stopping and Checkpoint Averaging
Fangyuan Wang
Ming Hao
Yuhai Shi
Bo Xu
MoMe
161
0
0
05 Aug 2023
TST: Time-Sparse Transducer for Automatic Speech Recognition
CAAI International Conference on Artificial Intelligence (ICCAI), 2023
Xiaohui Zhang
Mangui Liang
Zhengkun Tian
Jiangyan Yi
Jianhua Tao
122
0
0
17 Jul 2023
BASS: Block-wise Adaptation for Speech Summarization
Interspeech, 2023
Roshan S. Sharma
Kenneth Zheng
Siddhant Arora
Shinji Watanabe
Rita Singh
Bhiksha Raj
174
8
0
17 Jul 2023
Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments
Automatic Speech Recognition & Understanding (ASRU), 2023
Sara Papi
Peidong Wang
Junkun Chen
Jian Xue
Jinyu Li
Yashesh Gaur
340
8
0
07 Jul 2023
Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion
Interspeech, 2023
Zhe Ye
Terui Mao
Li Dong
Diqun Yan
AAML
153
15
0
28 Jun 2023
Advancing Adversarial Training by Injecting Booster Signal
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Hong Joo Lee
Youngjoon Yu
Yonghyun Ro
AAML
289
4
0
27 Jun 2023
Streaming Speech-to-Confusion Network Speech Recognition
Interspeech, 2023
Denis Filimonov
Prabhat Pandey
Ariya Rastrow
Ankur Gandhe
A. Stolcke
HAI
206
0
0
02 Jun 2023
Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation
Interspeech, 2023
Hanbyul Kim
S. Seo
Lukas Lee
Seolki Baek
119
3
0
02 Jun 2023
Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding
Interspeech, 2023
Ziao Yang
Samridhi Choudhary
Siegfried Kunzmann
Zheng Zhang
MQ
282
5
0
01 Jun 2023
Enhancing the Unified Streaming and Non-streaming Model with Contrastive Learning
Interspeech, 2023
Yuting Yang
Yuke Li
Binbin Du
AI4TS
168
1
0
01 Jun 2023
Streaming Audio Transformers for Online Audio Tagging
Interspeech, 2023
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
302
4
0
29 May 2023
A Survey on Time-Series Pre-Trained Models
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2023
Qianli Ma
Ziqiang Liu
Zhenjing Zheng
Ziyang Huang
Siying Zhu
Zhongzhong Yu
James T. Kwok
AI4TS
285
89
0
18 May 2023
Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition
Interspeech, 2022
Mohan Li
R. Doddipatla
Catalin Zorila
295
0
0
24 Apr 2023
A CTC Alignment-based Non-autoregressive Transformer for End-to-end Automatic Speech Recognition
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Ruchao Fan
Wei Chu
Peng Chang
Abeer Alwan
178
18
0
15 Apr 2023
End-to-End Speech Recognition: A Survey
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schluter
Shinji Watanabe
VLM
302
248
0
03 Mar 2023
A low latency attention module for streaming self-supervised speech representation learning
Jianbo Ma
Siqi Pan
Deepak Chandran
A. Fanelli
Richard Cartwright
256
0
0
27 Feb 2023
SSCFormer: Push the Limit of Chunk-wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution
IEEE Signal Processing Letters (SPL), 2022
Fangyuan Wang
Bo Xu
Bo Xu
326
0
0
21 Nov 2022
Streaming Audio-Visual Speech Recognition with Alignment Regularization
Interspeech, 2022
Pingchuan Ma
Niko Moritz
Stavros Petridis
Christian Fuegen
Maja Pantic
229
2
0
03 Nov 2022
Variable Attention Masking for Configurable Transformer Transducer Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
P. Swietojanski
Stefan Braun
Dogan Can
Thiago Fraga da Silva
Arnab Ghoshal
...
Henry Mason
Erik McDermott
Honza Silovsky
R. Travadi
Xiaodan Zhuang
246
21
0
02 Nov 2022
Conversation-oriented ASR with multi-look-ahead CBS architecture
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Huaibo Zhao
S. Fujie
Tetsuji Ogawa
Jin Sakuma
Yusuke Kida
Tetsunori Kobayashi
247
3
0
02 Nov 2022
Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Ruchao Fan
Guoli Ye
Yashesh Gaur
Jinyu Li
199
4
0
16 Oct 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Spoken Language Technology Workshop (SLT), 2022
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
408
160
0
30 Sep 2022
ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition
Interspeech, 2022
Martin H. Radfar
Rohit Barnwal
Rupak Vignesh Swaminathan
Feng-Ju Chang
Grant P. Strimel
Nathan Susanj
Athanasios Mouchtaris
232
14
0
29 Sep 2022
Improving Speech Emotion Recognition Through Focus and Calibration Attention Mechanisms
Interspeech, 2022
Junghun Kim
Yoojin An
Jihie Kim
178
14
0
21 Aug 2022
Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies
Interspeech, 2022
Zehan Li
Haoran Miao
Keqi Deng
Gaofeng Cheng
Sanli Tian
Ta Li
Yonghong Yan
KELM
198
5
0
06 Jul 2022