Streaming automatic speech recognition with the transformer model

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
8 January 2020
Niko Moritz
Takaaki Hori
Jonathan Le Roux

Papers citing "Streaming automatic speech recognition with the transformer model"

50 / 115 papers shown
AdaCoach: A Virtual Coach for Training Customer Service Agents
Shuang Peng
Shuai Zhu
Minghui Yang
Haozhou Huang
Dan Liu
Zujie Wen
Xuelian Li
Biao Fan
184
0
0
27 Apr 2022
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation
Interspeech (Interspeech), 2022
Keqi Deng
Shinji Watanabe
Jiatong Shi
Siddhant Arora
189
15
0
19 Apr 2022
An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition
Spoken Language Technology Workshop (SLT), 2022
Niko Moritz
Frank Seide
Duc Le
Jay Mahadeokar
Christian Fuegen
387
10
0
19 Apr 2022
Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition
Interspeech (Interspeech), 2022
Shaojin Ding
R. Rikhye
Qiao Liang
Yanzhang He
Quan Wang
A. Narayanan
Tom O'Malley
Ian McGraw
210
39
0
08 Apr 2022
CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR
Interspeech (Interspeech), 2022
Keyu An
Huahuan Zheng
Zhijian Ou
Hongyu Xiang
Ke Ding
Guanglu Wan
AI4TS
183
21
0
31 Mar 2022
Shifted Chunk Encoder for Transformer Based Streaming End-to-End ASR
International Conference on Neural Information Processing (ICONIP), 2022
Fangyuan Wang
Bo Xu
182
5
0
29 Mar 2022
Transformer-based Streaming ASR with Cumulative Attention
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Mohan Li
Shucong Zhang
Catalin Zorila
R. Doddipatla
200
11
0
11 Mar 2022
Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
E. Tsunoo
Chaitanya Narisetty
Michael Hentschel
Yosuke Kashiwagi
Shinji Watanabe
153
3
0
25 Jan 2022
A comparison of streaming models and data augmentation methods for robust speech recognition
Automatic Speech Recognition & Understanding (ASRU), 2021
Jiyeon Kim
Mehul Kumar
Dhananjaya N. Gowda
Abhinav Garg
Chanwoo Kim
123
6
0
19 Nov 2021
Solving Probability and Statistics Problems by Program Synthesis
Leonard Tang
Elizabeth Ke
Nikhil Singh
Nakul Verma
Iddo Drori
137
15
0
16 Nov 2021
Recent Advances in End-to-End Automatic Speech Recognition
APSIPA Transactions on Signal and Information Processing (TASIP), 2021
Jinyu Li
VLM
434
431
0
02 Nov 2021
Sequence Transduction with Graph-based Supervision
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Niko Moritz
Takaaki Hori
Shinji Watanabe
Jonathan Le Roux
219
7
0
01 Nov 2021
Visualization: the missing factor in Simultaneous Speech Translation
Italian Conference on Computational Linguistics (CLiC-it), 2021
Sara Papi
Matteo Negri
Marco Turchi
216
2
0
31 Oct 2021
An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021
Huaibo Zhao
Yosuke Higuchi
Tetsuji Ogawa
Tetsunori Kobayashi
97
4
0
20 Oct 2021
Study of positional encoding approaches for Audio Spectrogram Transformers
L. Pepino
Pablo Riera
Luciana Ferrer
ViT
132
7
0
13 Oct 2021
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Jing Pan
Tao Lei
Kwangyoun Kim
Kyu Jeong Han
Shinji Watanabe
VLM
115
12
0
11 Oct 2021
VideoModerator: A Risk-aware Framework for Multimodal Video Moderation in E-Commerce
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2021
Tan Tang
Yanhong Wu
Lingyun Yu
Yuhong Li
Yingcai Wu
186
30
0
08 Sep 2021
Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models
Interspeech (Interspeech), 2021
Tianzi Wang
Yuya Fujita
Xuankai Chang
Shinji Watanabe
238
17
0
20 Jul 2021
A Dialogue-based Information Extraction System for Medical Insurance Assessment
Shuang Peng
Mengdi Zhou
Minghui Yang
Haitao Mi
Shaosheng Cao
Zujie Wen
Teng Xu
Hongbin Wang
Lei Liu
139
4
0
13 Jul 2021
Variational Information Bottleneck for Effective Low-resource Audio Classification
Interspeech (Interspeech), 2021
Shijing Si
Jianzong Wang
Huiming Sun
Jianhan Wu
Chuan Zhang
Xiaoyang Qu
Ning Cheng
Lei Chen
Jing Xiao
138
15
0
10 Jul 2021
Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition
Timo Lohrenz
P. Schwarz
Zhengyang Li
Tim Fingscheidt
152
11
0
02 Jul 2021
Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition
Niko Moritz
Takaaki Hori
Jonathan Le Roux
147
23
0
02 Jul 2021
Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling
Neural Information Processing Systems (NeurIPS), 2021
Hongyu Gong
Yun Tang
J. Pino
Xian Li
213
13
0
21 Jun 2021
Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR
Findings (Findings), 2021
Junkun Chen
Mingbo Ma
Renjie Zheng
Liang Huang
225
36
0
11 Jun 2021
Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models
Interspeech (Interspeech), 2021
Thibault Doutre
Wei Han
Chung-Cheng Chiu
Ruoming Pang
Olivier Siohan
Liangliang Cao
154
6
0
25 Apr 2021
Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers
Interspeech (Interspeech), 2021
Takaaki Hori
Niko Moritz
Chiori Hori
Jonathan Le Roux
145
37
0
19 Apr 2021
TransVG: End-to-End Visual Grounding with Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
646
442
0
17 Apr 2021
Capturing Multi-Resolution Context by Dilated Self-Attention
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Niko Moritz
Takaaki Hori
Jonathan Le Roux
144
8
0
07 Apr 2021
Extremely Low Footprint End-to-End ASR System for Smart Device
Interspeech (Interspeech), 2021
Zhifu Gao
Yiwu Yao
Shiliang Zhang
Jun Yang
Ming Lei
Ian Mcloughlin
118
15
0
06 Apr 2021
Mutually-Constrained Monotonic Multihead Attention for Online ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Jae-gyun Song
Hajin Shim
Eunho Yang
103
0
0
26 Mar 2021
Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation
Interspeech (Interspeech), 2021
Md. Akmal Haidar
Chao Xing
Mehdi Rezagholizadeh
193
6
0
17 Mar 2021
Fine-tuning of Pre-trained End-to-end Speech Recognition with Generative Adversarial Networks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Md. Akmal Haidar
Mehdi Rezagholizadeh
267
9
0
10 Mar 2021
Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021
Hirofumi Inaguma
Tatsuya Kawahara
376
19
0
28 Feb 2021
Thank you for Attention: A survey on Attention-based Artificial Neural Networks for Automatic Speech Recognition
Intelligent Systems with Applications (ISA), 2021
Priyabrata Karmakar
S. Teng
Guojun Lu
143
37
0
14 Feb 2021
Motion-Based Handwriting Recognition and Word Reconstruction
Junshen Kevin Chen
Wanze Xie
Yutong He
183
1
0
15 Jan 2021
Fast offline Transformer-based end-to-end automatic speech recognition for real-world applications
ETRI Journal (ETRI J.), 2021
Y. Oh
Kiyoung Park
Jeongue Park
OffRL
320
6
0
14 Jan 2021
s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis
Xi Wang
Huaiping Ming
Lei He
Frank Soong
108
5
0
17 Nov 2020
Block-Online Guided Source Separation
Spoken Language Technology Workshop (SLT), 2020
Shota Horiguchi
Yusuke Fujita
Kenji Nagamatsu
150
4
0
16 Nov 2020
Dynamic latency speech recognition with asynchronous revision
Mingkun Huang
Meng Cai
Jun Zhang
Yang Zhang
Yongbin You
Yi He
Zejun Ma
BDL
163
3
0
03 Nov 2020
Semi-Supervised Speech Recognition via Graph-based Temporal Classification
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Niko Moritz
Takaaki Hori
Jonathan Le Roux
287
30
0
29 Oct 2020
CASS-NAT: CTC Alignment-based Single Step Non-autoregressive Transformer for Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Ruchao Fan
Wei Chu
Peng Chang
Jing Xiao
156
42
0
28 Oct 2020
Transformer in action: a comparative study of transformer-based acoustic models for large scale speech recognition applications
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Yongqiang Wang
Yangyang Shi
Frank Zhang
Chunyang Wu
Julian Chan
Ching-Feng Yeh
Alex Xiao
290
28
0
27 Oct 2020
Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model
Zhifu Gao
Shiliang Zhang
Ming Lei
Ian Mcloughlin
CVBM
130
17
0
27 Oct 2020
Transformer-based End-to-End Speech Recognition with Local Dense Synthesizer Attention
Menglong Xu
Shengqiang Li
Xiao-Lei Zhang
261
36
0
23 Oct 2020
Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data
Thibault Doutre
Wei Han
Min Ma
Zhiyun Lu
Chung-Cheng Chiu
Ruoming Pang
A. Narayanan
Ananya Misra
Yu Zhang
Liangliang Cao
299
24
0
22 Oct 2020
Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset
Xie Chen
Yu-Huan Wu
Zhenghao Wang
Shujie Liu
Jinyu Li
284
200
0
22 Oct 2020
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Yangyang Shi
Yongqiang Wang
Chunyang Wu
Ching-Feng Yeh
Julian Chan
Frank Zhang
Duc Le
M. Seltzer
798
190
0
21 Oct 2020
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling
Jiahui Yu
Wei Han
Anmol Gulati
Chung-Cheng Chiu
Yue Liu
Tara N. Sainath
Yonghui Wu
Ruoming Pang
374
19
0
12 Oct 2020
Super-Human Performance in Online Low-latency Recognition of Conversational Speech
T. Nguyen
S. Stueker
A. Waibel
BDL
340
43
0
07 Oct 2020
Large-scale Transfer Learning for Low-resource Spoken Language Understanding
Interspeech (Interspeech), 2020
X. Jia
Jianzong Wang
Zhiyong Zhang
Ning Cheng
Jing Xiao
167
17
0
13 Aug 2020