arXiv:2005.08042
Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory
16 May 2020
Chunyang Wu
Yongqiang Wang
Yangyang Shi
Ching-Feng Yeh
Frank Zhang
RALM
Papers citing "Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory"
49 / 49 papers shown
CarelessWhisper: Turning Whisper into a Causal Streaming Model
Tomer Krichli
Bhiksha Raj
Joseph Keshet
165
0
0
17 Aug 2025
Improving Streaming Speech Recognition With Time-Shifted Contextual Attention And Dynamic Right Context Masking
Interspeech (Interspeech), 2024
Khanh Le
Duc Thanh Chau
AI4TS
333
2
0
24 Feb 2025
Transducer Consistency Regularization for Speech to Text Applications
Spoken Language Technology Workshop (SLT), 2024
Cindy Tseng
Yun Tang
Vijendra Raj Apsingekar
337
0
0
09 Oct 2024
FASST: Fast LLM-based Simultaneous Speech Translation
Siqi Ouyang
Xi Xu
Chinmay Dandekar
Lei Li
208
3
0
18 Aug 2024
Token-Weighted RNN-T for Learning from Flawed Data
Gil Keren
Wei Zhou
Ozlem Kalinli
364
1
0
26 Jun 2024
Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey
Hamza Kheddar
Mustapha Hemis
Yassine Himeur
OffRL
302
163
0
02 Mar 2024
BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer
Chih-Cheng Chang
Li Su
ViT
318
5
0
28 Dec 2023
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
Vahid Noroozi
Somshubra Majumdar
Ankur Kumar
Jagadeesh Balam
Boris Ginsburg
503
26
0
27 Dec 2023
Memory-augmented conformer for improved end-to-end long-form ASR
Interspeech (Interspeech), 2023
Carlos Carvalho
A. Abad
RALM
214
3
0
22 Sep 2023
Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Matthew Raffel
Lizhong Chen
232
5
0
03 Jul 2023
Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation
International Conference on Machine Learning (ICML), 2023
Matthew Raffel
Drew Penney
Lizhong Chen
193
4
0
03 Jul 2023
SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Desh Raj
Daniel Povey
Sanjeev Khudanpur
VLM
393
17
0
18 Jun 2023
DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer ASR
Interspeech (Interspeech), 2023
Goeric Huybrechts
S. Ronanki
Xilai Li
H. Nosrati
S. Bodapati
Katrin Kirchhoff
230
2
0
13 Jun 2023
Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation
Interspeech (Interspeech), 2023
Hanbyul Kim
S. Seo
Lukas Lee
Seolki Baek
136
3
0
02 Jun 2023
ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs
Interspeech (Interspeech), 2023
Xingcheng Song
Di Wu
Binbin Zhang
Zhendong Peng
Bo Dang
Fuping Pan
Zhiyong Wu
216
21
0
18 May 2023
Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yun Tang
Anna Y. Sun
Hirofumi Inaguma
Xinyue Chen
Ning Dong
Xutai Ma
Paden Tomasello
J. Pino
310
28
0
04 May 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
502
76
0
21 Mar 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Françoise Beaufays
Yonghui Wu
VLM
507
365
0
02 Mar 2023
A low latency attention module for streaming self-supervised speech representation learning
Jianbo Ma
Siqi Pan
Deepak Chandran
A. Fanelli
Richard Cartwright
291
0
0
27 Feb 2023
Self-Attention Amortized Distributional Projection Optimization for Sliced Wasserstein Point-Cloud Reconstruction
International Conference on Machine Learning (ICML), 2023
Khai Nguyen
Dang Nguyen
N. Ho
206
9
0
12 Jan 2023
Pushing the performances of ASR models on English and Spanish accents
Pooja Chitkara
M. Rivière
Jade Copet
Frank Zhang
Yatharth Saraf
235
1
0
22 Dec 2022
SSCFormer: Push the Limit of Chunk-wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution
IEEE Signal Processing Letters (SPL), 2022
Fangyuan Wang
Bo Xu
Bo Xu
357
1
0
21 Nov 2022
FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition
Xingcheng Song
Di Wu
Binbin Zhang
Zhiyong Wu
Wenpeng Li
...
Peng Zhang
Zhendong Peng
Fuping Pan
Changbao Zhu
Zhongqin Wu
150
2
0
31 Oct 2022
Anchored Speech Recognition with Neural Transducers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Desh Raj
Junteng Jia
Jay Mahadeokar
Chunyang Wu
Niko Moritz
Xiaohui Zhang
Ozlem Kalinli
295
2
0
20 Oct 2022
Real-time Online Video Detection with Temporal Smoothing Transformers
European Conference on Computer Vision (ECCV), 2022
Yue Zhao
Philipp Krähenbühl
ViT
279
102
0
19 Sep 2022
CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR
Interspeech (Interspeech), 2022
Keyu An
Huahuan Zheng
Zhijian Ou
Hongyu Xiang
Ke Ding
Guanglu Wan
AI4TS
222
23
0
31 Mar 2022
Dynamic Latency for CTC-Based Streaming Automatic Speech Recognition With Emformer
J. Sun
Guiping Zhong
Dinghao Zhou
Baoxiang Li
234
0
0
29 Mar 2022
Shifted Chunk Encoder for Transformer Based Streaming End-to-End ASR
International Conference on Neural Information Processing (ICONIP), 2022
Fangyuan Wang
Bo Xu
235
5
0
29 Mar 2022
StreaMulT: Streaming Multimodal Transformer for Heterogeneous and Arbitrary Long Sequential Data
Victor Pellegrain
Myriam Tami
M. Batteux
Céline Hudelot
AI4TS
237
3
0
15 Oct 2021
Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution
Yangyang Shi
Chunyang Wu
Dilin Wang
Alex Xiao
Jay Mahadeokar
...
Ke Li
Yuan Shangguan
Varun K. Nagaraja
Ozlem Kalinli
M. Seltzer
303
19
0
07 Oct 2021
Improving Streaming Transformer Based ASR Under a Framework of Self-supervised Learning
Songjun Cao
Yueteng Kang
Yanzhe Fu
Xiaoshuo Xu
Sining Sun
Yike Zhang
Long Ma
213
17
0
15 Sep 2021
Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition
Niko Moritz
Takaaki Hori
Jonathan Le Roux
181
24
0
02 Jul 2021
The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021
International Workshop on Spoken Language Translation (IWSLT), 2021
Dan Liu
Mengge Du
Xiaoxi Li
Yuchen Hu
Lirong Dai
326
23
0
01 Jul 2021
Collaborative Training of Acoustic Encoders for Speech Recognition
Varun K. Nagaraja
Yangyang Shi
Ganesh Venkatesh
Ozlem Kalinli
M. Seltzer
Vikas Chandra
254
12
0
16 Jun 2021
Latency-Controlled Neural Architecture Search for Streaming Speech Recognition
Automatic Speech Recognition & Understanding (ASRU), 2021
Liqiang He
Shulin Feng
Jane Polak Scowcroft
Dong Yu
283
0
0
08 May 2021
Capturing Multi-Resolution Context by Dilated Self-Attention
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Niko Moritz
Takaaki Hori
Jonathan Le Roux
177
8
0
07 Apr 2021
Dissecting User-Perceived Latency of On-Device E2E Speech Recognition
Interspeech (Interspeech), 2021
Yuan Shangguan
Rohit Prabhavalkar
Hang Su
Jay Mahadeokar
Yangyang Shi
...
Chunyang Wu
Duc Le
Ozlem Kalinli
Christian Fuegen
M. Seltzer
296
34
0
06 Apr 2021
Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency
Interspeech (Interspeech), 2021
Yangyang Shi
Varun K. Nagaraja
Chunyang Wu
Jay Mahadeokar
Duc Le
...
Ching-Feng Yeh
Julian Chan
Christian Fuegen
Ozlem Kalinli
M. Seltzer
285
16
0
05 Apr 2021
Thank you for Attention: A survey on Attention-based Artificial Neural Networks for Automatic Speech Recognition
Intelligent Systems with Applications (ISA), 2021
Priyabrata Karmakar
S. Teng
Guojun Lu
167
37
0
14 Feb 2021
Wake Word Detection with Streaming Transformers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Yiming Wang
Hang Lv
Daniel Povey
Lei Xie
Sanjeev Khudanpur
AI4TS
199
41
0
08 Feb 2021
Efficient End-to-End Speech Recognition Using Performers in Conformers
Peidong Wang
DeLiang Wang
304
3
0
09 Nov 2020
Alignment Restricted Streaming Recurrent Neural Network Transducer
Jay Mahadeokar
Yuan Shangguan
Duc Le
Gil Keren
Hang Su
Thong Le
Ching-Feng Yeh
Christian Fuegen
M. Seltzer
AI4TS
259
69
0
05 Nov 2020
Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition
Spoken Language Technology Workshop (SLT), 2020
Ching-Feng Yeh
Yongqiang Wang
Yangyang Shi
Chunyang Wu
Frank Zhang
Julian Chan
M. Seltzer
AI4TS
RALM
250
9
0
03 Nov 2020
Streaming Simultaneous Speech Translation with Augmented Memory Transformer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Xutai Ma
Yongqiang Wang
M. Dousti
Philipp Koehn
J. Pino
279
43
0
30 Oct 2020
Transformer in action: a comparative study of transformer-based acoustic models for large scale speech recognition applications
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Yongqiang Wang
Yangyang Shi
Frank Zhang
Chunyang Wu
Julian Chan
Ching-Feng Yeh
Alex Xiao
341
28
0
27 Oct 2020
Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset
Xie Chen
Yu-Huan Wu
Zhenghao Wang
Shujie Liu
Jinyu Li
329
205
0
22 Oct 2020
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Yangyang Shi
Yongqiang Wang
Chunyang Wu
Ching-Feng Yeh
Julian Chan
Frank Zhang
Duc Le
M. Seltzer
943
199
0
21 Oct 2020
Super-Human Performance in Online Low-latency Recognition of Conversational Speech
T. Nguyen
S. Stueker
A. Waibel
BDL
408
43
0
07 Oct 2020
Weak-Attention Suppression For Transformer Based Speech Recognition
Yangyang Shi
Yongqiang Wang
Chunyang Wu
Christian Fuegen
Frank Zhang
Duc Le
Ching-Feng Yeh
M. Seltzer
295
20
0
18 May 2020