Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory

16 May 2020

Papers citing "Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory"


CarelessWhisper: Turning Whisper into a Causal Streaming Model

Tomer Krichli

Bhiksha Raj

Joseph Keshet

165

17 Aug 2025

Improving Streaming Speech Recognition With Time-Shifted Contextual Attention And Dynamic Right Context Masking

Interspeech (Interspeech), 2024

Khanh Le

Duc Thanh Chau

AI4TS

333

24 Feb 2025

Transducer Consistency Regularization for Speech to Text Applications

Spoken Language Technology Workshop (SLT), 2024

Cindy Tseng

Yun Tang

Vijendra Raj Apsingekar

337

09 Oct 2024

FASST: Fast LLM-based Simultaneous Speech Translation

208

18 Aug 2024

Token-Weighted RNN-T for Learning from Flawed Data

Gil Keren

Wei Zhou

Ozlem Kalinli

364

26 Jun 2024

Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey

Hamza Kheddar

Mustapha Hemis

Yassine Himeur

OffRL

302

163

02 Mar 2024

BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer

Chih-Cheng Chang

Li Su

ViT

318

28 Dec 2023

Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition

Vahid Noroozi

Somshubra Majumdar

Ankur Kumar

Jagadeesh Balam

Boris Ginsburg

503

27 Dec 2023

Memory-augmented conformer for improved end-to-end long-form ASR

Interspeech (Interspeech), 2023

Carlos Carvalho

A. Abad

RALM

214

22 Sep 2023

Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Matthew Raffel

Lizhong Chen

232

03 Jul 2023

Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation

International Conference on Machine Learning (ICML), 2023

Matthew Raffel

Drew Penney

Lizhong Chen

193

03 Jul 2023

SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Desh Raj

Daniel Povey

Sanjeev Khudanpur

VLM

393

18 Jun 2023

DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer ASR

Interspeech (Interspeech), 2023

230

13 Jun 2023

Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation

Interspeech (Interspeech), 2023

136

02 Jun 2023

ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs

Interspeech (Interspeech), 2023

Binbin Zhang

Zhiyong Wu

216

18 May 2023

Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

310

04 May 2023

Transformers in Speech Processing: A Survey

502

21 Mar 2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

...

507

365

02 Mar 2023

A low latency attention module for streaming self-supervised speech representation learning

291

27 Feb 2023

Self-Attention Amortized Distributional Projection Optimization for Sliced Wasserstein Point-Cloud Reconstruction

International Conference on Machine Learning (ICML), 2023

Khai Nguyen

Dang Nguyen

N. Ho

206

12 Jan 2023

Pushing the performances of ASR models on English and Spanish accents

235

22 Dec 2022

SSCFormer: Push the Limit of Chunk-wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution

IEEE Signal Processing Letters (SPL), 2022

Fangyuan Wang

Bo Xu

357

21 Nov 2022

FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition

Xingcheng Song

Di Wu

Binbin Zhang

Zhiyong Wu

...

150

31 Oct 2022

Anchored Speech Recognition with Neural Transducers

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Ozlem Kalinli

295

20 Oct 2022

Real-time Online Video Detection with Temporal Smoothing Transformers

European Conference on Computer Vision (ECCV), 2022

Yue Zhao

Philipp Krahenbuhl

ViT

279

102

19 Sep 2022

CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR

Interspeech (Interspeech), 2022

Zhijian Ou

222

31 Mar 2022

Dynamic Latency for CTC-Based Streaming Automatic Speech Recognition With Emformer

234

29 Mar 2022

Shifted Chunk Encoder for Transformer Based Streaming End-to-End ASR

International Conference on Neural Information Processing (ICONIP), 2022

Fangyuan Wang

Bo Xu

235

29 Mar 2022

StreaMulT: Streaming Multimodal Transformer for Heterogeneous and Arbitrary Long Sequential Data

237

15 Oct 2021

Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution

...

Ozlem Kalinli

303

07 Oct 2021

Improving Streaming Transformer Based ASR Under a Framework of Self-supervised Learning

213

15 Sep 2021

Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition

Niko Moritz

Takaaki Hori

Jonathan Le Roux

181

02 Jul 2021

The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021

International Workshop on Spoken Language Translation (IWSLT), 2021

Yuchen Hu

326

01 Jul 2021

Collaborative Training of Acoustic Encoders for Speech Recognition

Ozlem Kalinli

254

16 Jun 2021

Latency-Controlled Neural Architecture Search for Streaming Speech Recognition

Automatic Speech Recognition & Understanding (ASRU), 2021

283

08 May 2021

Capturing Multi-Resolution Context by Dilated Self-Attention

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

Niko Moritz

Takaaki Hori

Jonathan Le Roux

177

07 Apr 2021

Dissecting User-Perceived Latency of On-Device E2E Speech Recognition

Interspeech (Interspeech), 2021

...

Ozlem Kalinli

296

06 Apr 2021

Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency

Interspeech (Interspeech), 2021

...

Ozlem Kalinli

285

05 Apr 2021

Thank you for Attention: A survey on Attention-based Artificial Neural Networks for Automatic Speech Recognition

Intelligent Systems with Applications (ISA), 2021

Priyabrata Karmakar

S. Teng

Guojun Lu

167

14 Feb 2021

Wake Word Detection with Streaming Transformers

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

Yiming Wang

Hang Lv

Daniel Povey

Lei Xie

Sanjeev Khudanpur

AI4TS

199

08 Feb 2021

Efficient End-to-End Speech Recognition Using Performers in Conformers

Peidong Wang

DeLiang Wang

304

09 Nov 2020

Alignment Restricted Streaming Recurrent Neural Network Transducer

259

05 Nov 2020

Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition

Spoken Language Technology Workshop (SLT), 2020

250

03 Nov 2020

Streaming Simultaneous Speech Translation with Augmented Memory Transformer

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

279

30 Oct 2020

Transformer in action: a comparative study of transformer-based acoustic models for large scale speech recognition applications

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

341

27 Oct 2020

Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset

Xie Chen

329

205

22 Oct 2020

Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

943

199

21 Oct 2020

Super-Human Performance in Online Low-latency Recognition of Conversational Speech

408

07 Oct 2020

Weak-Attention Suppression For Transformer Based Speech Recognition

295

18 May 2020