Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
arXiv:2309.07988, 14 September 2023
Yang Li
Liangzhen Lai
Yuan Shangguan
Forrest N. Iandola
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
Papers citing "Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition" (4 of 4 papers shown)
1. Model-free Speculative Decoding for Transformer-based ASR with Token Map Drafting. Tuan Vu Ho, Hiroaki Kokubo, Masaaki Yamamoto, Yohei Kawaguchi. 29 Jul 2025.
2. Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment. Aditya Chakravarty. 02 May 2024.
3. Fast Transformer Decoding: One Write-Head is All You Need. Noam M. Shazeer. 06 Nov 2019.
4. Xception: Deep Learning with Depthwise Separable Convolutions. Computer Vision and Pattern Recognition (CVPR), 2016. François Chollet. 07 Oct 2016.