SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning

17 December 2020
Hanrui Wang
Zhekai Zhang
Song Han
arXiv: 2012.09852

Papers citing "SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning"

50 / 160 papers shown
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
Xuanyao Chen
Zhijian Liu
Haotian Tang
Li Yi
Hang Zhao
Song Han
ViT
21
46
0
30 Mar 2023
TransCODE: Co-design of Transformers and Accelerators for Efficient Training and Inference
Shikhar Tuli
N. Jha
30
5
0
27 Mar 2023
EdgeTran: Co-designing Transformers for Efficient Inference on Mobile Edge Platforms
Shikhar Tuli
N. Jha
34
3
0
24 Mar 2023
X-Former: In-Memory Acceleration of Transformers
S. Sridharan
Jacob R. Stevens
Kaushik Roy
A. Raghunathan
GNN
18
33
0
13 Mar 2023
Gradient-Free Structured Pruning with Unlabeled Data
Azade Nova
H. Dai
Dale Schuurmans
SyDa
24
20
0
07 Mar 2023
AccelTran: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers
Shikhar Tuli
N. Jha
25
31
0
28 Feb 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
28
100
0
27 Feb 2023
Moby: Empowering 2D Models for Efficient Point Cloud Analytics on the Edge
Jingzong Li
Yik Hong Cai
Libin Liu
Yushun Mao
Chun Jason Xue
Hongchang Xu
15
3
0
18 Feb 2023
What Matters In The Structured Pruning of Generative Language Models?
Michael Santacroce
Zixin Wen
Yelong Shen
Yuan-Fang Li
18
32
0
07 Feb 2023
PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation
Ningxin Zheng
Huiqiang Jiang
Quan Zhang
Zhenhua Han
Yuqing Yang
...
Fan Yang
Chengruidong Zhang
Lili Qiu
Mao Yang
Lidong Zhou
32
27
0
26 Jan 2023
AttMEMO : Accelerating Transformers with Memoization on Big Memory Systems
Yuan Feng
Hyeran Jeon
F. Blagojevic
Cyril Guyot
Qing Li
Dong Li
GNN
19
3
0
23 Jan 2023
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
Conglong Li
Z. Yao
Xiaoxia Wu
Minjia Zhang
Connor Holmes
Cheng Li
Yuxiong He
19
24
0
07 Dec 2022
Signed Binary Weight Networks
Sachit Kuhar
Alexey Tumanov
Judy Hoffman
MQ
11
1
0
25 Nov 2022
Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training
Zhenglun Kong
Haoyu Ma
Geng Yuan
Mengshu Sun
Yanyue Xie
...
Tianlong Chen
Xiaolong Ma
Xiaohui Xie
Zhangyang Wang
Yanzhi Wang
ViT
26
22
0
19 Nov 2022
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Guangxuan Xiao
Ji Lin
Mickael Seznec
Hao Wu
Julien Demouth
Song Han
MQ
61
731
0
18 Nov 2022
Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers
Z. Yao
Xiaoxia Wu
Conglong Li
Connor Holmes
Minjia Zhang
Cheng-rong Li
Yuxiong He
22
11
0
17 Nov 2022
ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention
Jyotikrishna Dass
Shang Wu
Huihong Shi
Chaojian Li
Zhifan Ye
Zhongfeng Wang
Yingyan Lin
17
49
0
09 Nov 2022
QuEst: Graph Transformer for Quantum Circuit Reliability Estimation
Hanrui Wang
Pengyu Liu
Jinglei Cheng
Zhiding Liang
Jiaqi Gu
...
Yiyu Shi
Xuehai Qian
D. Pan
Frederic T. Chong
Song Han
29
31
0
30 Oct 2022
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
Haoran You
Zhanyi Sun
Huihong Shi
Zhongzhi Yu
Yang Katie Zhao
Yongan Zhang
Chaojian Li
Baopu Li
Yingyan Lin
ViT
17
76
0
18 Oct 2022
Demystifying Map Space Exploration for NPUs
Sheng-Chun Kao
A. Parashar
Po-An Tsai
T. Krishna
30
11
0
07 Oct 2022
Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints
Ganesh Jawahar
Subhabrata Mukherjee
Debadeepta Dey
Muhammad Abdul-Mageed
L. Lakshmanan
C. C. T. Mendes
Gustavo de Rosa
S. Shah
27
0
0
06 Oct 2022
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation
Seongmin Hong
Seungjae Moon
Junsoo Kim
Sungjae Lee
Minsub Kim
Dongsoo Lee
Joo-Young Kim
64
76
0
22 Sep 2022
Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design
Hongxiang Fan
Thomas C. P. Chau
Stylianos I. Venieris
Royson Lee
Alexandros Kouris
Wayne Luk
Nicholas D. Lane
Mohamed S. Abdelfattah
32
56
0
20 Sep 2022
Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation
Amir Yazdanbakhsh
Ashkan Moradifirouzabadi
Zheng Li
Mingu Kang
19
31
0
01 Sep 2022
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
28
109
0
31 Aug 2022
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
Cong Guo
Chen Zhang
Jingwen Leng
Zihan Liu
Fan Yang
Yun-Bo Liu
Minyi Guo
Yuhao Zhu
MQ
16
54
0
30 Aug 2022
An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers
Chao Fang
Aojun Zhou
Zhongfeng Wang
MoE
25
53
0
12 Aug 2022
A Length Adaptive Algorithm-Hardware Co-design of Transformer on FPGA Through Sparse Attention and Dynamic Pipelining
Hongwu Peng
Shaoyi Huang
Shiyang Chen
Bingbing Li
Tong Geng
...
Weiwen Jiang
Wujie Wen
J. Bi
Hang Liu
Caiwen Ding
45
54
0
07 Aug 2022
SALO: An Efficient Spatial Accelerator Enabling Hybrid Sparse Attention Mechanisms for Long Sequences
Guan Shen
Jieru Zhao
Quan Chen
Jingwen Leng
C. Li
Minyi Guo
39
26
0
29 Jun 2022
Answer Fast: Accelerating BERT on the Tensor Streaming Processor
I. Ahmed
Sahil Parmar
Matthew Boyd
Michael Beidler
Kris Kang
Bill Liu
Kyle Roach
John Kim
D. Abts
LLMAG
12
6
0
22 Jun 2022
Transkimmer: Transformer Learns to Layer-wise Skim
Yue Guan
Zhengyi Li
Jingwen Leng
Zhouhan Lin
Minyi Guo
70
38
0
15 May 2022
Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
Han Cai
Ji Lin
Yujun Lin
Zhijian Liu
Haotian Tang
Hanrui Wang
Ligeng Zhu
Song Han
19
107
0
25 Apr 2022
Accelerating Attention through Gradient-Based Learned Runtime Pruning
Zheng Li
Soroush Ghodrati
Amir Yazdanbakhsh
H. Esmaeilzadeh
Mingu Kang
19
16
0
07 Apr 2022
A Fast Post-Training Pruning Framework for Transformers
Woosuk Kwon
Sehoon Kim
Michael W. Mahoney
Joseph Hassoun
Kurt Keutzer
A. Gholami
15
143
0
29 Mar 2022
Mokey: Enabling Narrow Fixed-Point Inference for Out-of-the-Box Floating-Point Transformer Models
Ali Hadi Zadeh
Mostafa Mahmoud
Ameer Abdelhadi
Andreas Moshovos
MQ
13
31
0
23 Mar 2022
Compression of Generative Pre-trained Language Models via Quantization
Chaofan Tao
Lu Hou
Wei Zhang
Lifeng Shang
Xin Jiang
Qun Liu
Ping Luo
Ngai Wong
MQ
27
103
0
21 Mar 2022
Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention
Zuzana Jelčicová
Marian Verhelst
26
5
0
20 Mar 2022
Dynamic N:M Fine-grained Structured Sparse Attention Mechanism
Zhaodong Chen
Yuying Quan
Zheng Qu
L. Liu
Yufei Ding
Yuan Xie
28
22
0
28 Feb 2022
QOC: Quantum On-Chip Training with Parameter Shift and Gradient Pruning
Hanrui Wang
Zi-Chen Li
Jiaqi Gu
Yongshan Ding
D. Pan
Song Han
32
52
0
26 Feb 2022
Multi-Dimensional Model Compression of Vision Transformer
Zejiang Hou
S. Kung
ViT
20
16
0
31 Dec 2021
SPViT: Enabling Faster Vision Transformers via Soft Token Pruning
Zhenglun Kong
Peiyan Dong
Xiaolong Ma
Xin Meng
Mengshu Sun
...
Geng Yuan
Bin Ren
Minghai Qin
H. Tang
Yanzhi Wang
ViT
26
141
0
27 Dec 2021
NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference
Joonsang Yu
Junki Park
Seongmin Park
Minsoo Kim
Sihwa Lee
Dong Hyun Lee
Jungwook Choi
27
48
0
03 Dec 2021
Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning
Xuanli He
I. Keivanloo
Yi Xu
Xiang He
Belinda Zeng
Santosh Rajagopalan
Trishul M. Chilimbi
10
18
0
30 Oct 2021
QuantumNAT: Quantum Noise-Aware Training with Noise Injection, Quantization and Normalization
Hanrui Wang
Jiaqi Gu
Yongshan Ding
Zi-Chen Li
Frederic T. Chong
D. Pan
Song Han
22
63
0
21 Oct 2021
Transformer Acceleration with Dynamic Sparse Attention
Liu Liu
Zheng Qu
Zhaodong Chen
Yufei Ding
Yuan Xie
19
20
0
21 Oct 2021
Energon: Towards Efficient Acceleration of Transformers Using Dynamic Sparse Attention
Zhe Zhou
Junling Liu
Zhenyu Gu
Guangyu Sun
56
42
0
18 Oct 2021
SuperShaper: Task-Agnostic Super Pre-training of BERT Models with Variable Hidden Dimensions
Vinod Ganesan
Gowtham Ramesh
Pratyush Kumar
31
9
0
10 Oct 2021
Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling
Kyuhong Shim
Iksoo Choi
Wonyong Sung
Jungwook Choi
21
15
0
07 Oct 2021
Layer-wise Model Pruning based on Mutual Information
Chun Fan
Jiwei Li
Xiang Ao
Fei Wu
Yuxian Meng
Xiaofei Sun
38
19
0
28 Aug 2021
Armour: Generalizable Compact Self-Attention for Vision Transformers
Lingchuan Meng
ViT
19
3
0
03 Aug 2021