SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning

17 December 2020
Hanrui Wang, Zhekai Zhang, Song Han

Papers citing "SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning"

10 / 160 papers shown
QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits
Hanrui Wang, Yongshan Ding, Jiaqi Gu, Zirui Li, Yujun Lin, D. Pan, Frederic T. Chong, Song Han
22 Jul 2021

FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Sheng-Chun Kao, Suvinay Subramanian, Gaurav Agrawal, Amir Yazdanbakhsh, T. Krishna
13 Jul 2021

Learned Token Pruning for Transformers
Sehoon Kim, Sheng Shen, D. Thorsley, A. Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer
02 Jul 2021

LEAP: Learnable Pruning for Transformer-based Models
Z. Yao, Xiaoxia Wu, Linjian Ma, Sheng Shen, Kurt Keutzer, Michael W. Mahoney, Yuxiong He
30 May 2021

Keyword Transformer: A Self-Attention Model for Keyword Spotting
Axel Berg, Mark O'Connor, M. T. Cruz
01 Apr 2021

EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference
Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, ..., Victor Sanh, P. Whatmough, Alexander M. Rush, David Brooks, Gu-Yeon Wei
28 Nov 2020

Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search
Gyuwan Kim, Kyunghyun Cho
14 Oct 2020

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights
Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, Baoxin Li
02 Jul 2020

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing
Liu Ke, Udit Gupta, Carole-Jean Wu, B. Cho, Mark Hempstead, ..., Dheevatsa Mudigere, Maxim Naumov, Martin D. Schatz, M. Smelyanskiy, Xiaodong Wang
30 Dec 2019

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018