SALO: An Efficient Spatial Accelerator Enabling Hybrid Sparse Attention Mechanisms for Long Sequences
Guan Shen, Jieru Zhao, Quan Chen, Jingwen Leng, C. Li, Minyi Guo
arXiv:2206.14550 · 29 June 2022
Papers citing "SALO: An Efficient Spatial Accelerator Enabling Hybrid Sparse Attention Mechanisms for Long Sequences" (8 of 8 papers shown)
FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs
Ehsan Kabir, Md. Arafat Kabir, Austin R. J. Downey, Jason D. Bakos, David Andrews, Miaoqing Huang
21 Sep 2024
Hardware Acceleration of LLMs: A comprehensive survey and comparison
Nikoletta Koilia, C. Kachris
05 Sep 2024
SWAT: Scalable and Efficient Window Attention-based Transformers Acceleration on FPGAs
Zhenyu Bai, Pranav Dangi, Huize Li, Tulika Mitra
27 May 2024
The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving
Pai Zeng, Zhenyu Ning, Jieru Zhao, Weihao Cui, Mengwei Xu, Liwei Guo, Xusheng Chen, Yizhou Shan
18 May 2024
YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs
Cyrus Zhou, Zack Hassman, Ruize Xu, Dhirpal Shah, Vaughn Richard, Yanjing Li
01 Oct 2023
A Survey of Techniques for Optimizing Transformer Inference
Krishna Teja Chitty-Venkata, Sparsh Mittal, M. Emani, V. Vishwanath, Arun Somani
16 Jul 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, ..., Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Y. Shao, A. Gholami
27 Feb 2023
ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention
Jyotikrishna Dass, Shang Wu, Huihong Shi, Chaojian Li, Zhifan Ye, Zhongfeng Wang, Yingyan Lin
09 Nov 2022