
SALO: An Efficient Spatial Accelerator Enabling Hybrid Sparse Attention Mechanisms for Long Sequences
arXiv:2206.14550

29 June 2022
Guan Shen, Jieru Zhao, Quan Chen, Jingwen Leng, C. Li, Minyi Guo

Papers citing "SALO: An Efficient Spatial Accelerator Enabling Hybrid Sparse Attention Mechanisms for Long Sequences" (8 papers)

FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs
Ehsan Kabir, Md. Arafat Kabir, Austin R. J. Downey, Jason D. Bakos, David Andrews, Miaoqing Huang
21 Sep 2024

Hardware Acceleration of LLMs: A comprehensive survey and comparison
Nikoletta Koilia, C. Kachris
05 Sep 2024

SWAT: Scalable and Efficient Window Attention-based Transformers Acceleration on FPGAs
Zhenyu Bai, Pranav Dangi, Huize Li, Tulika Mitra
27 May 2024

The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving
Pai Zeng, Zhenyu Ning, Jieru Zhao, Weihao Cui, Mengwei Xu, Liwei Guo, Xusheng Chen, Yizhou Shan
18 May 2024

YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs
Cyrus Zhou, Zack Hassman, Ruize Xu, Dhirpal Shah, Vaughn Richard, Yanjing Li
01 Oct 2023

A Survey of Techniques for Optimizing Transformer Inference
Krishna Teja Chitty-Venkata, Sparsh Mittal, M. Emani, V. Vishwanath, Arun Somani
16 Jul 2023

Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, ..., Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Y. Shao, A. Gholami
27 Feb 2023

ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention
Jyotikrishna Dass, Shang Wu, Huihong Shi, Chaojian Li, Zhifan Ye, Zhongfeng Wang, Yingyan Lin
09 Nov 2022