An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers
Chao Fang, Aojun Zhou, Zhongfeng Wang
arXiv:2208.06118 · 12 August 2022
Papers citing "An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers" (6 papers shown)
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han
07 May 2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
Xudong Lu, Qi Liu, Yuhui Xu, Aojun Zhou, Siyuan Huang, Bo-Wen Zhang, Junchi Yan, Hongsheng Li
22 Feb 2024
Bi-directional Masks for Efficient N:M Sparse Training
Yuxin Zhang, Yiting Luo, Mingbao Lin, Yunshan Zhong, Jingjing Xie, Fei Chao, Rongrong Ji
13 Feb 2023
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin
29 Apr 2021
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste
31 Jan 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018