2309.10285
Cited By
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
19 September 2023
Haojun Xia
Zhen Zheng
Yuchao Li
Donglin Zhuang
Zhongzhu Zhou
Xiafei Qiu
Yong Li
Wei Lin
S. Song
Papers citing
"Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity"
6 / 6 papers shown
Title
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
144
366
0
13 Mar 2023
Efficient Quantized Sparse Matrix Operations on Tensor Cores
Shigang Li
Kazuki Osawa
Torsten Hoefler
72
31
0
14 Sep 2022
SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning
Zihao Ye
Ruihang Lai
Junru Shao
Tianqi Chen
Luis Ceze
76
91
0
11 Jul 2022
Can Foundation Models Wrangle Your Data?
A. Narayan
Ines Chami
Laurel J. Orr
Simran Arora
Christopher Ré
LMTD
AI4CE
176
213
0
20 May 2022
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
Torsten Hoefler
Dan Alistarh
Tal Ben-Nun
Nikoli Dryden
Alexandra Peste
MQ
139
684
0
31 Jan 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,817
0
17 Sep 2019