Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.09360
Cited By
HiRE: High Recall Approximate Top-
k
k
k
Estimation for Efficient LLM Inference
14 February 2024
Yashas Samaga
Varun Yerram
Chong You
Srinadh Bhojanapalli
Sanjiv Kumar
Prateek Jain
Praneeth Netrapalli
Re-assign community
ArXiv
PDF
HTML
Papers citing
"HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference"
4 / 4 papers shown
Title
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Yixin Song
Zeyu Mi
Haotong Xie
Haibo Chen
BDL
120
114
0
16 Dec 2023
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Keivan Alizadeh-Vahid
Iman Mirzadeh
Dmitry Belenko
Karen Khatamifard
Minsik Cho
C. C. D. Mundo
Mohammad Rastegari
Mehrdad Farajtabar
70
104
0
12 Dec 2023
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
Iman Mirzadeh
Keivan Alizadeh-Vahid
Sachin Mehta
C. C. D. Mundo
Oncel Tuzel
Golnoosh Samei
Mohammad Rastegari
Mehrdad Farajtabar
118
58
0
06 Oct 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
144
365
0
13 Mar 2023
1