ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.09360
  4. Cited By
HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM
  Inference

HiRE: High Recall Approximate Top-kkk Estimation for Efficient LLM Inference

14 February 2024
Yashas Samaga
Varun Yerram
Chong You
Srinadh Bhojanapalli
Sanjiv Kumar
Prateek Jain
Praneeth Netrapalli
ArXivPDFHTML

Papers citing "HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference"

4 / 4 papers shown
Title
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Yixin Song
Zeyu Mi
Haotong Xie
Haibo Chen
BDL
120
114
0
16 Dec 2023
LLM in a flash: Efficient Large Language Model Inference with Limited
  Memory
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Keivan Alizadeh-Vahid
Iman Mirzadeh
Dmitry Belenko
Karen Khatamifard
Minsik Cho
C. C. D. Mundo
Mohammad Rastegari
Mehrdad Farajtabar
70
104
0
12 Dec 2023
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language
  Models
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
Iman Mirzadeh
Keivan Alizadeh-Vahid
Sachin Mehta
C. C. D. Mundo
Oncel Tuzel
Golnoosh Samei
Mohammad Rastegari
Mehrdad Farajtabar
118
58
0
06 Oct 2023
FlexGen: High-Throughput Generative Inference of Large Language Models
  with a Single GPU
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
144
365
0
13 Mar 2023
1