GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
arXiv:2403.05527
8 March 2024
Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao
Papers citing "GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM" (12 of 62 shown)
Critical Infrastructure Protection: Generative AI, Challenges, and Opportunities
Yagmur Yigit, M. Ferrag, Iqbal H. Sarker, Leandros A. Maglaras, Christos Chrysoulas, Naghmeh Moradpoor, Helge Janicke
08 May 2024
Efficient LLM Inference with Kcache
Qiaozhi He, Zhihua Wu
28 Apr 2024
Retrieval Head Mechanistically Explains Long-Context Factuality
Wenhao Wu, Yizhong Wang, Guangxuan Xiao, Hao-Chun Peng, Yao Fu
24 Apr 2024
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, Sai Qian Zhang
21 Mar 2024
AFLoRA: Adaptive Freezing of Low Rank Adaptation in Parameter Efficient Fine-Tuning of Large Models
Zeyu Liu, Souvik Kundu, Anni Li, Junrui Wan, Lianghao Jiang, P. Beerel
20 Mar 2024
Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference
Muhammad Adnan, Akhil Arunkumar, Gaurav Jain, Prashant J. Nair, Ilya Soloveychik, Purushotham Kamath
14 Mar 2024
The Faiss library
Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, Hervé Jégou
16 Jan 2024
Transformers are Multi-State RNNs
Matanel Oren, Michael Hassid, Nir Yarden, Yossi Adi, Roy Schwartz
11 Jan 2024
SGLang: Efficient Execution of Structured Language Model Programs
Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, ..., Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark W. Barrett, Ying Sheng
12 Dec 2023
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
Iman Mirzadeh, Keivan Alizadeh-Vahid, Sachin Mehta, C. C. D. Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar
06 Oct 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
13 Mar 2023
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
28 Jan 2022