2406.10774
Cited By
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
16 June 2024
Jiaming Tang
Yilong Zhao
Kan Zhu
Guangxuan Xiao
Baris Kasikci
Song Han
Papers citing "Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference"
3 / 53 papers shown
Title
LLoCO: Learning Long Contexts Offline
Sijun Tan
Xiuyu Li
Shishir G. Patil
Ziyang Wu
Tianjun Zhang
Kurt Keutzer
Joseph E. Gonzalez
Raluca A. Popa
RALM
OffRL
LLMAG
38
6
0
11 Apr 2024
DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
Jinwei Yao
Kaiqi Chen
Kexun Zhang
Jiaxuan You
Binhang Yuan
Zeke Wang
Tao Lin
35
2
0
30 Mar 2024
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Keisuke Kamahori
Tian Tang
Yile Gu
Kan Zhu
Baris Kasikci
63
20
0
10 Feb 2024