Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

16 June 2024
Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han
Papers citing "Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference"

3 / 53 papers shown
LLoCO: Learning Long Contexts Offline (11 Apr 2024)
Sijun Tan, Xiuyu Li, Shishir G. Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca A. Popa
Tags: RALM, OffRL, LLMAG
DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference (30 Mar 2024)
Jinwei Yao, Kaiqi Chen, Kexun Zhang, Jiaxuan You, Binhang Yuan, Zeke Wang, Tao Lin
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models (10 Feb 2024)
Keisuke Kamahori, Tian Tang, Yile Gu, Kan Zhu, Baris Kasikci