2409.12490
Cited By
CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs
19 September 2024
Junlin Lv, Yuan Feng, Xike Xie, Xin Jia, Qirong Peng, Guiming Xie
Papers citing "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs"
3 / 3 papers shown
Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective
Yuan Feng, Junlin Lv, Y. Cao, Xike Xie, S. Kevin Zhou
71
2
0
06 Feb 2025
KV Prediction for Improved Time to First Token
Maxwell Horton, Qingqing Cao, Chenfan Sun, Yanzi Jin, Sachin Mehta, Mohammad Rastegari, Moin Nabi
AI4TS
20
1
0
10 Oct 2024
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices
Yuxiang Huang, Binhang Yuan, Xu Han, Chaojun Xiao, Zhiyuan Liu
RALM
69
1
0
02 Oct 2024