arXiv: 2410.18038
POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference
23 October 2024
Aditya K Kamath, Ramya Prabhu, Jayashree Mohan, Simon Peter, R. Ramjee, Ashish Panwar
Papers citing "POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference" (2 papers)
Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents
Yueying Li, Jim Dai, Tianyi Peng
10 Apr 2025
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
Zihao Ye, Lequn Chen, Ruihang Lai, Wuwei Lin, Yineng Zhang, ..., Tianqi Chen, Baris Kasikci, Vinod Grover, Arvind Krishnamurthy, Luis Ceze
02 Jan 2025