arXiv: 2409.17264
Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations
25 September 2024
A. Agrawal, Haoran Qiu, Junda Chen, Íñigo Goiri, Chaojie Zhang, Rayyan Shahid, R. Ramjee, Alexey Tumanov, Esha Choukse
Tags: RALM, LRM
Papers citing "Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations" (1 paper shown)
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Tianqi Chen, Beidi Chen
Tags: LRM
20 Aug 2024