arXiv: 2409.17264
Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations
25 September 2024
A. Agrawal, Haoran Qiu, Junda Chen, Íñigo Goiri, Chaojie Zhang, Rayyan Shahid, R. Ramjee, Alexey Tumanov, Esha Choukse
Tags: RALM, LRM
Papers citing "Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations" (1 paper shown)
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Tianqi Chen, Beidi Chen
Tags: LRM
20 Aug 2024