ResearchTrend.AI

arXiv:2409.17264
Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations

25 September 2024
A. Agrawal
Haoran Qiu
Junda Chen
Íñigo Goiri
Chaojie Zhang
Rayyan Shahid
R. Ramjee
Alexey Tumanov
Esha Choukse
    RALM
    LRM
Papers citing "Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations"

1 / 1 papers shown
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Jian Chen
Vashisth Tiwari
Ranajoy Sadhukhan
Zhuoming Chen
Jinyuan Shi
Ian En-Hsu Yen
Avner May
Tianqi Chen
Beidi Chen
LRM
20 Aug 2024