ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2408.13510
23
3

Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Scheduling

24 August 2024
Kunal Jain
Anjaly Parayil
Ankur Mallick
Esha Choukse
Xiaoting Qin
Jue Zhang
Íñigo Goiri
Rujia Wang
Chetan Bansal
Victor Rühle
Anoop Kulkarni
Steve Kofsky
Saravan Rajmohan
ArXivPDFHTML
Abstract

Large Language Model (LLM) workloads have distinct prefill and decode phases with different compute and memory requirements which should ideally be accounted for when scheduling input queries across different LLM instances in a cluster. However existing scheduling algorithms treat LLM workloads as monolithic jobs without considering the distinct characteristics of the two phases in each workload. This leads to sub-optimal scheduling and increased response latency. In this work, we propose a heuristic-guided reinforcement learning-based intelligent router for data-driven and workload-aware scheduling. Our router leverages a trainable response-length predictor, and a novel formulation for estimating the impact of mixing different workloads to schedule queries across LLM instances and achieve over 11\% lower end-to-end latency than existing approaches.

View on arXiv
Comments on this paper