Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting

Main: 11 pages, 15 figures, 2 tables; bibliography: 3 pages
Abstract

Large-scale Mixture of Experts (MoE) Large Language Models (LLMs) have recently become the frontier open-weight models, achieving capabilities comparable to proprietary ones. However, their random expert selection mechanism introduces significant data movement overhead, which becomes the dominant bottleneck in multi-unit LLM serving systems.
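To illustrate the data-movement problem the abstract describes, here is a minimal sketch (not the paper's system) of top-k expert routing in an MoE layer. All names and parameters below are illustrative assumptions: because each token's gating scores pick its experts independently, tokens in the same batch route to experts that may live on different devices, and each token's activations must be sent to those devices.

```python
import numpy as np

# Hypothetical sketch of top-k MoE gating; sizes are illustrative, not from the paper.
rng = np.random.default_rng(0)
num_tokens, num_experts, k = 8, 16, 2

logits = rng.normal(size=(num_tokens, num_experts))   # per-token gating logits
topk = np.argsort(logits, axis=1)[:, -k:]             # k chosen experts per token

# In multi-unit serving, experts are sharded across devices; assume (for this
# sketch) 4 experts per device. A token's activations must travel to every
# device hosting one of its chosen experts -- the data-movement cost at issue.
experts_per_device = 4
devices = topk // experts_per_device                  # device id per chosen expert

print(topk.shape)   # (8, 2): each token routes to 2 of 16 experts
```

Since the routing depends on the input, the token-to-device traffic pattern changes every step, which is why forecasting it (the paper's contribution) is nontrivial.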
