Orders in Chaos: Enhancing Large-Scale MoE LLM Serving with Data Movement Forecasting
Main: 11 pages; bibliography: 3 pages; 15 figures, 2 tables
Abstract
Large-scale Mixture of Experts (MoE) Large Language Models (LLMs) have recently become the frontier of open-weight models, achieving capabilities comparable to proprietary ones. However, their seemingly random expert-selection mechanism introduces significant data movement overhead, which becomes the dominant bottleneck in multi-unit LLM serving systems.
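To make the source of this overhead concrete, here is a minimal sketch of top-k expert routing in an MoE layer (hypothetical shapes and names; not the paper's system). Because each token's expert assignment depends on its gate scores, the volume of activations that must move to each expert's device changes from batch to batch:

```python
import torch

# Illustrative top-k MoE gating (assumed sizes, not from the paper).
num_tokens, hidden_dim, num_experts, top_k = 8, 16, 4, 2

tokens = torch.randn(num_tokens, hidden_dim)
gate = torch.nn.Linear(hidden_dim, num_experts, bias=False)

scores = torch.softmax(gate(tokens), dim=-1)        # (tokens, experts)
weights, expert_ids = scores.topk(top_k, dim=-1)    # per-token expert choices

# Tokens-per-expert histogram: if experts live on different devices, this
# is the input-dependent amount of activation data sent to each device.
load = torch.bincount(expert_ids.flatten(), minlength=num_experts)
print("tokens routed to each expert:", load.tolist())
```

Running this with different random inputs yields a different per-expert load each time, which is the input-dependent data movement pattern the paper sets out to forecast.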
