DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale

International Conference on Machine Learning (ICML), 2022
14 January 2022
Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He
ArXiv (abs) · PDF · HTML · HuggingFace (2 upvotes) · GitHub

Papers citing "DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale"

50 / 249 papers shown

Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models
Wentao Hu, Mingkuan Zhao, Shuangyong Song, Xiaoyan Zhu, Xin Lai, Jiayin Wang
25 Nov 2025

Token-Controlled Re-ranking for Sequential Recommendation via LLMs
Wenxi Dai, Wujiang Xu, Pinhuan Wang, Dimitris N. Metaxas
22 Nov 2025

Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design
Quentin G. Anthony, Yury Tokpanov, Skyler Szot, Srivatsan Rajagopal, Praneeth Medepalli, ..., Emad Barsoum, Zhenyu Gu, Yao Fu, Beren Millidge
MoE, VLM, LRM
21 Nov 2025

Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference
Kexin Chu, Dawei Xiang, Zixu Shen, Yiwei Yang, Zecheng Liu, Wei Zhang
MoE, MQ
19 Nov 2025

GPU-Initiated Networking for NCCL
Khaled Hamidouche, John Bachan, Pak Markthub, Peter-Jan Gootzen, Elena Agostini, Sylvain Jeaugey, Aamir Shafi, Georgios Theodorakis, Manjunath Gorentla Venkata
GNN
19 Nov 2025

In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading
Shuning Lin, Yifan He, Yitong Chen
MoE
08 Nov 2025

BrainCSD: A Hierarchical Consistency-Driven MoE Foundation Model for Unified Connectome Synthesis and Multitask Brain Trait Prediction
Xiongri Shen, Jiaqi Wang, Yi Zhong, Zhenxi Song, Leilei Zhao, ..., Lingyan Liang, Shuqiang Wang, Baiying Lei, Demao Deng, Zhiguo Zhang
07 Nov 2025

FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error
Fengjuan Wang, Zhiyi Su, Xingzhu Hu, Cheng Wang, Mou Sun
MQ
04 Nov 2025

Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining
Costin-Andrei Oncescu, Qingyang Wu, Wai Tong Chung, Robert Wu, Bryan Gopal, Junxiong Wang, Tri Dao, Ben Athiwaratkun
MoE
04 Nov 2025

Soft Task-Aware Routing of Experts for Equivariant Representation Learning
Jaebyeong Jeon, Hyeonseo Jang, Jy-yong Sohn, Kibok Lee
31 Oct 2025

Large Language Models Meet Text-Attributed Graphs: A Survey of Integration Frameworks and Applications
Guangxin Su, Hanchen Wang, Jianwei Wang, Wenjie Zhang, Ying Zhang, Jian Pei
24 Oct 2025

HybridEP: Scaling Expert Parallelism to Cross-Datacenter Scenario via Hybrid Expert/Data Transmission
Weihao Yang, Hao Huang, Donglei Wu, Ningke Li, Yanqi Pan, Qiyang Zheng, Wen Xia, Shiyi Li, Qiang Wang
MoE
22 Oct 2025

Contextual Attention Modulation: Towards Efficient Multi-Task Adaptation in Large Language Models
Dayan Pan, Zhaoyang Fu, Jingyuan Wang, Xiao Han, Yue Zhu, Xiangyu Zhao
KELM, CLL
20 Oct 2025

ReXMoE: Reusing Experts with Minimal Overhead in Mixture-of-Experts
Zheyue Tan, Ruoyao Xiao, Tao Yuan, Dong Zhou, Weilin Liu, ..., Haiyang Xu, Boxun Li, Guohao Dai, Bo Zhao, Yu Wang
MoE
20 Oct 2025

MergeMoE: Efficient Compression of MoE Models via Expert Output Merging
Ruijie Miao, Yilun Yao, Zihan Wang, Z. Wang, Bairen Yi, LingJun Liu, Yikai Zhao, Tong Yang
MoMe
16 Oct 2025

From Tokens to Layers: Redefining Stall-Free Scheduling for MoE Serving with Layered Prefill
Gunjun Lee, Jiwon Kim, Jaiyoung Park, Y. Lee, Jung Ho Ahn
MoE
09 Oct 2025

Hybrid Architectures for Language Models: Systematic Analysis and Design Insights
Sangmin Bae, Bilge Acun, Haroun Habeeb, S. Kim, Chien-Yu Lin, Liang Luo, Junjie Wang, Carole-Jean Wu
Mamba
06 Oct 2025

DoRAN: Stabilizing Weight-Decomposed Low-Rank Adaptation via Noise Injection and Auxiliary Networks
Nghiem Tuong Diep, Hien Dang, Tuan Truong, Tan Dinh, Huy Le Nguyen, Nhat Ho
05 Oct 2025

Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning
Minghao Yang, Ren Togo, Guang Li, Takahiro Ogawa, Miki Haseyama
MoE, MoMe
01 Oct 2025

Collaborative Compression for Large-Scale MoE Deployment on Edge
Yixiao Chen, Yanyue Xie, Ruining Yang, Wei Jiang, Wei Wang, Yong He, Yue Chen, Pu Zhao, Y. Wang
MQ
30 Sep 2025

Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization
Yaoxiang Wang, Qingguo Hu, Yucheng Ding, Ruizhe Wang, Yeyun Gong, Jian Jiao, Yelong Shen, Peng Cheng, Jinsong Su
MoE
30 Sep 2025

Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel
Chuanyang Zheng, Jiankai Sun, Yihang Gao, Enze Xie, Yuehao Wang, ..., Kashif Rasul, Mac Schwager, Anderson Schneider, Zinan Lin, Yuriy Nevmyvaka
MoE
30 Sep 2025

From Score Distributions to Balance: Plug-and-Play Mixture-of-Experts Routing
Rana Shahout, Colin Cai, Yilun Du, Minlan Yu, Michael Mitzenmacher
MoE, MoMe
29 Sep 2025

LayerScope: Predictive Cross-Layer Scheduling for Efficient Multi-Batch MoE Inference on Legacy Servers
Enda Yu, Zhaoning Zhang, Dezun Dong, Yongwei Wu, Xiangke Liao, Haojie Wang, Dongsheng Li
MoE
28 Sep 2025

AdaPtis: Reducing Pipeline Bubbles with Adaptive Pipeline Parallelism on Heterogeneous Models
Jihu Guo, Tenghui Ma, Wei Gao, Peng Sun, Jiaxing Li, Xun Chen, Yuyang Jin, Dahua Lin
28 Sep 2025

Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression
Peijun Zhu, Ning Yang, Jiayu Wei, Jinghang Wu, Haijun Zhang, Pin Lv
MoE
27 Sep 2025

Energy Use of AI Inference: Efficiency Pathways and Test-Time Compute
Felipe Oviedo, Fiodar Kazhamiaka, Esha Choukse, Allen Kim, Amy Luers, Melanie Nakagawa, Ricardo Bianchini, J. L. Ferres
24 Sep 2025

Towards Anytime Retrieval: A Benchmark for Anytime Person Re-Identification
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Xulin Li, Yan Lu, B. Liu, J. Li, Qinhong Yang, Tao Gong, Qi Chu, Mang Ye, Nenghai Yu
20 Sep 2025

AsyMoE: Leveraging Modal Asymmetry for Enhanced Expert Specialization in Large Vision-Language Models
Heng Zhang, Haichuan Hu, Yaomin Shen, Weihao Yu, Yilei Yuan, ..., Zijian Zhang, Lubin Gan, Huihui Wei, Hao Zhang, Jin Huang
MoE
16 Sep 2025

Characterizing the Efficiency of Distributed Training: A Power, Performance, and Thermal Perspective
Seokjin Go, Joongun Park, Spandan More, Hanjiang Wu, Irene Wang, Aaron Jezghani, Tushar Krishna, Divya Mahajan
12 Sep 2025

Robust Experts: the Effect of Adversarial Training on CNNs with Sparse Mixture-of-Experts Layers
Svetlana Pavlitska, Haixi Fan, Konstantin Ditschuneit, Johann Marius Zöllner
AAML, MoE
05 Sep 2025

Extracting Uncertainty Estimates from Mixtures of Experts for Semantic Segmentation
Svetlana Pavlitska, Beyza Keskin, Alwin Faßbender, Christian Hubschneider, Johann Marius Zöllner
UQCV, MoE
05 Sep 2025

LongCat-Flash Technical Report
M-A-P Team, Bayan, Bei Li, Bingye Lei, Bo Wang, ..., Rongxiang Weng, Ruichen Shao, Rumei Li, Shizhe Wu, Shuai Liang
MLLM, MoE, VLM
01 Sep 2025

Survey of Specialized Large Language Model
Chenghan Yang, Ruiyu Zhao, Yang Liu, Ling Jiang
LM&MA
27 Aug 2025

Taming the Chaos: Coordinated Autoscaling for Heterogeneous and Disaggregated LLM Inference
Rongzhi Li, Ruogu Du, Zefang Chu, Sida Zhao, Chunlei Han, ..., Yiwen Shao, Huanle Han, Long Huang, Zherui Liu, Shufan Liu
27 Aug 2025

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
Zihao Huang, Yu Bao, Qiyang Min, S. Chen, Ran Guo, ..., Defa Zhu, Yutao Zeng, Banggu Wu, Xun Zhou, Siyuan Qiao
MoE
26 Aug 2025

DualSparse-MoE: Coordinating Tensor/Neuron-Level Sparsity with Expert Partition and Reconstruction
Weilin Cai, Le Qin, Shwai He, Junwei Cui, Ang Li, Jiayi Huang
MoE
25 Aug 2025

MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models
Krishna Teja Chitty-Venkata, Sylvia Howland, Golara Azar, Daria Soboleva, Natalia Vassilieva, Siddhisanket Raskar, M. Emani, V. Vishwanath
MoE
24 Aug 2025

GPT-OSS-20B: A Comprehensive Deployment-Centric Analysis of OpenAI's Open-Weight Mixture of Experts Model
Deepak Kumar, Divakar Yadav, Yash Patel
MoE
22 Aug 2025

X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms
Yueming Yuan, Ahan Gupta, Jianping Li, Sajal Dash, Feiyi Wang, Minjia Zhang
MoE
18 Aug 2025

MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models
Dianyi Wang, Siyuan Wang, Zejun Li, Yikun Wang, Yitong Li, Duyu Tang, Xiaoyu Shen, Xuanjing Huang, Zhongyu Wei
MoE
13 Aug 2025

HierMoE: Accelerating MoE Training with Hierarchical Token Deduplication and Expert Swap
Wenxiang Lin, Xinglin Pan, Lin Zhang, Shaohuai Shi, Xuan Wang, Xiaowen Chu
MoE
13 Aug 2025

RouteMark: A Fingerprint for Intellectual Property Attribution in Routing-based Model Merging
Xin He, Junxi Shen, Zhenheng Tang, Xiaowen Chu, Bo Li, Ivor Tsang, Yew-Soon Ong
MoMe, MoE
03 Aug 2025

Load Balancing for AI Training Workloads
Sarah McClure, Sylvia Ratnasamy, Scott Shenker, Mark Silberstein, Isaac Keslassy
28 Jul 2025

MindSpeed RL: Distributed Dataflow for Scalable and Efficient RL Training on Ascend NPU Cluster
Laingjun Feng, Chenyi Pan, Xinjie Guo, Fei Mei, Benzhe Ning, ..., Chang Liu, Guang Yang, Zhenyu Han, Jiangben Wang, Bo Wang
MoE, OffRL
25 Jul 2025

Rethinking LLM Inference Bottlenecks: Insights from Latent Attention and Mixture-of-Experts
Sungmin Yun, Seonyong Park, Hwayong Nam, Younjoo Lee, Gunjun Lee, ..., Jongmin Kim, Hyungyo Kim, Juhwan Cho, Seungmin Baek, Jung Ho Ahn
MoE
21 Jul 2025

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Sangmin Bae, Yujin Kim, Reza Bayat, S. Kim, Jiyoun Ha, ..., Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Aaron Courville, Se-Young Yun
MoE
14 Jul 2025

Symbiosis: Multi-Adapter Inference and Fine-Tuning
Saransh Gupta, Umesh Deshpande, Travis Janssen, Swami Sundararaman
MoE
03 Jul 2025

TrainVerify: Equivalence-Based Verification for Distributed LLM Training
Symposium on Operating Systems Principles (SOSP), 2025
Yunchi Lu, Youshan Miao, Cheng Tan, Peng Huang, Yi Zhu, Xian Zhang, Fan Yang
LRM
19 Jun 2025

Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library
Weixun Wang, Shaopan Xiong, Gengru Chen, Wei Gao, Sheng Guo, ..., Lin Qu, Yuchi Xu, Wei Wang, Jiamang Wang, Bo Zheng
OffRL
06 Jun 2025

Page 1 of 5