DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale

International Conference on Machine Learning (ICML), 2022
14 January 2022
Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He
ArXiv (abs) · PDF · HTML · HuggingFace (2 upvotes) · GitHub

Papers citing "DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale"

50 / 249 papers shown

Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models
Wentao Hu, Mingkuan Zhao, Shuangyong Song, Xiaoyan Zhu, Xin Lai, Jiayin Wang
25 Nov 2025

Token-Controlled Re-ranking for Sequential Recommendation via LLMs
Wenxi Dai, Wujiang Xu, Pinhuan Wang, Dimitris N. Metaxas
22 Nov 2025

Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design
Quentin G. Anthony, Yury Tokpanov, Skyler Szot, Srivatsan Rajagopal, Praneeth Medepalli, ..., Emad Barsoum, Zhenyu Gu, Yao Fu, Beren Millidge
MoE, VLM, LRM
21 Nov 2025

Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference
Kexin Chu, Dawei Xiang, Zixu Shen, Yiwei Yang, Zecheng Liu, Wei Zhang
MoE, MQ
19 Nov 2025

GPU-Initiated Networking for NCCL
Khaled Hamidouche, John Bachan, Pak Markthub, Peter-Jan Gootzen, Elena Agostini, Sylvain Jeaugey, Aamir Shafi, Georgios Theodorakis, Manjunath Gorentla Venkata
GNN
19 Nov 2025

In-depth Analysis on Caching and Pre-fetching in Mixture of Experts Offloading
Shuning Lin, Yifan He, Yitong Chen
MoE
08 Nov 2025

BrainCSD: A Hierarchical Consistency-Driven MoE Foundation Model for Unified Connectome Synthesis and Multitask Brain Trait Prediction
Xiongri Shen, Jiaqi Wang, Yi Zhong, Zhenxi Song, Leilei Zhao, ..., Lingyan Liang, Shuqiang Wang, Baiying Lei, Demao Deng, Zhiguo Zhang
07 Nov 2025

FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error
Fengjuan Wang, Zhiyi Su, Xingzhu Hu, Cheng Wang, Mou Sun
MQ
04 Nov 2025

Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining
Costin-Andrei Oncescu, Qingyang Wu, Wai Tong Chung, Robert Wu, Bryan Gopal, Junxiong Wang, Tri Dao, Ben Athiwaratkun
MoE
04 Nov 2025

Soft Task-Aware Routing of Experts for Equivariant Representation Learning
Jaebyeong Jeon, Hyeonseo Jang, Jy-yong Sohn, Kibok Lee
31 Oct 2025

Large Language Models Meet Text-Attributed Graphs: A Survey of Integration Frameworks and Applications
Guangxin Su, Hanchen Wang, Jianwei Wang, Wenjie Zhang, Ying Zhang, Jian Pei
24 Oct 2025

HybridEP: Scaling Expert Parallelism to Cross-Datacenter Scenario via Hybrid Expert/Data Transmission
Weihao Yang, Hao Huang, Donglei Wu, Ningke Li, Yanqi Pan, Qiyang Zheng, Wen Xia, Shiyi Li, Qiang Wang
MoE
22 Oct 2025

Contextual Attention Modulation: Towards Efficient Multi-Task Adaptation in Large Language Models
Dayan Pan, Zhaoyang Fu, Jingyuan Wang, Xiao Han, Yue Zhu, Xiangyu Zhao
KELM, CLL
20 Oct 2025

ReXMoE: Reusing Experts with Minimal Overhead in Mixture-of-Experts
Zheyue Tan, Ruoyao Xiao, Tao Yuan, Dong Zhou, Weilin Liu, ..., Haiyang Xu, Boxun Li, Guohao Dai, Bo Zhao, Yu Wang
MoE
20 Oct 2025

MergeMoE: Efficient Compression of MoE Models via Expert Output Merging
Ruijie Miao, Yilun Yao, Zihan Wang, Z. Wang, Bairen Yi, LingJun Liu, Yikai Zhao, Tong Yang
MoMe
16 Oct 2025

From Tokens to Layers: Redefining Stall-Free Scheduling for MoE Serving with Layered Prefill
Gunjun Lee, Jiwon Kim, Jaiyoung Park, Y. Lee, Jung Ho Ahn
MoE
09 Oct 2025

Hybrid Architectures for Language Models: Systematic Analysis and Design Insights
Sangmin Bae, Bilge Acun, Haroun Habeeb, S. Kim, Chien-Yu Lin, Liang Luo, Junjie Wang, Carole-Jean Wu
Mamba
06 Oct 2025

DoRAN: Stabilizing Weight-Decomposed Low-Rank Adaptation via Noise Injection and Auxiliary Networks
Nghiem Tuong Diep, Hien Dang, Tuan Truong, Tan Dinh, Huy Le Nguyen, Nhat Ho
05 Oct 2025

Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning
Minghao Yang, Ren Togo, Guang Li, Takahiro Ogawa, Miki Haseyama
MoE, MoMe
01 Oct 2025

Collaborative Compression for Large-Scale MoE Deployment on Edge
Yixiao Chen, Yanyue Xie, Ruining Yang, Wei Jiang, Wei Wang, Yong He, Yue Chen, Pu Zhao, Y. Wang
MQ
30 Sep 2025

Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization
Yaoxiang Wang, Qingguo Hu, Yucheng Ding, Ruizhe Wang, Yeyun Gong, Jian Jiao, Yelong Shen, Peng Cheng, Jinsong Su
MoE
30 Sep 2025

Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel
Chuanyang Zheng, Jiankai Sun, Yihang Gao, Enze Xie, Yuehao Wang, ..., Kashif Rasul, Mac Schwager, Anderson Schneider, Zinan Lin, Yuriy Nevmyvaka
MoE
30 Sep 2025

From Score Distributions to Balance: Plug-and-Play Mixture-of-Experts Routing
Rana Shahout, Colin Cai, Yilun Du, Minlan Yu, Michael Mitzenmacher
MoE, MoMe
29 Sep 2025

LayerScope: Predictive Cross-Layer Scheduling for Efficient Multi-Batch MoE Inference on Legacy Servers
Enda Yu, Zhaoning Zhang, Dezun Dong, Yongwei Wu, Xiangke Liao, Haojie Wang, Dongsheng Li
MoE
28 Sep 2025

AdaPtis: Reducing Pipeline Bubbles with Adaptive Pipeline Parallelism on Heterogeneous Models
Jihu Guo, Tenghui Ma, Wei Gao, Peng Sun, Jiaxing Li, Xun Chen, Yuyang Jin, Dahua Lin
28 Sep 2025

Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression
Peijun Zhu, Ning Yang, Jiayu Wei, Jinghang Wu, Haijun Zhang, Pin Lv
MoE
27 Sep 2025

Energy Use of AI Inference: Efficiency Pathways and Test-Time Compute
Felipe Oviedo, Fiodar Kazhamiaka, Esha Choukse, Allen Kim, Amy Luers, Melanie Nakagawa, Ricardo Bianchini, J. L. Ferres
24 Sep 2025

Towards Anytime Retrieval: A Benchmark for Anytime Person Re-Identification
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Xulin Li, Yan Lu, B. Liu, J. Li, Qinhong Yang, Tao Gong, Qi Chu, Mang Ye, Nenghai Yu
20 Sep 2025

AsyMoE: Leveraging Modal Asymmetry for Enhanced Expert Specialization in Large Vision-Language Models
Heng Zhang, Haichuan Hu, Yaomin Shen, Weihao Yu, Yilei Yuan, ..., Zijian Zhang, Lubin Gan, Huihui Wei, Hao Zhang, Jin Huang
MoE
16 Sep 2025

Characterizing the Efficiency of Distributed Training: A Power, Performance, and Thermal Perspective
Seokjin Go, Joongun Park, Spandan More, Hanjiang Wu, Irene Wang, Aaron Jezghani, Tushar Krishna, Divya Mahajan
12 Sep 2025

Robust Experts: the Effect of Adversarial Training on CNNs with Sparse Mixture-of-Experts Layers
Svetlana Pavlitska, Haixi Fan, Konstantin Ditschuneit, Johann Marius Zöllner
AAML, MoE
05 Sep 2025

Extracting Uncertainty Estimates from Mixtures of Experts for Semantic Segmentation
Svetlana Pavlitska, Beyza Keskin, Alwin Faßbender, Christian Hubschneider, Johann Marius Zöllner
UQCV, MoE
05 Sep 2025

LongCat-Flash Technical Report
M-A-P Team, Bayan, Bei Li, Bingye Lei, Bo Wang, ..., Rongxiang Weng, Ruichen Shao, Rumei Li, Shizhe Wu, Shuai Liang
MLLM, MoE, VLM
01 Sep 2025

Survey of Specialized Large Language Model
Chenghan Yang, Ruiyu Zhao, Yang Liu, Ling Jiang
LM&MA
27 Aug 2025

Taming the Chaos: Coordinated Autoscaling for Heterogeneous and Disaggregated LLM Inference
Rongzhi Li, Ruogu Du, Zefang Chu, Sida Zhao, Chunlei Han, ..., Yiwen Shao, Huanle Han, Long Huang, Zherui Liu, Shufan Liu
27 Aug 2025

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
Zihao Huang, Yu Bao, Qiyang Min, S. Chen, Ran Guo, ..., Defa Zhu, Yutao Zeng, Banggu Wu, Xun Zhou, Siyuan Qiao
MoE
26 Aug 2025

DualSparse-MoE: Coordinating Tensor/Neuron-Level Sparsity with Expert Partition and Reconstruction
Weilin Cai, Le Qin, Shwai He, Junwei Cui, Ang Li, Jiayi Huang
MoE
25 Aug 2025

MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models
Krishna Teja Chitty-Venkata, Sylvia Howland, Golara Azar, Daria Soboleva, Natalia Vassilieva, Siddhisanket Raskar, M. Emani, V. Vishwanath
MoE
24 Aug 2025

GPT-OSS-20B: A Comprehensive Deployment-Centric Analysis of OpenAI's Open-Weight Mixture of Experts Model
Deepak Kumar, Divakar Yadav, Yash Patel
MoE
22 Aug 2025

X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms
Yueming Yuan, Ahan Gupta, Jianping Li, Sajal Dash, Feiyi Wang, Minjia Zhang
MoE
18 Aug 2025

MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models
Dianyi Wang, Siyuan Wang, Zejun Li, Yikun Wang, Yitong Li, Duyu Tang, Xiaoyu Shen, Xuanjing Huang, Zhongyu Wei
MoE
13 Aug 2025

HierMoE: Accelerating MoE Training with Hierarchical Token Deduplication and Expert Swap
Wenxiang Lin, Xinglin Pan, Lin Zhang, Shaohuai Shi, Xuan Wang, Xiaowen Chu
MoE
13 Aug 2025

RouteMark: A Fingerprint for Intellectual Property Attribution in Routing-based Model Merging
Xin He, Junxi Shen, Zhenheng Tang, Xiaowen Chu, Bo Li, Ivor Tsang, Yew-Soon Ong
MoMe, MoE
03 Aug 2025

Load Balancing for AI Training Workloads
Sarah McClure, Sylvia Ratnasamy, Scott Shenker, Mark Silberstein, Isaac Keslassy
28 Jul 2025

MindSpeed RL: Distributed Dataflow for Scalable and Efficient RL Training on Ascend NPU Cluster
Laingjun Feng, Chenyi Pan, Xinjie Guo, Fei Mei, Benzhe Ning, ..., Chang Liu, Guang Yang, Zhenyu Han, Jiangben Wang, Bo Wang
MoE, OffRL
25 Jul 2025

Rethinking LLM Inference Bottlenecks: Insights from Latent Attention and Mixture-of-Experts
Sungmin Yun, Seonyong Park, Hwayong Nam, Younjoo Lee, Gunjun Lee, ..., Jongmin Kim, Hyungyo Kim, Juhwan Cho, Seungmin Baek, Jung Ho Ahn
MoE
21 Jul 2025

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Sangmin Bae, Yujin Kim, Reza Bayat, S. Kim, Jiyoun Ha, ..., Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Aaron Courville, Se-Young Yun
MoE
14 Jul 2025

Symbiosis: Multi-Adapter Inference and Fine-Tuning
Saransh Gupta, Umesh Deshpande, Travis Janssen, Swami Sundararaman
MoE
03 Jul 2025

TrainVerify: Equivalence-Based Verification for Distributed LLM Training
Symposium on Operating Systems Principles (SOSP), 2025
Yunchi Lu, Youshan Miao, Cheng Tan, Peng Huang, Yi Zhu, Xian Zhang, Fan Yang
LRM
19 Jun 2025

Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library
Weixun Wang, Shaopan Xiong, Gengru Chen, Wei Gao, Sheng Guo, ..., Lin Qu, Yuchi Xu, Wei Wang, Jiamang Wang, Bo Zheng
OffRL
06 Jun 2025

Page 1 of 5