MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Conference on Machine Learning and Systems (MLSys), 2022 · 29 November 2022
Trevor Gale, Deepak Narayanan, C. Young, Matei A. Zaharia
Tags: MoE
arXiv: 2211.15841 (abs / PDF / HTML) · HuggingFace (7 upvotes)

Papers citing "MegaBlocks: Efficient Sparse Training with Mixture-of-Experts" (50 of 92 citing papers listed)

Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference
Kexin Chu, Dawei Xiang, Zixu Shen, Yiwei Yang, Zecheng Liu, Wei Zhang
Tags: MoE, MQ · 19 Nov 2025

MC#: Mixture Compressor for Mixture-of-Experts Large Models
Wei Huang, Yue Liao, Yukang Chen, Jianhui Liu, Haoru Tan, Si Liu, Shiming Zhang, Shuicheng Yan, Xiaojuan Qi
Tags: MoE, MQ · 13 Oct 2025

Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2024
Tuowei Wang, Kun Li, Zixu Hao, Donglin Bai, Ju Ren, Yaoxue Zhang, Ting Cao, M. Yang
12 Oct 2025

AB-PINNs: Adaptive-Basis Physics-Informed Neural Networks for Residual-Driven Domain Decomposition
Jonah Botvinick-Greenhouse, Wael H. Ali, M. Benosman, S. Mowlavi
Tags: AI4CE · 10 Oct 2025

A PCA-based Data Prediction Method
Baltic Journal of Modern Computing (BJMC), 2025
Peteris Daugulis, Vija Vagale, Emiliano Mancini, Filippo Castiglione
10 Oct 2025

Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
Cheng Li, Jiexiong Liu, Yixuan Chen, Jie ji
Tags: MoE · 05 Sep 2025

Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
Taishi Nakamura, Satoki Ishikawa, Masaki Kawamura, Takumi Okamoto, Daisuke Nohara, Jun Suzuki, Rio Yokota
Tags: MoE, LRM · 26 Aug 2025

X-MoE: Enabling Scalable Training for Emerging Mixture-of-Experts Architectures on HPC Platforms
Yueming Yuan, Ahan Gupta, Jianping Li, Sajal Dash, Feiyi Wang, Minjia Zhang
Tags: MoE · 18 Aug 2025

Maximum Score Routing For Mixture-of-Experts
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Bowen Dong, Yilong Fan, Yutao Sun, Zhenyu Li, Tengyu Pan, Xun Zhou, Jianyong Wang
Tags: MoE · 18 Aug 2025

EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yuanteng Chen, Yuantian Shao, Peisong Wang, Jian Cheng
Tags: MoE · 03 Aug 2025

Innovator: Scientific Continued Pretraining with Fine-grained MoE Upcycling
Ning Liao, Xiaoxing Wang, Zehao Lin, Weiyang Guo, Feng Hong, ..., Junchi Yan, Zhiyu Li, Feiyu Xiong, Yanfeng Wang, Linfeng Zhang
Tags: CLL · 24 Jul 2025

Apple Intelligence Foundation Language Models: Tech Report 2025
Ethan Li, Anders Boesen Lindbo Larsen, Chen Zhang, Xiyou Zhou, Jun Qin, ..., Josh Elman, Dong Yin, Yusuf Goren, J. Lai, Yiran Fei
17 Jul 2025

BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
Chenyang Song, Weilin Zhao, Xu Han, Chaojun Xiao, Yingfa Chen, Yuxuan Li, Zhiyuan Liu, Maosong Sun
Tags: MoE · 11 Jul 2025

Load Balancing Mixture of Experts with Similarity Preserving Routers
Nabil Omi, S. Sen, Ali Farhadi
Tags: MoE · 16 Jun 2025

Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights
Jakub Krajewski, Marcin Chochowski, Daniel Korzekwa
Tags: MoE, ALM · 03 Jun 2025

Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference
Shuqing Luo, Pingzhi Li, Jie Peng, Hanrui Wang, Yang Zhao, Yu Cheng, Tianlong Chen
Tags: MoE · 19 May 2025

Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs
Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, ..., Zhe Liu, Zhicheng Liu, Zhuowen Tu, Zilin Ding, Zongyuan Zhan
Tags: MoE · 07 May 2025

PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
Zihao An, Huajun Bai, Ziqiang Liu, Dong Li, E. Barsoum
23 Apr 2025

MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core
Dennis Liu, Zijie Yan, Xin Yao, Tong Liu, V. Korthikanti, ..., Jiajie Yao, Chandler Zhou, David Wu, Xipeng Li, J. Yang
Tags: MoE · 21 Apr 2025

SlimPipe: Memory-Thrifty and Efficient Pipeline Parallelism for Long-Context LLM Training
Zheng Li, Wenshu Fan, Wei Zhang, Tailing Yuan, Bin Chen, Chengru Song, Chen Zhang
20 Apr 2025

Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
Ashwinee Panda, Vatsal Baherwani, Zain Sarwar, Benjamin Thérien, Supriyo Chakraborty, Tom Goldstein
Tags: MoE · 16 Apr 2025

SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting
International Symposium on Computer Architecture (ISCA), 2025
Jiaming Xu, Jiayi Pan, Yongkang Zhou, Siming Chen, Jiajian Li, Yaoxiu Lian, Junyi Wu, Guohao Dai
Tags: LRM · 11 Apr 2025

HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs
Yongji Wu, Xueshen Liu, Shuowei Jin, Ceyu Xu, Feng Qian, Ron Yifeng Wang, Matthew Lentz, Danyang Zhuo, Ion Stoica
Tags: MoMe, MoE · 04 Apr 2025

Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Mohan Zhang, Pingzhi Li, Jie Peng, Mufan Qiu, Tianlong Chen
Tags: MoE · 02 Apr 2025

Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores
European Conference on Computer Systems (EuroSys), 2025
Chenpeng Wu, Qiqi Gu, Heng Shi, Jianguo Yao, Haibing Guan
Tags: MoE · 13 Mar 2025

Ensemble Learning for Large Language Models in Text and Code Generation: A Survey
Mari Ashiga, Wei Jie, Fan Wu, Vardan K. Voskanyan, Fateme Dinmohammadi, P. Brookes, Jingzhi Gong, Zheng Wang
13 Mar 2025

Accelerating MoE Model Inference with Expert Sharding
Oana Balmau, Anne-Marie Kermarrec, Rafael Pires, André Loureiro Espírito Santo, M. Vos, Milos Vujasinovic
Tags: MoE · 11 Mar 2025

eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin, Shohaib Mahmud, Haiying Shen, Anand Iyer
Tags: MoE · 10 Mar 2025

Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun, Disen Lan, Tong Zhu, Xiaoye Qu, Yu Cheng
Tags: MoE · 07 Mar 2025

Continual Pre-training of MoEs: How robust is your router?
Benjamin Thérien, Charles-Étienne Joseph, Zain Sarwar, Ashwinee Panda, Anirban Das, Shi-Xiong Zhang, Stephen Rawls, Siyang Song, Eugene Belilovsky, Irina Rish
Tags: MoE · 06 Mar 2025

DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models
Computer Vision and Pattern Recognition (CVPR), 2025
Y. Huang, Peng Ye, Chenyu Huang, Jianjian Cao, Lin Zhang, Baopu Li, Gang Yu, Tao Chen
Tags: MoMe, MoE · 03 Mar 2025

Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
International Conference on Learning Representations (ICLR), 2025
Taishi Nakamura, Takuya Akiba, Kazuki Fujii, Yusuke Oda, Rio Yokota, Jun Suzuki
Tags: MoMe, MoE · 26 Feb 2025

Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models
Raeid Saqur, Anastasis Kratsios, Florian Krach, Yannick Limmer, Jacob-Junqi Tian, John Willes, Blanka Horvath, Frank Rudzicz
Tags: MoE · 24 Feb 2025

Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models
Gyeongman Kim, Gyouk Chu, Eunho Yang
Tags: MoE · 18 Feb 2025

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zihan Qiu, Zeyu Huang, Jian Xu, Kaiyue Wen, Zhaoxiang Wang, Rui Men, Ivan Titov, Dayiheng Liu, Jingren Zhou, Junyang Lin
Tags: MoE · 21 Jan 2025

Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Samira Abnar, Harshay Shah, Dan Busbridge, Alaaeldin Mohamed Elnouby Ali, J. Susskind, Vimal Thilak
Tags: MoE, LRM · 21 Jan 2025

iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos, Tianrui Hu, Prasoon Sinha, N. Yadwadkar
Tags: VLM · 08 Jan 2025

What Makes Cryptic Crosswords Challenging for LLMs?
International Conference on Computational Linguistics (COLING), 2024
Abdelrahman Sadallah, Daria Kotova, Ekaterina Kochmar
Tags: AAML · 12 Dec 2024

Sparse Upcycling: Inference Inefficient Finetuning
Sasha Doubov, Nikhil Sardana, Vitaliy Chiley
Tags: MoE · 13 Nov 2024

HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy
Shuqing Luo, Jie Peng, Pingzhi Li, Tianlong Chen
Tags: MoE · 02 Nov 2024

MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffic-Aware Parallel Optimization
Jiafeng Guo, Yan Liu, Yu Meng, Zhiwei Tao, Banglan Liu, Gang Chen, Xiang Li
Tags: MoE · 01 Nov 2024

LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Nam V. Nguyen, Thong T. Doan, Luong Tran, Van Nguyen, Quang Pham
Tags: MoE · 01 Nov 2024

Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
International Conference on Learning Representations (ICLR), 2024
Sangmin Bae, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, Tal Schuster
Tags: KELM · 28 Oct 2024

Mixture of Parrots: Experts improve memorization more than reasoning
International Conference on Learning Representations (ICLR), 2024
Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham Kakade, Eran Malach
Tags: MoE · 24 Oct 2024

Quadratic Gating Mixture of Experts: Statistical Insights into Self-Attention
Pedram Akbarian, Huy Le Nguyen, Xing Han, Nhat Ho
Tags: MoE · 15 Oct 2024

Upcycling Large Language Models into Mixture of Experts
Ethan He, Syeda Nahida Akter, R. Prenger, V. Korthikanti, Zijie Yan, Tong Liu, Shiqing Fan, Ashwath Aithal, Mohammad Shoeybi, Bryan Catanzaro
Tags: MoE · 10 Oct 2024

Mixture Compressor for Mixture-of-Experts LLMs Gains More
International Conference on Learning Representations (ICLR), 2024
Wei Huang, Yue Liao, Jianhui Liu, Ruifei He, Haoru Tan, Shiming Zhang, Hongsheng Li, Si Liu, Xiaojuan Qi
Tags: MoE · 08 Oct 2024

Exploring the Benefit of Activation Sparsity in Pre-training
International Conference on Machine Learning (ICML), 2024
Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou
Tags: MoE · 04 Oct 2024

Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL
International Conference on Learning Representations (ICLR), 2024
Ghada Sokar, J. Obando-Ceron, Rameswar Panda, Hugo Larochelle, Pablo Samuel Castro
Tags: MoE · 02 Oct 2024

Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
Keivan Alizadeh, Iman Mirzadeh, Hooman Shahrokhi, Dmitry Belenko, Frank Sun, Minsik Cho, Mohammad Hossein Sekhavat, Moin Nabi, Mehrdad Farajtabar
Tags: MoE · 01 Oct 2024