MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Conference on Machine Learning and Systems (MLSys), 2023
arXiv:2211.15841 · 29 November 2022
Trevor Gale
Deepak Narayanan
Cliff Young
Matei Zaharia
MoE
Papers citing "MegaBlocks: Efficient Sparse Training with Mixture-of-Experts" (43 of 93 shown)
Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
Keivan Alizadeh
Iman Mirzadeh
Hooman Shahrokhi
Dmitry Belenko
Frank Sun
Minsik Cho
Mohammad Hossein Sekhavat
Moin Nabi
Mehrdad Farajtabar
MoE
01 Oct 2024
Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts
Nikolas Gritsch
Qizhen Zhang
Acyr Locatelli
Sara Hooker
Ahmet Üstün
MoE
28 Aug 2024
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Jamba Team
Barak Lenz
Alan Arazi
Amir Bergman
Avshalom Manevich
...
Yehoshua Cohen
Yonatan Belinkov
Y. Globerson
Yuval Peleg Levy
Y. Shoham
22 Aug 2024
HMoE: Heterogeneous Mixture of Experts for Language Modeling
An Wang
Xingwu Sun
Ruobing Xie
Shuaipeng Li
Jiaqi Zhu
...
J. N. Han
Zhanhui Kang
Di Wang
Naoaki Okazaki
Cheng-zhong Xu
MoE
20 Aug 2024
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts
Neural Information Processing Systems (NeurIPS), 2024
Qizhen Zhang
Nikolas Gritsch
Dwaraknath Gnaneshwar
Simon Guo
David Cairuz
...
Jakob N. Foerster
Phil Blunsom
Sebastian Ruder
Ahmet Üstün
Acyr Locatelli
MoMe
MoE
15 Aug 2024
Layerwise Recurrent Router for Mixture-of-Experts
International Conference on Learning Representations (ICLR), 2025
Zihan Qiu
Zeyu Huang
Shuang Cheng
Yizhi Zhou
Zili Wang
Ivan Titov
Jie Fu
MoE
13 Aug 2024
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
Xi Lin
Akshat Shrivastava
Liang Luo
Srinivasan Iyer
Mike Lewis
Gargi Ghosh
Luke Zettlemoyer
Armen Aghajanyan
MoE
31 Jul 2024
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan
Shuo Zhang
Zerui Wang
Lijuan Jiang
Wenwen Qu
...
Dahua Lin
Yonggang Wen
Xin Jin
Tianwei Zhang
Yang Liu
29 Jul 2024
Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting
Jinning Li
Jiachen Li
Sangjae Bae
David Isele
12 Jul 2024
Efficient-Empathy: Towards Efficient and Effective Selection of Empathy Data
Linzhuang Sun
Hao Liang
Jingxuan Wei
Linkun Sun
Bihui Yu
Bin Cui
Wentao Zhang
02 Jul 2024
Mixture of Experts in a Mixture of RL settings
Timon Willi
J. Obando-Ceron
Jakob Foerster
Karolina Dziugaite
Pablo Samuel Castro
MoE
26 Jun 2024
A Survey on Mixture of Experts in Large Language Models
Weilin Cai
Juyong Jiang
Fan Wang
Jing Tang
Sunghun Kim
Jiayi Huang
MoE
26 Jun 2024
Scorch: A Library for Sparse Deep Learning
Bobby Yan
Alexander J. Root
Trevor Gale
David Broman
Fredrik Kjolstad
27 May 2024
CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild
Xingqun Qi
Hengyuan Zhang
Yatian Wang
J. Pan
Chen Liu
...
Qixun Zhang
Shanghang Zhang
Wenhan Luo
Qifeng Liu
Qi-fei Liu
DiffM
SLR
27 May 2024
Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping
Chenyu Jiang
Ye Tian
Zhen Jia
Shuai Zheng
Chuan Wu
Yida Wang
MoMe
30 Apr 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu Wang
22 Apr 2024
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
Je-Yong Lee
Donghyun Lee
Genghan Zhang
Mo Tiwari
Azalia Mirhoseini
12 Apr 2024
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Yikang Shen
Zhen Guo
Tianle Cai
Zengyi Qin
MoE
ALM
11 Apr 2024
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
Bowen Pan
Songlin Yang
Haokun Liu
Mayank Mishra
Gaoyuan Zhang
Aude Oliva
Colin Raffel
Yikang Shen
MoE
08 Apr 2024
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Weilin Cai
Juyong Jiang
Le Qin
Junwei Cui
Sunghun Kim
Jiayi Huang
07 Apr 2024
Generative AI for Immersive Communication: The Next Frontier in Internet-of-Senses Through 6G
IEEE Communications Magazine (IEEE Commun. Mag.), 2024
Nassim Sehad
Lina Bariah
W. Hamidouche
Hamed Hellaoui
Riku Jäntti
Mérouane Debbah
02 Apr 2024
Arcee's MergeKit: A Toolkit for Merging Large Language Models
Charles Goddard
Shamane Siriwardhana
Malikeh Ehghaghi
Luke Meyers
Vladimir Karpukhin
Brian Benedict
Mark McQuade
Jacob Solawetz
MoMe
KELM
20 Mar 2024
Are LLMs Good Cryptic Crossword Solvers?
Abdelrahman Boda
Daria Kotova
Ekaterina Kochmar
15 Mar 2024
Scattered Mixture-of-Experts Implementation
Shawn Tan
Songlin Yang
Yikang Shen
Aaron Courville
MoE
13 Mar 2024
SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization
T. Yasuda
Kyriakos Axiotis
Gang Fu
M. Bateni
Vahab Mirrokni
27 Feb 2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
Xudong Lu
Zijun Chen
Yuhui Xu
Aojun Zhou
Siyuan Huang
Bo Zhang
Junchi Yan
Jiaming Song
MoE
22 Feb 2024
Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
James Oldfield
Markos Georgopoulos
Grigorios G. Chrysos
Christos Tzelepis
Yannis Panagakis
M. Nicolaou
Jiankang Deng
Ioannis Patras
MoE
19 Feb 2024
Turn Waste into Worth: Rectifying Top-k Router of MoE
Zhiyuan Zeng
Qipeng Guo
Zhaoye Fei
Zhangyue Yin
Yunhua Zhou
Linyang Li
Tianxiang Sun
Hang Yan
Dahua Lin
Xipeng Qiu
MoE
MoMe
17 Feb 2024
Mixtures of Experts Unlock Parameter Scaling for Deep RL
J. Obando-Ceron
Ghada Sokar
Timon Willi
Clare Lyle
Jesse Farebrother
Jakob N. Foerster
Gintare Karolina Dziugaite
Doina Precup
Pablo Samuel Castro
13 Feb 2024
Buffer Overflow in Mixture of Experts
Jamie Hayes
Ilia Shumailov
Itay Yona
MoE
08 Feb 2024
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
International Conference on Machine Learning (ICML), 2024
Fuzhao Xue
Zian Zheng
Yao Fu
Jinjie Ni
Zangwei Zheng
Wangchunshu Zhou
Yang You
MoE
29 Jan 2024
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
Huanjun Kong
Songyang Zhang
Jiaying Li
Min Xiao
Jun Xu
Kai-xiang Chen
VLM
16 Jan 2024
Mixtral of Experts
Albert Q. Jiang
Alexandre Sablayrolles
Antoine Roux
A. Mensch
Blanche Savary
...
Théophile Gervet
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE
LLMAG
08 Jan 2024
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Dahyun Kim
Chanjun Park
Sanghoon Kim
Wonsung Lee
Wonho Song
...
Hyunbyung Park
Gyoungjin Gim
Mikyoung Cha
Hwalsuk Lee
Sunghun Kim
ALM
ELM
23 Dec 2023
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao
Zhihao Zhang
Xinhao Cheng
Hongyi Jin
Tianqi Chen
Zhihao Jia
23 Dec 2023
Memory Augmented Language Models through Mixture of Word Experts
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Cicero Nogueira dos Santos
James Lee-Thorp
Isaac Noble
Chung-Ching Chang
David C. Uthus
MoE
15 Nov 2023
Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU
Mohammad Zubair
Christoph Bauinger
01 Nov 2023
MOSEL: Inference Serving Using Dynamic Modality Selection
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Bodun Hu
Le Xu
Jeongyoon Moon
N. Yadwadkar
Aditya Akella
27 Oct 2023
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Elias Frantar
Dan Alistarh
MQ
MoE
25 Oct 2023
Adaptive Gating in Mixture-of-Experts based Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jiamin Li
Qiang Su
Yitao Yang
Yimin Jiang
Cong Wang
Hong-Yu Xu
MoE
11 Oct 2023
JaxPruner: A concise library for sparsity research
Jooyoung Lee
Wonpyo Park
Nicole Mitchell
Jonathan Pilault
J. Obando-Ceron
...
Hong-Seok Kim
Yann N. Dauphin
Karolina Dziugaite
Pablo Samuel Castro
Utku Evci
27 Apr 2023
PopSparse: Accelerated block sparse matrix multiplication on IPU
Zhiyi Li
Douglas Orr
V. Ohan
Godfrey Da Costa
Tom Murray
Adam Sanders
D. Beker
Dominic Masters
29 Mar 2023
PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation
Symposium on Operating Systems Principles (SOSP), 2023
Ningxin Zheng
Huiqiang Jiang
Quan Zhang
Zhenhua Han
Yuqing Yang
...
Fan Yang
Chengruidong Zhang
Lili Qiu
Mao Yang
Lidong Zhou
26 Jan 2023