MegaBlocks: Efficient Sparse Training with Mixture-of-Experts

Conference on Machine Learning and Systems (MLSys), 2023
29 November 2022
Trevor Gale, Deepak Narayanan, C. Young, Matei A. Zaharia
MoE
arXiv: 2211.15841 (abs) · PDF · HTML · HuggingFace (7 upvotes)

Papers citing "MegaBlocks: Efficient Sparse Training with Mixture-of-Experts"

43 / 93 papers shown
Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models
Keivan Alizadeh, Iman Mirzadeh, Hooman Shahrokhi, Dmitry Belenko, Frank Sun, Minsik Cho, Mohammad Hossein Sekhavat, Moin Nabi, Mehrdad Farajtabar
MoE · 257 · 2 · 0 · 01 Oct 2024
Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts
Nikolas Gritsch, Qizhen Zhang, Acyr Locatelli, Sara Hooker, Ahmet Üstün
MoE · 196 · 7 · 0 · 28 Aug 2024
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Jamba Team, Barak Lenz, Alan Arazi, Amir Bergman, Avshalom Manevich, ..., Yehoshua Cohen, Yonatan Belinkov, Y. Globerson, Yuval Peleg Levy, Y. Shoham
226 · 47 · 0 · 22 Aug 2024
HMoE: Heterogeneous Mixture of Experts for Language Modeling
An Wang, Xingwu Sun, Ruobing Xie, Shuaipeng Li, Jiaqi Zhu, ..., J. N. Han, Zhanhui Kang, Di Wang, Naoaki Okazaki, Cheng-zhong Xu
MoE · 232 · 27 · 0 · 20 Aug 2024
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts
Neural Information Processing Systems (NeurIPS), 2024
Qizhen Zhang, Nikolas Gritsch, Dwaraknath Gnaneshwar, Simon Guo, David Cairuz, ..., Jakob N. Foerster, Phil Blunsom, Sebastian Ruder, Ahmet Üstün, Acyr Locatelli
MoMe, MoE · 228 · 12 · 0 · 15 Aug 2024
Layerwise Recurrent Router for Mixture-of-Experts
International Conference on Learning Representations (ICLR), 2024
Zihan Qiu, Zeyu Huang, Shuang Cheng, Yizhi Zhou, Zili Wang, Ivan Titov, Jie Fu
MoE · 327 · 7 · 0 · 13 Aug 2024
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
Xi Lin, Akshat Shrivastava, Liang Luo, Srinivasan Iyer, Mike Lewis, Gargi Ghosh, Luke Zettlemoyer, Armen Aghajanyan
MoE · 263 · 50 · 0 · 31 Jul 2024
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Yang Liu
339 · 30 · 0 · 29 Jul 2024
Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting
Jinning Li, Jiachen Li, Sangjae Bae, David Isele
261 · 7 · 0 · 12 Jul 2024
Efficient-Empathy: Towards Efficient and Effective Selection of Empathy Data
Linzhuang Sun, Hao Liang, Jingxuan Wei, Linkun Sun, Bihui Yu, Bin Cui, Wentao Zhang
166 · 2 · 0 · 02 Jul 2024
Mixture of Experts in a Mixture of RL settings
Timon Willi, J. Obando-Ceron, Jakob Foerster, Karolina Dziugaite, Pablo Samuel Castro
MoE · 335 · 15 · 0 · 26 Jun 2024
A Survey on Mixture of Experts in Large Language Models
Weilin Cai, Juyong Jiang, Fan Wang, Jing Tang, Sunghun Kim, Jiayi Huang
MoE · 442 · 70 · 0 · 26 Jun 2024
Scorch: A Library for Sparse Deep Learning
Bobby Yan, Alexander J. Root, Trevor Gale, David Broman, Fredrik Kjolstad
218 · 2 · 0 · 27 May 2024
CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild
Xingqun Qi, Hengyuan Zhang, Yatian Wang, J. Pan, Chen Liu, ..., Qixun Zhang, Shanghang Zhang, Wenhan Luo, Qifeng Liu, Qi-fei Liu
DiffMSLR · 430 · 2 · 0 · 27 May 2024
Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping
Chenyu Jiang, Ye Tian, Zhen Jia, Shuai Zheng, Chuan Wu, Yida Wang
MoMe · 138 · 19 · 0 · 30 Apr 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, ..., Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang
404 · 169 · 0 · 22 Apr 2024
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini
260 · 29 · 0 · 12 Apr 2024
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Yikang Shen, Zhen Guo, Tianle Cai, Zengyi Qin
MoE, ALM · 226 · 44 · 0 · 11 Apr 2024
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
Bowen Pan, Songlin Yang, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Yikang Shen
MoE · 250 · 31 · 0 · 08 Apr 2024
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Weilin Cai, Juyong Jiang, Le Qin, Junwei Cui, Sunghun Kim, Jiayi Huang
502 · 22 · 0 · 07 Apr 2024
Generative AI for Immersive Communication: The Next Frontier in Internet-of-Senses Through 6G
IEEE Communications Magazine (IEEE Commun. Mag.), 2024
Nassim Sehad, Lina Bariah, W. Hamidouche, Hamed Hellaoui, Riku Jäntti, Mérouane Debbah
251 · 27 · 0 · 02 Apr 2024
Arcee's MergeKit: A Toolkit for Merging Large Language Models
Charles Goddard, Shamane Siriwardhana, Malikeh Ehghaghi, Luke Meyers, Vladimir Karpukhin, Brian Benedict, Mark McQuade, Jacob Solawetz
MoMe, KELM · 683 · 166 · 0 · 20 Mar 2024
Are LLMs Good Cryptic Crossword Solvers?
Abdelrahman Boda, Daria Kotova, Ekaterina Kochmar
195 · 7 · 0 · 15 Mar 2024
Scattered Mixture-of-Experts Implementation
Shawn Tan, Songlin Yang, Yikang Shen, Aaron Courville
MoE · 164 · 12 · 0 · 13 Mar 2024
SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization
T. Yasuda, Kyriakos Axiotis, Gang Fu, M. Bateni, Vahab Mirrokni
421 · 2 · 0 · 27 Feb 2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
Xudong Lu, Zijun Chen, Yuhui Xu, Aojun Zhou, Siyuan Huang, Bo Zhang, Junchi Yan, Jiaming Song
MoE · 271 · 69 · 0 · 22 Feb 2024
Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
James Oldfield, Markos Georgopoulos, Grigorios G. Chrysos, Christos Tzelepis, Yannis Panagakis, M. Nicolaou, Jiankang Deng, Ioannis Patras
MoE · 257 · 14 · 0 · 19 Feb 2024
Turn Waste into Worth: Rectifying Top-$k$ Router of MoE
Zhiyuan Zeng, Qipeng Guo, Zhaoye Fei, Zhangyue Yin, Yunhua Zhou, Linyang Li, Tianxiang Sun, Hang Yan, Dahua Lin, Xipeng Qiu
MoE, MoMe · 160 · 8 · 0 · 17 Feb 2024
Mixtures of Experts Unlock Parameter Scaling for Deep RL
J. Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob N. Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro
466 · 59 · 0 · 13 Feb 2024
Buffer Overflow in Mixture of Experts
Jamie Hayes, Ilia Shumailov, Itay Yona
MoE · 132 · 9 · 0 · 08 Feb 2024
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
International Conference on Machine Learning (ICML), 2024
Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You
MoE · 284 · 154 · 0 · 29 Jan 2024
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
Huanjun Kong, Songyang Zhang, Jiaying Li, Min Xiao, Jun Xu, Kai-xiang Chen
VLM · 171 · 1 · 0 · 16 Jan 2024
Mixtral of Experts
Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, A. Mensch, Blanche Savary, ..., Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed
MoE, LLMAG · 519 · 1,551 · 0 · 08 Jan 2024
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Dahyun Kim, Chanjun Park, Sanghoon Kim, Wonsung Lee, Wonho Song, ..., Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim
ALM, ELM · 310 · 187 · 0 · 23 Dec 2023
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia
380 · 119 · 0 · 23 Dec 2023
Memory Augmented Language Models through Mixture of Word Experts
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Cicero Nogueira dos Santos, James Lee-Thorp, Isaac Noble, Chung-Ching Chang, David C. Uthus
MoE · 205 · 8 · 0 · 15 Nov 2023
Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU
Mohammad Zubair, Christoph Bauinger
231 · 0 · 0 · 01 Nov 2023
MOSEL: Inference Serving Using Dynamic Modality Selection
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Bodun Hu, Le Xu, Jeongyoon Moon, N. Yadwadkar, Aditya Akella
269 · 5 · 0 · 27 Oct 2023
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Elias Frantar, Dan Alistarh
MQ, MoE · 240 · 38 · 0 · 25 Oct 2023
Adaptive Gating in Mixture-of-Experts based Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jiamin Li, Qiang Su, Yitao Yang, Yimin Jiang, Cong Wang, Hong-Yu Xu
MoE · 184 · 13 · 0 · 11 Oct 2023
JaxPruner: A concise library for sparsity research
Jooyoung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, J. Obando-Ceron, ..., Hong-Seok Kim, Yann N. Dauphin, Karolina Dziugaite, Pablo Samuel Castro, Utku Evci
235 · 19 · 0 · 27 Apr 2023
PopSparse: Accelerated block sparse matrix multiplication on IPU
Zhiyi Li, Douglas Orr, V. Ohan, Godfrey Da Costa, Tom Murray, Adam Sanders, D. Beker, Dominic Masters
215 · 1 · 0 · 29 Mar 2023
PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation
Symposium on Operating Systems Principles (SOSP), 2023
Ningxin Zheng, Huiqiang Jiang, Quan Zhang, Zhenhua Han, Yuqing Yang, ..., Fan Yang, Chengruidong Zhang, Lili Qiu, Mao Yang, Lidong Zhou
195 · 36 · 0 · 26 Jan 2023