Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
arXiv:2405.14297 · 23 May 2024
Yongxin Guo, Zhenglin Cheng, Xiaoying Tang, Tao Lin
MoE

Papers citing "Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models" (14 papers shown)

FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks
Wenjing Xiao, Wenhao Song, Miaojiang Chen, Ruikun Luo, Min Chen
MoE · 29 Apr 2025

Tight Clusters Make Specialized Experts
Stefan K. Nielsen, R. Teo, Laziz U. Abdullaev, Tan M. Nguyen
MoE · 21 Feb 2025

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Longrong Yang, Dong Shen, Chaoxiang Cai, Fan Yang, Size Li, Di Zhang, Xi Li
MoE · 28 Jun 2024

TroL: Traversal of Layers for Large Language and Vision Models
Byung-Kwan Lee, Sangyun Chung, Chae Won Kim, Beomchan Park, Yong Man Ro
18 Jun 2024

Towards an empirical understanding of MoE design choices
Dongyang Fan, Bettina Messmer, Martin Jaggi
20 Feb 2024

From Sparse to Soft Mixtures of Experts
J. Puigcerver, C. Riquelme, Basil Mustafa, N. Houlsby
MoE · 02 Aug 2023

PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
Xiaozhe Ren, Pingyi Zhou, Xinfan Meng, Xinjing Huang, Yadao Wang, ..., Jiansheng Wei, Xin Jiang, Teng Su, Qun Liu, Jun Yao
ALM, MoE · 20 Mar 2023

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi
VLM, MLLM · 30 Jan 2023

Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, A. Kalyan
ELM, ReLM, LRM · 20 Sep 2022

Tutel: Adaptive Mixture-of-Experts at Scale
Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, ..., Joe Chau, Peng Cheng, Fan Yang, Mao Yang, Y. Xiong
MoE · 07 Jun 2022

Mixture-of-Experts with Expert Choice Routing
Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
MoE · 18 Feb 2022

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li, Dongxu Li, Caiming Xiong, S. Hoi
MLLM, BDL, VLM, CLIP · 28 Jan 2022

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM · 20 Apr 2018