On the Representation Collapse of Sparse Mixture of Experts
20 April 2022
Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, Barun Patra, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei
Topics: MoMe, MoE
arXiv: 2204.09179
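The paper traces representation collapse in sparse MoE to routing in the full hidden space, and proposes computing token-to-expert routing scores on a low-dimensional hypersphere (cosine similarity with a learnable temperature). Below is a minimal PyTorch sketch of that routing idea; the names CosineRouter, d_route, and init_tau, and all default values, are illustrative assumptions rather than the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineRouter(nn.Module):
    """Hyperspherical routing sketch: project tokens into a small routing
    space, L2-normalize, and score against normalized expert embeddings
    with a learnable temperature. Names and defaults are illustrative."""

    def __init__(self, d_model: int, n_experts: int, d_route: int = 8,
                 init_tau: float = 0.07):
        super().__init__()
        self.proj = nn.Linear(d_model, d_route, bias=False)        # dimension reduction
        self.expert_emb = nn.Parameter(torch.randn(n_experts, d_route))
        self.log_tau = nn.Parameter(torch.tensor(init_tau).log())  # learnable temperature

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (tokens, d_model) -> routing logits: (tokens, n_experts)
        z = F.normalize(self.proj(h), dim=-1)     # tokens on the unit hypersphere
        e = F.normalize(self.expert_emb, dim=-1)  # expert embeddings, unit norm
        return (z @ e.t()) / self.log_tau.exp()   # scaled cosine similarity

router = CosineRouter(d_model=512, n_experts=16)
logits = router(torch.randn(4, 512))              # (4, 16) routing logits
gates = F.softmax(logits, dim=-1)
top1_expert = gates.argmax(dim=-1)                # top-1 expert per token
```

Top-k gating and the load-balancing objective described in the paper would sit on top of these scores; they are omitted here for brevity.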
Papers citing "On the Representation Collapse of Sparse Mixture of Experts" (20 of 20 papers shown):
- "Improving Routing in Sparse Mixture of Experts with Graph of Tokens" (Tam Minh Nguyen, Ngoc N. Tran, Khai Nguyen, Richard G. Baraniuk; MoE; 01 May 2025)
- "A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications" (Siyuan Mu, Sen Lin; MoE; 10 Mar 2025)
- "CAMEx: Curvature-aware Merging of Experts" (Dung V. Nguyen, Minh H. Nguyen, Luc Q. Nguyen, R. Teo, T. Nguyen, Linh Duy Tran; MoMe; 26 Feb 2025)
- "Tight Clusters Make Specialized Experts" (Stefan K. Nielsen, R. Teo, Laziz U. Abdullaev, Tan M. Nguyen; MoE; 21 Feb 2025)
- "Theory on Mixture-of-Experts in Continual Learning" (Hongbo Li, Sen Lin, Lingjie Duan, Yingbin Liang, Ness B. Shroff; MoE, MoMe, CLL; 20 Feb 2025)
- "Importance Sampling via Score-based Generative Models" (Heasung Kim, Taekyun Lee, Hyeji Kim, Gustavo de Veciana; MedIm, DiffM; 07 Feb 2025)
- "Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation" (Mingrui Liu, Sixiao Zhang, Cheng Long; 03 Nov 2024)
- "MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts" (R. Teo, Tan M. Nguyen; MoE; 18 Oct 2024)
- "Mixture Compressor for Mixture-of-Experts LLMs Gains More" (Wei Huang, Yue Liao, Jianhui Liu, Ruifei He, Haoru Tan, Shiming Zhang, Hongsheng Li, Si Liu, Xiaojuan Qi; MoE; 08 Oct 2024)
- "Layerwise Recurrent Router for Mixture-of-Experts" (Zihan Qiu, Zeyu Huang, Shuang Cheng, Yizhi Zhou, Zili Wang, Ivan Titov, Jie Fu; MoE; 13 Aug 2024)
- "Ensembling Diffusion Models via Adaptive Feature Aggregation" (Cong Wang, Kuan Tian, Yonghang Guan, Jun Zhang, Zhiwei Jiang, Fei Shen, Xiao Han; 27 May 2024)
- "Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models" (Xudong Lu, Qi Liu, Yuhui Xu, Aojun Zhou, Siyuan Huang, Bo-Wen Zhang, Junchi Yan, Hongsheng Li; MoE; 22 Feb 2024)
- "FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion" (Xing Han, Huy Nguyen, Carl Harris, Nhat Ho, S. Saria; MoE; 05 Feb 2024)
- "LocMoE: A Low-Overhead MoE for Large Language Model Training" (Jing Li, Zhijie Sun, Xuan He, Li Zeng, Yi Lin, Entong Li, Binfan Zheng, Rongqian Zhao, Xin Chen; MoE; 25 Jan 2024)
- "Retentive Network: A Successor to Transformer for Large Language Models" (Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei; LRM; 17 Jul 2023)
- "AdaEnsemble: Learning Adaptively Sparse Structured Ensemble Network for Click-Through Rate Prediction" (Yachen Yan, Liubo Li; 06 Jan 2023)
- "MoEC: Mixture of Expert Clusters" (Yuan Xie, Shaohan Huang, Tianyu Chen, Furu Wei; MoE; 19 Jul 2022)
- "Language Models are General-Purpose Interfaces" (Y. Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei; MLLM; 13 Jun 2022)
- "Mixture-of-Experts with Expert Choice Routing" (Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon; MoE; 18 Feb 2022)
- "MLQA: Evaluating Cross-lingual Extractive Question Answering" (Patrick Lewis, Barlas Oğuz, Ruty Rinott, Sebastian Riedel, Holger Schwenk; ELM; 16 Oct 2019)