Merging Experts into One: Improving Computational Efficiency of Mixture of Experts
arXiv:2310.09832 · 15 October 2023
Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, Dacheng Tao
Tags: MoE, MoMe

Papers citing "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (10 of 10 papers shown)

eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin, Shohaib Mahmud, Haiying Shen, Anand Iyer · MoE · 10 Mar 2025

CAMEx: Curvature-aware Merging of Experts
Dung V. Nguyen, Minh H. Nguyen, Luc Q. Nguyen, R. Teo, T. Nguyen, Linh Duy Tran · MoMe · 26 Feb 2025

Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection
Yaning Zhang, Qiufu Li, Zitong Yu, L. Shen · ViT · 31 Dec 2024

Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer
Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, Dacheng Tao · MoE · 15 Oct 2023

Omni-Dimensional Dynamic Convolution
Chao Li, Aojun Zhou, Anbang Yao · 16 Sep 2022

E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation
Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao · 30 May 2022

Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics
Prajjwal Bhargava, Aleksandr Drozd, Anna Rogers · 04 Oct 2021

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta, Yanping Huang, Ankur Bapna, M. Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat · MoE · 24 Sep 2021

Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang · MoE · 25 Sep 2019

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · ELM · 20 Apr 2018