Merging Experts into One: Improving Computational Efficiency of Mixture of Experts

15 October 2023
Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, Dacheng Tao
MoE, MoMe

Papers citing "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts"

10 / 10 papers shown
eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin, Shohaib Mahmud, Haiying Shen, Anand Iyer
MoE
10 Mar 2025
CAMEx: Curvature-aware Merging of Experts
Dung V. Nguyen, Minh H. Nguyen, Luc Q. Nguyen, R. Teo, T. Nguyen, Linh Duy Tran
MoMe
26 Feb 2025
Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection
Yaning Zhang, Qiufu Li, Zitong Yu, L. Shen
ViT
31 Dec 2024
Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer
Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, Dacheng Tao
MoE
15 Oct 2023
Omni-Dimensional Dynamic Convolution
Chao Li, Aojun Zhou, Anbang Yao
16 Sep 2022
E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation
Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao
30 May 2022
Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics
Prajjwal Bhargava, Aleksandr Drozd, Anna Rogers
04 Oct 2021
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta, Yanping Huang, Ankur Bapna, M. Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat
MoE
24 Sep 2021
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang
MoE
25 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM
20 Apr 2018