Merging Experts into One: Improving Computational Efficiency of Mixture of Experts

15 October 2023
Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, Dacheng Tao
MoE, MoMe

Papers citing "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts"

10 / 10 papers shown
eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin, Shohaib Mahmud, Haiying Shen, Anand Iyer
MoE
10 Mar 2025
CAMEx: Curvature-aware Merging of Experts
Dung V. Nguyen, Minh H. Nguyen, Luc Q. Nguyen, R. Teo, T. Nguyen, Linh Duy Tran
MoMe
26 Feb 2025
Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection
Yaning Zhang, Qiufu Li, Zitong Yu, L. Shen
ViT
31 Dec 2024
Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer
Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, Dacheng Tao
MoE
15 Oct 2023
Omni-Dimensional Dynamic Convolution
Chao Li, Aojun Zhou, Anbang Yao
16 Sep 2022
E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation
Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao
30 May 2022
Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics
Prajjwal Bhargava, Aleksandr Drozd, Anna Rogers
04 Oct 2021
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta, Yanping Huang, Ankur Bapna, M. Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat
MoE
24 Sep 2021
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang
MoE
25 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM
20 Apr 2018