ResearchTrend.AI
HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts

20 February 2024
Hao Zhao
Zihan Qiu
Huijia Wu
Zili Wang
Zhaofeng He
Jie Fu
    MoE

Papers citing "HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts"

12 / 12 papers shown
eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin
Shohaib Mahmud
Haiying Shen
Anand Iyer
MoE
88
0
0
10 Mar 2025
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts
Zhenpeng Su
Xing Wu
Zijia Lin
Yizhe Xiong
Minxuan Lv
Guangyuan Ma
Hui Chen
Songlin Hu
Guiguang Ding
MoE
26
2
0
21 Oct 2024
Mixture of Diverse Size Experts
Manxi Sun
Wei Liu
Jian Luan
Pengzhi Gao
Bin Wang
MoE
21
1
0
18 Sep 2024
Layerwise Recurrent Router for Mixture-of-Experts
Zihan Qiu
Zeyu Huang
Shuang Cheng
Yizhi Zhou
Zili Wang
Ivan Titov
Jie Fu
MoE
68
2
0
13 Aug 2024
Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction
Yonggang Jin
Ge Zhang
Hao Zhao
Tianyu Zheng
Jiawei Guo
Liuyu Xiang
Shawn Yue
Stephen W. Huang
Zhaofeng He
Jie Fu
OffRL
27
4
0
06 Feb 2024
Prototype-based HyperAdapter for Sample-Efficient Multi-task Tuning
Hao Zhao
Jie Fu
Zhaofeng He
105
6
0
18 Oct 2023
Hyper-X: A Unified Hypernetwork for Multi-Task Multilingual Transfer
A. Ustun
Arianna Bisazza
G. Bouma
Gertjan van Noord
Sebastian Ruder
44
32
0
24 May 2022
Multilingual Machine Translation with Hyper-Adapters
Christos Baziotis
Mikel Artetxe
James Cross
Shruti Bhosale
63
21
0
22 May 2022
Mixture-of-Experts with Expert Choice Routing
Yanqi Zhou
Tao Lei
Hanxiao Liu
Nan Du
Yanping Huang
Vincent Zhao
Andrew M. Dai
Zhifeng Chen
Quoc V. Le
James Laudon
MoE
147
326
0
18 Feb 2022
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,943
0
20 Apr 2018
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
948
20,471
0
17 Apr 2017
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomáš Kočiský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
170
3,508
0
10 Jun 2015