One Student Knows All Experts Know: From Sparse to Dense
arXiv:2201.10890 · 26 January 2022
Fuzhao Xue, Xiaoxin He, Xiaozhe Ren, Yuxuan Lou, Yang You
Tags: MoMe, MoE

Papers citing "One Student Knows All Experts Know: From Sparse to Dense" (16 papers shown)

Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
Mohan Zhang, Pingzhi Li, Jie Peng, Mufan Qiu, Tianlong Chen
Tags: MoE
02 Apr 2025

ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration
Mengting Ai, Tianxin Wei, Yifan Chen, Zhichen Zeng, Ritchie Zhao, G. Varatkar, B. Rouhani, Xianfeng Tang, Hanghang Tong, Jingrui He
Tags: MoE
10 Mar 2025

Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models
Gyeongman Kim, Gyouk Chu, Eunho Yang
Tags: MoE
18 Feb 2025

Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification
Zhenyu Kuang, Hongyang Zhang, Lidong Cheng, Yinhao Liu, Yue Huang, Xinghao Ding, Huafeng Li
10 Jul 2024

XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
Yifeng Ding, Jiawei Liu, Yuxiang Wei, Terry Yue Zhuo, Lingming Zhang
Tags: ALM, MoE
23 Apr 2024

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You
Tags: MoE
29 Jan 2024

Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation
Rongyu Zhang, Yulin Luo, Jiaming Liu, Huanrui Yang, Zhen Dong, ..., Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Yuan Du, Shanghang Zhang
Tags: MoMe, MoE
27 Dec 2023

Adaptive Gating in Mixture-of-Experts based Language Models
Jiamin Li, Qiang Su, Yitao Yang, Yimin Jiang, Cong Wang, Hong-Yu Xu
Tags: MoE
11 Oct 2023

Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Yongqian Huang, Peng Ye, Xiaoshui Huang, Sheng R. Li, Tao Chen, Tong He, Wanli Ouyang
Tags: MoMe
11 Aug 2023

One-stop Training of Multiple Capacity Models
Lan Jiang, Haoyang Huang, Dongdong Zhang, R. Jiang, Furu Wei
23 May 2023

To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis
Fuzhao Xue, Yao Fu, Wangchunshu Zhou, Zangwei Zheng, Yang You
22 May 2023

Lifting the Curse of Capacity Gap in Distilling Language Models
Chen Zhang, Yang Yang, Jiahao Liu, Jingang Wang, Yunsen Xian, Benyou Wang, Dawei Song
Tags: MoE
20 May 2023

MoEC: Mixture of Expert Clusters
Yuan Xie, Shaohan Huang, Tianyu Chen, Furu Wei
Tags: MoE
19 Jul 2022

Task-Specific Expert Pruning for Sparse Mixture-of-Experts
Tianyu Chen, Shaohan Huang, Yuan Xie, Binxing Jiao, Daxin Jiang, Haoyi Zhou, Jianxin Li, Furu Wei
Tags: MoE
01 Jun 2022

A Study on Transformer Configuration and Training Objective
Fuzhao Xue, Jianghai Chen, Aixin Sun, Xiaozhe Ren, Zangwei Zheng, Xiaoxin He, Yongming Chen, Xin Jiang, Yang You
21 May 2022

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
Tags: ELM
20 Apr 2018