arXiv:2411.15708
Cited By
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
Xiaoye Qu, Daize Dong, Xuyang Hu, Tong Zhu, Weigao Sun, Yu Cheng
24 November 2024
Papers citing "LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training" (16 of 16 papers shown)
Sparse Mixture-of-Experts for Multi-Channel Imaging: Are All Channel Interactions Required?
Sukwon Yun, Heming Yao, Burkhard Hoeckendorf, David Richmond, Aviv Regev, Russell Littman
21 Nov 2025
MoE-DP: An MoE-Enhanced Diffusion Policy for Robust Long-Horizon Robotic Manipulation with Skill Decomposition and Failure Recovery
Baiye Cheng, Tianhai Liang, Suning Huang, Maanping Shao, Feihong Zhang, Botian Xu, Zhengrong Xue, Huazhe Xu
07 Nov 2025
MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs
Xinfeng Xia, Jiacheng Liu, Xiaofeng Hou, Peng Tang, Mingxuan Zhang, Wenfeng Wang, Chao Li
22 Oct 2025
Native Hybrid Attention for Efficient Sequence Modeling
Jusen Du, Jiaxi Hu, Tao Zhang, Weigao Sun, Yu Cheng
08 Oct 2025
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
Jingcong Liang, Siyuan Wang, Miren Tian, Yitong Li, Duyu Tang, Zhongyu Wei
21 May 2025
UMoE: Unifying Attention and FFN with Shared Experts
Yuanhang Yang, Chaozheng Wang, Jing Li
12 May 2025
SEE: Continual Fine-tuning with Sequential Ensemble of Experts
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhilin Wang, Yafu Li, Xiaoye Qu, Yu Cheng
09 Apr 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu, Yafu Li, Zhaochen Su, Weigao Sun, Jianhao Yan, ..., Chaochao Lu, Yue Zhang, Xian-Sheng Hua, Bowen Zhou, Yu Cheng
27 Mar 2025
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu, Sen Lin
10 Mar 2025
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun, Disen Lan, Tong Zhu, Xiaoye Qu, Yu Cheng
07 Mar 2025
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan, Weigao Sun, Jiaxi Hu, Jusen Du, Yu Cheng
03 Mar 2025
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Chenghao Fan, Zhenyi Lu, Sichen Liu, Xiaoye Qu, Wei Wei, Yu Cheng
24 Feb 2025
MoM: Linear Sequence Modeling with Mixture-of-Memories
Jusen Du, Weigao Sun, Disen Lan, Jiaxi Hu, Yu Cheng
19 Feb 2025
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
Jihai Zhang, Xiaoye Qu, Tong Zhu, Yu Cheng
28 Sep 2024
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts
Tong Zhu, Daize Dong, Xiaoye Qu, Jiacheng Ruan, Wenliang Chen, Yu Cheng
17 Jun 2024
Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment
Zhili Liu, Yunhao Gou, Kai Chen, Lanqing Hong, Lei Li, ..., Yu Zhang, Zhenguo Li, Xin Jiang, Qiang Liu, James T. Kwok
01 May 2024