LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training

24 November 2024
Xiaoye Qu
Daize Dong
Xuyang Hu
Tong Zhu
Weigao Sun
Yu Cheng
    MoE
arXiv 2411.15708 (abs) · PDF · HTML · GitHub (86★)

Papers citing "LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training"

16 / 16 papers shown
Sparse Mixture-of-Experts for Multi-Channel Imaging: Are All Channel Interactions Required?
Sukwon Yun
Heming Yao
Burkhard Hoeckendorf
David Richmond
Aviv Regev
Russell Littman
MoE
191
0
0
21 Nov 2025
MoE-DP: An MoE-Enhanced Diffusion Policy for Robust Long-Horizon Robotic Manipulation with Skill Decomposition and Failure Recovery
Baiye Cheng
Tianhai Liang
Suning Huang
Maanping Shao
Feihong Zhang
Botian Xu
Zhengrong Xue
Huazhe Xu
295
3
0
07 Nov 2025
MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs
Xinfeng Xia
Jiacheng Liu
Xiaofeng Hou
Peng Tang
Mingxuan Zhang
Wenfeng Wang
Chao Li
MoE
188
0
0
22 Oct 2025
Native Hybrid Attention for Efficient Sequence Modeling
Jusen Du
Jiaxi Hu
Tao Zhang
Weigao Sun
Yu Cheng
216
5
0
08 Oct 2025
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
Jingcong Liang
Siyuan Wang
Miren Tian
Yitong Li
Duyu Tang
Zhongyu Wei
MoE
349
1
0
21 May 2025
UMoE: Unifying Attention and FFN with Shared Experts
Yuanhang Yang
Chaozheng Wang
Jing Li
MoE
305
1
0
12 May 2025
SEE: Continual Fine-tuning with Sequential Ensemble of Experts
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhilin Wang
Yafu Li
Xiaoye Qu
Yu Cheng
CLL, KELM
319
2
0
09 Apr 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu
Yafu Li
Zhaochen Su
Weigao Sun
Jianhao Yan
...
Chaochao Lu
Yue Zhang
Xian-Sheng Hua
Bowen Zhou
Yu Cheng
ReLM, OffRL, LRM
728
117
0
27 Mar 2025
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu
Sen Lin
MoE
1.2K
64
0
10 Mar 2025
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun
Disen Lan
Tong Zhu
Xiaoye Qu
Yu Cheng
MoE
556
7
0
07 Mar 2025
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan
Weigao Sun
Jiaxi Hu
Jusen Du
Yu Cheng
462
13
0
03 Mar 2025
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Chenghao Fan
Zhenyi Lu
Sichen Liu
Xiaoye Qu
Wei Wei
Yu Cheng
MoE
1.1K
12
0
24 Feb 2025
MoM: Linear Sequence Modeling with Mixture-of-Memories
Jusen Du
Weigao Sun
Disen Lan
Jiaxi Hu
Yu Cheng
KELM
609
21
0
19 Feb 2025
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
Jihai Zhang
Xiaoye Qu
Tong Zhu
Yu Cheng
680
18
0
28 Sep 2024
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts
Tong Zhu
Daize Dong
Xiaoye Qu
Jiacheng Ruan
Wenliang Chen
Yu Cheng
MoE
304
19
0
17 Jun 2024
Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment
Zhili Liu
Yunhao Gou
Kai Chen
Lanqing Hong
Lei Li
...
Yu Zhang
Zhenguo Li
Xin Jiang
Qiang Liu
James T. Kwok
MoE
706
13
0
01 May 2024