
arXiv:2508.07785
Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts

11 August 2025
Haoyuan Wu
Haoxing Chen
Xiaodong Chen
Zhanchao Zhou
Tieyuan Chen
Yihong Zhuang
Guoshan Lu
Zenan Huang
Junbo Zhao
Lin Liu
Zhenzhong Lan
Bei Yu
Jianguo Li
    MoE
ArXiv (abs) · PDF · HTML · HuggingFace (22 upvotes) · GitHub (22★)
Main: 9 pages · 4 figures · 4 tables · Bibliography: 4 pages
Abstract

The Mixture of Experts (MoE) architecture is a cornerstone of modern state-of-the-art (SOTA) large language models (LLMs). MoE models facilitate scalability by enabling sparse parameter activation. However, traditional MoE architectures use homogeneous experts of uniform size, activating a fixed number of parameters irrespective of input complexity and thus limiting computational efficiency. To overcome this limitation, we introduce Grove MoE, a novel architecture incorporating experts of varying sizes, inspired by the heterogeneous big.LITTLE CPU architecture. This architecture features novel adjugate experts with a dynamic activation mechanism, enabling model capacity expansion while maintaining manageable computational overhead. Building on this architecture, we present GroveMoE-Base and GroveMoE-Inst, 33B-parameter LLMs developed by applying an upcycling strategy to the Qwen3-30B-A3B-Base model during mid-training and post-training. GroveMoE models dynamically activate 3.14-3.28B parameters based on token complexity and achieve performance comparable to SOTA open-source models of similar or even larger size.
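The abstract only sketches the mechanism, so below is a minimal, illustrative PyTorch toy of one plausible reading of it, not the authors' released implementation. All names (ToyGroveMoE, FFN, the grouping scheme, and the hyperparameters) are assumptions for illustration: routed experts are split into groups, each group shares a smaller "adjugate" expert, and that adjugate runs at most once per token per activated group, so the number of activated parameters varies with how the router spreads a token's top-k choices across groups.

```python
# Illustrative sketch only (hypothetical names; not the paper's code): a toy MoE
# layer where experts are grouped and each group shares a small adjugate expert.
# The adjugate is computed at most once per token per activated group, so the
# activated parameter count depends on how routing spreads across groups.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FFN(nn.Module):
    """A standard gated feed-forward expert with a configurable hidden size."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.gate = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


class ToyGroveMoE(nn.Module):
    """Toy grouped MoE: n_experts routed experts split into n_groups,
    each group owning one smaller adjugate expert shared by its members."""
    def __init__(self, d_model=64, d_expert=128, d_adjugate=32,
                 n_experts=8, n_groups=4, top_k=2):
        super().__init__()
        assert n_experts % n_groups == 0
        self.top_k, self.group_size = top_k, n_experts // n_groups
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(FFN(d_model, d_expert) for _ in range(n_experts))
        self.adjugates = nn.ModuleList(FFN(d_model, d_adjugate) for _ in range(n_groups))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)  # (tokens, top_k)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):  # per-token loop for clarity, not speed
            hit_groups = set()
            for k in range(self.top_k):
                e = idx[t, k].item()
                out[t] += weights[t, k] * self.experts[e](x[t])
                hit_groups.add(e // self.group_size)
            # Each activated group contributes its adjugate expert exactly once,
            # so tokens whose top-k choices concentrate in one group activate
            # fewer parameters than tokens spread across several groups.
            for g in hit_groups:
                out[t] += self.adjugates[g](x[t])
        return out


# Usage: y = ToyGroveMoE()(torch.randn(5, 64))
```

The per-token loop is for readability; a practical implementation would batch tokens by expert and group, but the dynamic-activation property (adjugate cost paid once per activated group) is the same.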
