Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model

2 July 2025
Chaoxiang Cai, Longrong Yang, Kaibing Chen, Fan Yang, Xi Li
MoE · VLM
Main: 12 pages · 5 figures · 12 tables · Bibliography: 1 page · Appendix: 6 pages
Abstract

The mixture-of-experts (MoE), which replaces dense models with sparse architectures, has gained attention in large vision-language models (LVLMs) for achieving comparable performance with fewer activated parameters. Existing MoE frameworks for LVLMs focus on token-to-expert routing (TER), encouraging different experts to specialize in processing distinct tokens. However, these frameworks often rely on the load balancing mechanism, overlooking the inherent distributional differences between vision and language. To this end, we propose a Long-Tailed Distribution-aware Router (LTDR) for vision-language TER, tackling two challenges: (1) Distribution-aware router for modality-specific routing. We observe that language TER follows a uniform distribution, whereas vision TER exhibits a long-tailed distribution. This discrepancy necessitates distinct routing strategies tailored to each modality. (2) Enhancing expert activation for vision tail tokens. Recognizing the importance of vision tail tokens, we introduce an oversampling-like strategy by increasing the number of activated experts for these tokens. Experiments on extensive benchmarks validate the effectiveness of our approach.
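
Below is a minimal PyTorch sketch of the routing idea described above, for illustration only: a shared gate scores tokens against experts, language tokens keep a fixed top-k budget, and vision tokens judged to lie in the tail of the routing distribution are given a larger budget of activated experts. The class name, the confidence-quantile tail criterion, and the k values are assumptions made for this sketch, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LongTailAwareRouter(nn.Module):
    """Illustrative modality-aware top-k router (hypothetical, not LTDR itself):
    vision tokens flagged as "tail" activate more experts than language tokens."""

    def __init__(self, hidden_dim, num_experts, k_language=2, k_vision_tail=4,
                 tail_fraction=0.2):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, num_experts, bias=False)
        self.k_language = k_language        # experts activated per language token
        self.k_vision_tail = k_vision_tail  # larger budget for vision tail tokens
        self.tail_fraction = tail_fraction  # assumed fraction of vision tokens in the tail

    def forward(self, tokens, is_vision):
        # tokens: (num_tokens, hidden_dim); is_vision: (num_tokens,) boolean mask
        probs = F.softmax(self.gate(tokens), dim=-1)   # (num_tokens, num_experts)

        # Heuristic tail criterion (assumption): vision tokens whose routing
        # confidence falls in the lowest `tail_fraction` are treated as tail tokens.
        confidence = probs.max(dim=-1).values
        is_tail = torch.zeros_like(is_vision)
        if is_vision.any():
            cutoff = torch.quantile(confidence[is_vision], self.tail_fraction)
            is_tail = is_vision & (confidence < cutoff)

        # Take the larger top-k once, then drop experts beyond each token's budget.
        k_max = max(self.k_language, self.k_vision_tail)
        topk_probs, topk_idx = probs.topk(k_max, dim=-1)
        budget = torch.where(
            is_tail,
            torch.full_like(is_tail, self.k_vision_tail, dtype=torch.long),
            torch.full_like(is_tail, self.k_language, dtype=torch.long),
        )
        rank = torch.arange(k_max, device=tokens.device).expand_as(topk_idx)
        keep = rank < budget.unsqueeze(-1)

        # Renormalize surviving gate weights so each token's weights sum to 1.
        weights = topk_probs * keep
        weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        return topk_idx, weights, keep

In a full MoE layer, topk_idx and weights would then dispatch tokens to the expert FFNs; the only point of the sketch is the per-modality, per-token expert budget that gives vision tail tokens more activated experts.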

@article{cai2025_2507.01351,
  title={Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model},
  author={Chaoxiang Cai and Longrong Yang and Kaibing Chen and Fan Yang and Xi Li},
  journal={arXiv preprint arXiv:2507.01351},
  year={2025}
}