Upcycling Large Language Models into Mixture of Experts
arXiv:2410.07524 · 10 October 2024
Ethan He, Abhinav Khattar, R. Prenger, V. Korthikanti, Zijie Yan, Tong Liu, Shiqing Fan, Ashwath Aithal, M. Shoeybi, Bryan Catanzaro
Tags: MoE
Papers citing "Upcycling Large Language Models into Mixture of Experts" (8 of 8 papers shown)

FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers
Tianyu Chen, Haoyi Zhou, Y. Li, Hao Wang, Z. Zhang, Tianchen Zhu, Shanghang Zhang, J. Li
11 May 2025

MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core
Dennis Liu, Zijie Yan, Xin Yao, Tong Liu, V. Korthikanti, ..., Jiajie Yao, Chandler Zhou, David Wu, Xipeng Li, J. Yang
Tags: MoE
21 Apr 2025

X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
Guihong Li, Mehdi Rezagholizadeh, Mingyu Yang, Vikram Appia, Emad Barsoum
Tags: VLM
14 Mar 2025

DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models
Y. Huang, Peng Ye, Chenyu Huang, Jianjian Cao, Lin Zhang, Baopu Li, Gang Yu, Tao Chen
Tags: MoMe, MoE
03 Mar 2025

Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Taishi Nakamura, Takuya Akiba, Kazuki Fujii, Yusuke Oda, Rio Yokota, Jun Suzuki
Tags: MoMe, MoE
26 Feb 2025

Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Chenghao Fan, Zhenyi Lu, Sichen Liu, Xiaoye Qu, Wei Wei, Chengfeng Gu, Yu-Xi Cheng
Tags: MoE
24 Feb 2025

The Race to Efficiency: A New Perspective on AI Scaling Laws
Chien-Ping Lu
04 Jan 2025

LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
Xiaoye Qu, Daize Dong, Xuyang Hu, Tong Zhu, Weigao Sun, Yu-Xi Cheng
Tags: MoE
24 Nov 2024