Upcycling Large Language Models into Mixture of Experts
arXiv:2410.07524 · 10 October 2024
Ethan He, Abhinav Khattar, R. Prenger, V. Korthikanti, Zijie Yan, Tong Liu, Shiqing Fan, Ashwath Aithal, M. Shoeybi, Bryan Catanzaro
Tags: MoE
Papers citing "Upcycling Large Language Models into Mixture of Experts" (8 of 8 papers shown)

FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers
Tianyu Chen, Haoyi Zhou, Y. Li, Hao Wang, Z. Zhang, Tianchen Zhu, Shanghang Zhang, J. Li
11 May 2025

MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core
Dennis Liu, Zijie Yan, Xin Yao, Tong Liu, V. Korthikanti, ..., Jiajie Yao, Chandler Zhou, David Wu, Xipeng Li, J. Yang
Tags: MoE
21 Apr 2025

X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
Guihong Li, Mehdi Rezagholizadeh, Mingyu Yang, Vikram Appia, Emad Barsoum
Tags: VLM
14 Mar 2025

DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models
Y. Huang, Peng Ye, Chenyu Huang, Jianjian Cao, Lin Zhang, Baopu Li, Gang Yu, Tao Chen
Tags: MoMe, MoE
03 Mar 2025

Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Taishi Nakamura, Takuya Akiba, Kazuki Fujii, Yusuke Oda, Rio Yokota, Jun Suzuki
Tags: MoMe, MoE
26 Feb 2025

Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Chenghao Fan, Zhenyi Lu, Sichen Liu, Xiaoye Qu, Wei Wei, Chengfeng Gu, Yu-Xi Cheng
Tags: MoE
24 Feb 2025

The Race to Efficiency: A New Perspective on AI Scaling Laws
Chien-Ping Lu
04 Jan 2025

LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
Xiaoye Qu, Daize Dong, Xuyang Hu, Tong Zhu, Weigao Sun, Yu-Xi Cheng
Tags: MoE
24 Nov 2024