2410.05661
Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models
8 October 2024
Siqi Wang, Zhengyu Chen, Bei Li, Keqing He, Min Zhang, Jingang Wang
Papers citing
"Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models"
1 / 1 papers shown
Title
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Samira Abnar, Harshay Shah, Dan Busbridge, Alaaeldin Mohamed Elnouby Ali, J. Susskind, Vimal Thilak
MoE
LRM
28 Jan 2025