Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models
arXiv:2507.17702 (v4, latest) · 23 July 2025
Changxin Tian, Kunlong Chen, Jia-Ling Liu, Ziqi Liu, Zhiqiang Zhang, Jun Zhou
Tags: MoE

Papers citing "Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models" (6 of 6 papers shown)

Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design (21 Nov 2025)
Quentin G. Anthony, Yury Tokpanov, Skyler Szot, Srivatsan Rajagopal, Praneeth Medepalli, ..., Emad Barsoum, Zhenyu Gu, Yao Fu, Beren Millidge
Tags: MoE, VLM, LRM

Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning (22 Oct 2025)
Ling Team, Bin Han, Caizhi Tang, Chen Liang, Donghao Zhang, ..., Yue Zhang, Yuchen Fang, Zibin Lin, Zixuan Cheng, Jun Zhou
Tags: LRM

Scaling Laws for Code: A More Data-Hungry Regime (09 Oct 2025)
Xianzhen Luo, Wenzhen Zheng, Qingfu Zhu, Rongyi Zhang, Houyi Li, Siming Huang, YuanTao Fan, Wanxiang Che
Tags: ALM

Towards Reliable and Practical LLM Security Evaluations via Bayesian Modelling (07 Oct 2025)
Mary Llewellyn, Annie Gray, Josh Collyer, Michael Harries

Diversity Is All You Need for Contrastive Learning: Spectral Bounds on Gradient Magnitudes (07 Oct 2025)
Peter Ochieng

Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization (30 Sep 2025)
Yaoxiang Wang, Qingguo Hu, Yucheng Ding, Ruizhe Wang, Yeyun Gong, Jian Jiao, Yelong Shen, Peng Cheng, Jinsong Su
Tags: MoE