Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models

23 July 2025
Changxin Tian, Kunlong Chen, Jia-Ling Liu, Ziqi Liu, Zhiqiang Zhang, Jun Zhou
MoE
ArXiv (abs) · PDF · HTML · HuggingFace (1 upvote)

Papers citing "Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models"

6 / 6 papers shown
Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design
Quentin G. Anthony, Yury Tokpanov, Skyler Szot, Srivatsan Rajagopal, Praneeth Medepalli, ..., Emad Barsoum, Zhenyu Gu, Yao Fu, Beren Millidge
MoE, VLM, LRM
173 · 0 · 0
21 Nov 2025
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
Ling Team, Bin Han, Caizhi Tang, Chen Liang, Donghao Zhang, ..., Yue Zhang, Yuchen Fang, Zibin Lin, Zixuan Cheng, Jun Zhou
LRM
149 · 1 · 0
22 Oct 2025
Scaling Laws for Code: A More Data-Hungry Regime
Xianzhen Luo, Wenzhen Zheng, Qingfu Zhu, Rongyi Zhang, Houyi Li, Siming Huang, YuanTao Fan, Wanxiang Che
ALM
88 · 1 · 0
09 Oct 2025
Towards Reliable and Practical LLM Security Evaluations via Bayesian Modelling
Mary Llewellyn, Annie Gray, Josh Collyer, Michael Harries
84 · 0 · 0
07 Oct 2025
Diversity Is All You Need for Contrastive Learning: Spectral Bounds on Gradient Magnitudes
Peter Ochieng
60 · 1 · 0
07 Oct 2025
Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization
Yaoxiang Wang, Qingguo Hu, Yucheng Ding, Ruizhe Wang, Yeyun Gong, Jian Jiao, Yelong Shen, Peng Cheng, Jinsong Su
MoE
56 · 0 · 0
30 Sep 2025