SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts
arXiv:2404.05089 · 7 April 2024
Alexandre Muzio, Alex Sun, Churan He
Topic: MoE

Papers citing "SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts"

2 papers shown

Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang, Luohe Shi, Qiwei Li, Zuchao Li, Ping Wang, Bo Du, Mengjia Shen, Hai Zhao
Topic: MoE · 06 May 2025
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020