Unified Scaling Laws for Routed Language Models

2 February 2022
Aidan Clark, Diego de Las Casas, Aurelia Guy, A. Mensch, Michela Paganini, Jordan Hoffmann, Bogdan Damoc, Blake A. Hechtman, Trevor Cai, Sebastian Borgeaud, George van den Driessche, Eliza Rutherford, Tom Hennigan, Matthew J. Johnson, Katie Millican, Albin Cassirer, Chris Jones, Elena Buchatskaya, David Budden, Laurent Sifre, Simon Osindero, Oriol Vinyals, Jack W. Rae, Erich Elsen, Koray Kavukcuoglu, Karen Simonyan
    MoE

Papers citing "Unified Scaling Laws for Routed Language Models"

50 / 146 papers shown
SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models
Hang Wu, Jianian Zhu, Y. Li, Haojie Wang, Biao Hou, Jidong Zhai
12 May 2025
The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts
Enric Boix Adserà, Philippe Rigollet
MoE
11 May 2025
Position: Enough of Scaling LLMs! Lets Focus on Downscaling
Ayan Sengupta, Yash Goel, Tanmoy Chakraborty
02 May 2025
Efficient Pretraining Length Scaling
Bohong Wu, Shen Yan, Sijun Zhang, Jianqiao Lu, Yutao Zeng, Ya Wang, Xun Zhou
21 Apr 2025
Scaling Analysis of Interleaved Speech-Text Language Models
Gallil Maimon, Michael Hassid, Amit Roth, Yossi Adi
AuLLM
03 Apr 2025
Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
Mohan Zhang, Pingzhi Li, Jie Peng, Mufan Qiu, Tianlong Chen
MoE
02 Apr 2025
Communication-Efficient and Personalized Federated Foundation Model Fine-Tuning via Tri-Matrix Adaptation
Y. Li, Bo Liu, Sheng Huang, Z. Zhang, Xiaotong Yuan, Richang Hong
31 Mar 2025
SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking
Wenrui Cai, Qingjie Liu, Y. Wang
MoE
24 Mar 2025
Upcycling Text-to-Image Diffusion Models for Multi-Task Capabilities
Ruchika Chavhan, Abhinav Mehrotra, Malcolm Chadwick, Alberto Gil C. P. Ramos, Luca Morreale, Mehdi Noroozi, Sourav Bhattacharya
14 Mar 2025
Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores
Chenpeng Wu, Qiqi Gu, Heng Shi, Jianguo Yao, Haibing Guan
MoE
13 Mar 2025
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang, Guoming Ling, Vincent S. Liang, Yupei Lin, Yandong Chen, Shanshan Zhong, Hefeng Wu, Liang Lin
LRM
08 Mar 2025
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team, B. Zeng, C. Huang, Chao Zhang, Changxin Tian, ..., Zhaoxin Huan, Zujie Wen, Zhenhang Sun, Zhuoxuan Du, Z. He
MoE, ALM
07 Mar 2025
Fractional Correspondence Framework in Detection Transformer
Masoumeh Zareapoor, Pourya Shamsolmoali, Huiyu Zhou, Yue Lu, Salvador García
06 Mar 2025
Continual Pre-training of MoEs: How robust is your router?
Benjamin Thérien, Charles-Étienne Joseph, Zain Sarwar, Ashwinee Panda, Anirban Das, Shi-Xiong Zhang, Stephen Rawls, S., Eugene Belilovsky, Irina Rish
MoE
06 Mar 2025
Enhancing the Scalability and Applicability of Kohn-Sham Hamiltonians for Molecular Systems
Yunyang Li, Zaishuo Xia, Lin Huang, Xinran Wei, Han Yang, ..., Zun Wang, Chang-Shu Liu, Jia Zhang, Bin Shao, Mark B. Gerstein
26 Feb 2025
(Mis)Fitting: A Survey of Scaling Laws
Margaret Li, Sneha Kudugunta, Luke Zettlemoyer
26 Feb 2025
How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines
Ayan Sengupta, Yash Goel, Tanmoy Chakraborty
17 Feb 2025
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Samira Abnar, Harshay Shah, Dan Busbridge, Alaaeldin Mohamed Elnouby Ali, J. Susskind, Vimal Thilak
MoE, LRM
28 Jan 2025
The Race to Efficiency: A New Perspective on AI Scaling Laws
Chien-Ping Lu
04 Jan 2025
UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity
Jingbo Lin, Zhilu Zhang, W. J. Li, Renjing Pei, Hang Xu, Hongzhi Zhang, Wangmeng Zuo
28 Dec 2024
Scaling Sequential Recommendation Models with Transformers
Pablo Zivic, Hernán Ceferino Vázquez, Jorge Sanchez
OffRL, LRM
10 Dec 2024
MH-MoE: Multi-Head Mixture-of-Experts
Shaohan Huang, Xun Wu, Shuming Ma, Furu Wei
MoE
25 Nov 2024
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
Xiaonan Nie, Qibin Liu, Fangcheng Fu, Shenhan Zhu, Xupeng Miao, X. Li, Y. Zhang, Shouda Liu, Bin Cui
MoE
13 Nov 2024
Scaling Laws for Precision
Tanishq Kumar, Zachary Ankner, Benjamin Spector, Blake Bordelon, Niklas Muennighoff, Mansheej Paul, C. Pehlevan, Christopher Ré, Aditi Raghunathan
AIFin, MoMe
07 Nov 2024
Training Compute-Optimal Protein Language Models
Xingyi Cheng, Bo Chen, Pan Li, Jing Gong, Jie Tang, Le Song
04 Nov 2024
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham Kakade, Eran Malach
MoE
24 Oct 2024
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
R. Teo, Tan M. Nguyen
MoE
18 Oct 2024
Exploring the Design Space of Visual Context Representation in Video MLLMs
Yifan Du, Yuqi Huo, K. Zhou, Zijia Zhao, Haoyu Lu, Han Huang, Wayne Xin Zhao, B. Wang, Weipeng Chen, Ji-Rong Wen
17 Oct 2024
Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving
Sihao Wu, Jiaxu Liu, Xiangyu Yin, Guangliang Cheng, Xingyu Zhao, Meng Fang, Xinping Yi, Xiaowei Huang
16 Oct 2024
MLP-SLAM: Multilayer Perceptron-Based Simultaneous Localization and Mapping With a Dynamic and Static Object Discriminator
Taozhe Li, Wei Sun
14 Oct 2024
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
Tongtian Yue, Longteng Guo, Jie Cheng, Xuange Gao, J. Liu
MoE
14 Oct 2024
Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models
Jun Luo, C. L. P. Chen, Shandong Wu
FedML, VLM, MoE
14 Oct 2024
Scaling Laws for Predicting Downstream Performance in LLMs
Yangyi Chen, Binxuan Huang, Yifan Gao, Zhengyang Wang, Jingfeng Yang, Heng Ji
LRM
11 Oct 2024
Data Efficiency for Large Recommendation Models
Kshitij Jain, Jingru Xie, Kevin Regan, Cheng Chen, Jie Han, ..., Todd Phillips, Myles Sussman, Matt Troup, Angel Yu, Jia Zhuo
OffRL
08 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, ..., Qilin Zheng, Guanglei Zhou, Hai, Li-Wei Li, Yiran Chen
08 Oct 2024
Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models
Siqi Wang, Zhengyu Chen, Bei Li, Keqing He, Min Zhang, Jingang Wang
08 Oct 2024
Unified Neural Network Scaling Laws and Scale-time Equivalence
Akhilan Boopathy, Ila Fiete
09 Sep 2024
Breaking Neural Network Scaling Laws with Modularity
Akhilan Boopathy, Sunshine Jiang, William Yue, Jaedong Hwang, Abhiram Iyer, Ila Fiete
OOD
09 Sep 2024
Understanding the Interplay of Scale, Data, and Bias in Language Models: A Case Study with BERT
Muhammad Ali, Swetasudha Panda, Qinlan Shen, Michael Wick, Ari Kobren
MILM
25 Jul 2024
Mixture of A Million Experts
Xu Owen He
MoE
04 Jul 2024
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng
MoE
28 Jun 2024
Scaling Laws for Linear Complexity Language Models
Xuyang Shen, Dong Li, Ruitao Leng, Zhen Qin, Weigao Sun, Yiran Zhong
LRM
24 Jun 2024
Scaling Laws for Fact Memorization of Large Language Models
Xingyu Lu, Xiaonan Li, Qinyuan Cheng, Kai Ding, Xuanjing Huang, Xipeng Qiu
22 Jun 2024
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging
Zhenyi Lu, Chenghao Fan, Wei Wei, Xiaoye Qu, Dangyang Chen, Yu Cheng
MoMe
17 Jun 2024
Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations
Rylan Schaeffer, Victor Lecomte, Dhruv Pai, Andres Carranza, Berivan Isik, ..., Yann LeCun, SueYeon Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo
13 Jun 2024
Reconciling Kaplan and Chinchilla Scaling Laws
Tim Pearce, Jinyeop Song
12 Jun 2024
Submodular Framework for Structured-Sparse Optimal Transport
Piyushi Manupriya, Pratik Jawanpuria, Karthik S. Gurumoorthy, SakethaNath Jagarlapudi, Bamdev Mishra
OT
07 Jun 2024
MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
Xingkui Zhu, Yiran Guan, Dingkang Liang, Yuchao Chen, Yuliang Liu, Xiang Bai
MoE
07 Jun 2024
Scaling and evaluating sparse autoencoders
Leo Gao, Tom Dupré la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, Alec Radford, Ilya Sutskever, Jan Leike, Jeffrey Wu
06 Jun 2024
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models
Haoran Que, Jiaheng Liu, Ge Zhang, Chenchen Zhang, Xingwei Qu, ..., Jie Fu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng
CLL
03 Jun 2024