Scaling Laws for Sparsely-Connected Foundation Models

International Conference on Learning Representations (ICLR), 2023
15 September 2023
Elias Frantar
C. Riquelme
N. Houlsby
Dan Alistarh
Utku Evci

Papers citing "Scaling Laws for Sparsely-Connected Foundation Models"

25 / 25 papers shown
Latent Representation Learning in Heavy-Ion Collisions with MaskPoint Transformer
Jing-Zong Zhang
Shuang Guo
Li-Lin Zhu
Lingxiao Wang
Guo-Liang Ma
104
10
0
08 Oct 2025
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
Taishi Nakamura
Satoki Ishikawa
Masaki Kawamura
Takumi Okamoto
Daisuke Nohara
Jun Suzuki
Rio Yokota
MoE
LRM
111
0
0
26 Aug 2025
Cost-Aware Contrastive Routing for LLMs
Reza Shirkavand
Shangqian Gao
Qi He
Heng-Chiao Huang
191
1
0
17 Aug 2025
Generalizing Scaling Laws for Dense and Sparse Large Language Models
Md Arafat Hossain
Xingfu Wu
V. Taylor
Ali Jannesari
110
0
0
08 Aug 2025
Complexity Scaling Laws for Neural Models using Combinatorial Optimization
Lowell Weissman
Michael Krumdick
A. Lynn Abbott
229
0
0
15 Jun 2025
Quartet: Native FP4 Training Can Be Optimal for Large Language Models
Roberto L. Castro
Andrei Panferov
Soroush Tabesh
Oliver Sieberling
Jiale Chen
Mahdi Nikdan
Saleh Ashkboos
Dan Alistarh
MQ
236
6
0
20 May 2025
UMoE: Unifying Attention and FFN with Shared Experts
Yuanhang Yang
Chaozheng Wang
Jing Li
MoE
211
0
0
12 May 2025
The Neural Pruning Law Hypothesis
Eugen Barbulescu
Antonio Alexoaie
Lucian Busoniu
275
0
0
06 Apr 2025
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
International Conference on Learning Representations (ICLR), 2025
Kairong Luo
Haodong Wen
Shengding Hu
Zhenbo Sun
Zhiyuan Liu
Maosong Sun
Kaifeng Lyu
Wenguang Chen
CLL
223
11
0
17 Mar 2025
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang
Guoming Ling
Vincent S. Liang
Yupei Lin
Yandong Chen
Shanshan Zhong
Hefeng Wu
LRM
626
18
0
08 Mar 2025
Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions
Emmy Liu
Amanda Bertsch
Lintang Sutawika
Lindia Tjuatja
Patrick Fernandes
...
Siyang Song
Carolin (Haas) Lawrence
Aditi Raghunathan
Kiril Gashteovski
Graham Neubig
487
6
0
05 Mar 2025
(Mis)Fitting: A Survey of Scaling Laws
Margaret Li
Sneha Kudugunta
Luke Zettlemoyer
344
11
0
26 Feb 2025
Factual Inconsistency in Data-to-Text Generation Scales Exponentially with LLM Size: A Statistical Validation
Joy Mahapatra
Soumyajit Roy
Utpal Garain
HILM
ALM
243
0
0
17 Feb 2025
How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines
Ayan Sengupta
Tanmoy Chakraborty
407
4
0
17 Feb 2025
Physics of Skill Learning
Ziming Liu
Yizhou Liu
Eric J. Michaud
Jeff Gore
Max Tegmark
280
2
0
21 Jan 2025
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Samira Abnar
Harshay Shah
Dan Busbridge
Alaaeldin Mohamed Elnouby Ali
J. Susskind
Vimal Thilak
MoE
LRM
442
24
0
21 Jan 2025
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
Yuqi Luo
Chenyang Song
Xu Han
Yuxiao Chen
Chaojun Xiao
Jiansheng Wei
Zhiyuan Liu
Maosong Sun
495
14
0
04 Nov 2024
Scaling Laws for Predicting Downstream Performance in LLMs
Yangyi Chen
Binxuan Huang
Yifan Gao
Zhengyang Wang
Jingfeng Yang
Heng Ji
LRM
280
24
0
11 Oct 2024
Scaling Optimal LR Across Token Horizons
International Conference on Learning Representations (ICLR), 2024
Johan Bjorck
Alon Benhaim
Vishrav Chaudhary
Furu Wei
Xia Song
444
15
0
30 Sep 2024
DAM: Towards A Foundation Model for Time Series Forecasting
L. N. Darlow
Qiwen Deng
Ahmed Hassan
Martin Asenov
Rajkarn Singh
Artjom Joosen
Adam Barker
Amos Storkey
AI4TS
AI4CE
162
6
0
25 Jul 2024
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Burcu Harma
Ayan Chakraborty
Elizaveta Kostenok
Danila Mishin
Dongho Ha
...
Martin Jaggi
Ming Liu
Yunho Oh
Suvinay Subramanian
Amir Yazdanbakhsh
MQ
293
18
0
31 May 2024
On the Scalability of GNNs for Molecular Graphs
Maciej Sypetkowski
Frederik Wenkel
Farimah Poursafaei
Nia Dickson
Karush Suri
Philip Fradkin
Dominique Beaini
GNN
AI4CE
379
32
0
17 Apr 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye
Peiju Liu
Tianxiang Sun
Yunhua Zhou
Jun Zhan
Xipeng Qiu
278
101
0
25 Mar 2024
Critical Influence of Overparameterization on Sharpness-aware Minimization
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
664
2
0
29 Nov 2023
A Simple and Effective Pruning Approach for Large Language Models
International Conference on Learning Representations (ICLR), 2023
Mingjie Sun
Zhuang Liu
Anna Bair
J. Zico Kolter
421
624
0
20 Jun 2023