arXiv:2401.00448
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
31 December 2023
Nikhil Sardana
Jacob P. Portes
Sasha Doubov
Jonathan Frankle
LRM
Papers citing "Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws" (50 of 55 shown)
Crosslingual Reasoning through Test-Time Scaling
Zheng-Xin Yong
Muhammad Farid Adilazuarda
Jonibek Mansurov
Ruochen Zhang
Niklas Muennighoff
Carsten Eickhoff
Genta Indra Winata
Julia Kreutzer
Stephen H. Bach
Alham Fikri Aji
LRM
ELM
43
0
0
08 May 2025
Position: Enough of Scaling LLMs! Lets Focus on Downscaling
Ayan Sengupta
Yash Goel
Tanmoy Chakraborty
34
0
0
02 May 2025
Compute-Optimal LLMs Provably Generalize Better With Scale
Marc Finzi
Sanyam Kapoor
Diego Granziol
Anming Gu
Christopher De Sa
J. Zico Kolter
Andrew Gordon Wilson
21
0
0
21 Apr 2025
Efficient Construction of Model Family through Progressive Training Using Model Expansion
Kazuki Yano
Sho Takase
Sosuke Kobayashi
Shun Kiyono
Jun Suzuki
46
0
0
01 Apr 2025
ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding
Indraneil Paul
Haoyi Yang
Goran Glavas
Kristian Kersting
Iryna Gurevych
AAML
SyDa
34
0
0
27 Mar 2025
Cost-Optimal Grouped-Query Attention for Long-Context LLMs
Y. Chen
Yutong Wu
Xu Han
Zhiyuan Liu
Maosong Sun
59
0
0
12 Mar 2025
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
Zachary B. Charles
Gabriel Teston
Lucio Dery
Keith Rush
Nova Fallen
Zachary Garrett
Arthur Szlam
Arthur Douillard
50
0
0
12 Mar 2025
(Mis)Fitting: A Survey of Scaling Laws
Margaret Li
Sneha Kudugunta
Luke Zettlemoyer
61
2
0
26 Feb 2025
Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral
António Farinhas
Nuno M. Guerreiro
Sweta Agrawal
Ricardo Rei
André F. T. Martins
43
0
0
18 Feb 2025
How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines
Ayan Sengupta
Yash Goel
Tanmoy Chakraborty
36
0
0
17 Feb 2025
CodeMonkeys: Scaling Test-Time Compute for Software Engineering
Ryan Ehrlich
Bradley Brown
Jordan Juravsky
Ronald Clark
Christopher Ré
Azalia Mirhoseini
49
5
0
24 Jan 2025
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Aldo Pareja
Nikhil Shivakumar Nayak
Hao Wang
Krishnateja Killamsetty
Shivchander Sudalairaj
...
Guangxuan Xu
Kai Xu
Ligong Han
Luke Inglis
Akash Srivastava
78
6
0
17 Dec 2024
Sparse Upcycling: Inference Inefficient Finetuning
Sasha Doubov
Nikhil Sardana
Vitaliy Chiley
MoE
37
0
0
13 Nov 2024
Safety case template for frontier AI: A cyber inability argument
Arthur Goemans
Marie Davidsen Buhl
Jonas Schuett
Tomek Korbak
Jessica Wang
Benjamin Hilton
Geoffrey Irving
47
15
0
12 Nov 2024
FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
Philip Zmushko
Aleksandr Beznosikov
Martin Takáč
Samuel Horváth
32
0
0
12 Nov 2024
What Should Baby Models Read? Exploring Sample-Efficient Data Composition on Model Performance
Hong Meng Yam
Nathan J Paek
36
1
0
11 Nov 2024
Scaling Laws for Pre-training Agents and World Models
Tim Pearce
Tabish Rashid
Dave Bignell
Raluca Georgescu
Sam Devlin
Katja Hofmann
LM&Ro
34
7
0
07 Nov 2024
Scaling Laws for Precision
Tanishq Kumar
Zachary Ankner
Benjamin Spector
Blake Bordelon
Niklas Muennighoff
Mansheej Paul
C. Pehlevan
Christopher Ré
Aditi Raghunathan
AIFin
MoMe
35
12
0
07 Nov 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
48
5
0
22 Oct 2024
Towards Neural Scaling Laws for Time Series Foundation Models
Qingren Yao
Chao-Han Huck Yang
Renhe Jiang
Yuxuan Liang
Ming Jin
Shirui Pan
AI4TS
AI4CE
32
6
0
16 Oct 2024
A Hitchhiker's Guide to Scaling Law Estimation
Leshem Choshen
Yang Zhang
Jacob Andreas
28
6
0
15 Oct 2024
O1 Replication Journey: A Strategic Progress Report -- Part 1
Yiwei Qin
Xuefeng Li
Haoyang Zou
Yixiu Liu
Shijie Xia
...
Yixin Ye
Weizhe Yuan
Hector Liu
Y. Li
Pengfei Liu
VLM
35
67
0
08 Oct 2024
How Much Can We Forget about Data Contamination?
Sebastian Bordt
Suraj Srinivas
Valentyn Boreiko
U. V. Luxburg
36
1
0
04 Oct 2024
Not All LLM Reasoners Are Created Equal
Arian Hosseini
Alessandro Sordoni
Daniel Toyama
Aaron C. Courville
Rishabh Agarwal
LRM
31
11
0
02 Oct 2024
Scaling Optimal LR Across Token Horizons
Johan Bjorck
Alon Benhaim
Vishrav Chaudhary
Furu Wei
Xia Song
41
4
0
30 Sep 2024
Inference-Friendly Models With MixAttention
Shashank Rajput
Ying Sheng
Sean Owen
Vitaliy Chiley
74
1
0
23 Sep 2024
Scaling Law Hypothesis for Multimodal Model
Qingyun Sun
Zhen Guo
30
0
0
10 Sep 2024
Performance Law of Large Language Models
Chuhan Wu
Ruiming Tang
LRM
32
2
0
19 Aug 2024
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell
Jaehoon Lee
Kelvin Xu
Aviral Kumar
LRM
47
422
0
06 Aug 2024
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Chaofan Tao
Qian Liu
Longxu Dou
Niklas Muennighoff
Zhongwei Wan
Ping Luo
Min-Bin Lin
Ngai Wong
PILM
50
40
0
18 Jul 2024
The Sociolinguistic Foundations of Language Modeling
Jack Grieve
Sara Bartl
Matteo Fuoli
Jason Grafmiller
Weihang Huang
A. Jawerbaum
Akira Murakami
Marcus Perlman
Dana Roemling
Bodo Winter
30
7
0
12 Jul 2024
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Tomer Porian
Mitchell Wortsman
J. Jitsev
Ludwig Schmidt
Y. Carmon
44
19
0
27 Jun 2024
Data-Centric AI in the Age of Large Language Models
Xinyi Xu
Zhaoxuan Wu
Rui Qiao
Arun Verma
Yao Shu
...
Xiaoqiang Lin
Wenyang Hu
Zhongxiang Dai
Pang Wei Koh
Bryan Kian Hsiang Low
ALM
34
2
0
20 Jun 2024
Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations
Rylan Schaeffer
Victor Lecomte
Dhruv Pai
Andres Carranza
Berivan Isik
...
Yann LeCun
SueYeon Chung
Andrey Gromov
Ravid Shwartz-Ziv
Sanmi Koyejo
33
5
0
13 Jun 2024
Making Task-Oriented Dialogue Datasets More Natural by Synthetically Generating Indirect User Requests
Amogh Mannekote
Jinseok Nam
Ziming Li
Jian Gao
K. Boyer
Bonnie J. Dorr
38
1
0
12 Jun 2024
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
Alexander Hägele
Elie Bakouch
Atli Kosson
Loubna Ben Allal
Leandro von Werra
Martin Jaggi
30
33
0
28 May 2024
The Future of Large Language Model Pre-training is Federated
Lorenzo Sani
Alexandru Iacob
Zeyu Cao
Bill Marino
Yan Gao
...
Wanru Zhao
William F. Shen
Preslav Aleksandrov
Xinchi Qiu
Nicholas D. Lane
AI4CE
25
12
0
17 May 2024
More Compute Is What You Need
Zhen Guo
33
0
0
30 Apr 2024
Chinchilla Scaling: A replication attempt
T. Besiroglu
Ege Erdil
Matthew Barnett
Josh You
32
18
0
15 Apr 2024
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck
Nathan Godey
Eric Villemonte de la Clergerie
Benoît Sagot
43
8
0
11 Apr 2024
Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic
Sachin Goyal
Pratyush Maini
Zachary Chase Lipton
Aditi Raghunathan
J. Zico Kolter
35
40
0
10 Apr 2024
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Shengding Hu
Yuge Tu
Xu Han
Chaoqun He
Ganqu Cui
...
Chaochao Jia
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
MoE
29
275
0
09 Apr 2024
Toward Inference-optimal Mixture-of-Expert Large Language Models
Longfei Yun
Yonghao Zhuang
Yao Fu
Eric P. Xing
Hao Zhang
MoE
57
5
0
03 Apr 2024
Poro 34B and the Blessing of Multilinguality
Risto Luukkonen
Jonathan Burdge
Elaine Zosa
Aarne Talman
Ville Komulainen
Vaino Hatanpaa
Peter Sarlin
S. Pyysalo
AI4CE
34
12
0
02 Apr 2024
Mechanistic Design and Scaling of Hybrid Architectures
Michael Poli
Armin W. Thomas
Eric N. D. Nguyen
Pragaash Ponnusamy
Bjorn Deiseroth
...
Brian Hie
Stefano Ermon
Christopher Ré
Ce Zhang
Stefano Massaroli
MoE
49
19
0
26 Mar 2024
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov
Kushal Tirumala
Hassan Shapourian
Paolo Glorioso
Daniel A. Roberts
33
79
0
26 Mar 2024
Language models scale reliably with over-training and on downstream tasks
S. Gadre
Georgios Smyrnis
Vaishaal Shankar
Suchin Gururangan
Mitchell Wortsman
...
Y. Carmon
Achal Dave
Reinhard Heckel
Niklas Muennighoff
Ludwig Schmidt
ALM
ELM
LRM
88
40
0
13 Mar 2024
Yi: Open Foundation Models by 01.AI
01.AI
Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
100
490
0
07 Mar 2024
Scaling Laws for Fine-Grained Mixture of Experts
Jakub Krajewski
Jan Ludziejewski
Kamil Adamczewski
Maciej Pióro
Michal Krutul
...
Krystian Król
Tomasz Odrzygóźdź
Piotr Sankowski
Marek Cygan
Sebastian Jaszczur
MoE
27
53
0
12 Feb 2024
A Dynamical Model of Neural Scaling Laws
Blake Bordelon
Alexander B. Atanasov
C. Pehlevan
36
8
0
02 Feb 2024