Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2405.18392
Cited By
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
28 May 2024
Alexander Hägele
Elie Bakouch
Atli Kosson
Loubna Ben Allal
Leandro von Werra
Martin Jaggi
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
11 / 11 papers shown
Title
Trillion 7B Technical Report
Sungjun Han
Juyoung Suk
Suyeong An
Hyungguk Kim
Kyuseok Kim
Wonsuk Yang
Seungtaek Choi
Jamin Shin
25
0
0
21 Apr 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
32
3
0
10 Jan 2025
Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits
Daniel Morales-Brotons
Thijs Vogels
Hadrien Hendrikx
104
12
0
27 Nov 2024
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Tomer Porian
Mitchell Wortsman
J. Jitsev
Ludwig Schmidt
Y. Carmon
44
19
0
27 Jun 2024
Language models scale reliably with over-training and on downstream tasks
S. Gadre
Georgios Smyrnis
Vaishaal Shankar
Suchin Gururangan
Mitchell Wortsman
...
Y. Carmon
Achal Dave
Reinhard Heckel
Niklas Muennighoff
Ludwig Schmidt
ALM
ELM
LRM
91
40
0
13 Mar 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De
Samuel L. Smith
Anushan Fernando
Aleksandar Botev
George-Christian Muraru
...
David Budden
Yee Whye Teh
Razvan Pascanu
Nando de Freitas
Çağlar Gülçehre
Mamba
51
116
0
29 Feb 2024
CroissantLLM: A Truly Bilingual French-English Language Model
Manuel Faysse
Patrick Fernandes
Nuno M. Guerreiro
António Loison
Duarte M. Alves
...
François Yvon
André F.T. Martins
Gautier Viaud
C´eline Hudelot
Pierre Colombo
34
33
0
01 Feb 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI Xiao Bi
:
Xiao Bi
Deli Chen
Guanting Chen
...
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
LRM
ALM
133
298
0
05 Jan 2024
Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Yi Tay
Mostafa Dehghani
J. Rao
W. Fedus
Samira Abnar
Hyung Won Chung
Sharan Narang
Dani Yogatama
Ashish Vaswani
Donald Metzler
181
89
0
22 Sep 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
236
1,508
0
31 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020
1