Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.03233
Cited By
Navigating Scaling Laws: Compute Optimality in Adaptive Model Training
6 November 2023
Sotiris Anagnostidis
Gregor Bachmann
Imanol Schlag
Thomas Hofmann
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Navigating Scaling Laws: Compute Optimality in Adaptive Model Training"
6 / 6 papers shown
Title
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis
Fuzhao Xue
Yao Fu
Wangchunshu Zhou
Zangwei Zheng
Yang You
79
74
0
22 May 2023
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Mannat Singh
Quentin Duval
Kalyan Vasudev Alwala
Haoqi Fan
Vaibhav Aggarwal
...
Piotr Dollár
Christoph Feichtenhofer
Ross B. Girshick
Rohit Girdhar
Ishan Misra
LRM
99
62
0
23 Mar 2023
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press
Noah A. Smith
M. Lewis
234
690
0
27 Aug 2021
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
154
676
0
22 Apr 2021
Curriculum Learning: A Survey
Petru Soviany
Radu Tudor Ionescu
Paolo Rota
N. Sebe
ODL
63
251
0
25 Jan 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020
1