Navigating Scaling Laws: Compute Optimality in Adaptive Model Training

6 November 2023

Papers citing "Navigating Scaling Laws: Compute Optimality in Adaptive Model Training"

6 / 6 papers shown

Title
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis Fuzhao Xue Yao Fu Wangchunshu Zhou Zangwei Zheng Yang You 79 74 0 22 May 2023
The effectiveness of MAE pre-pretraining for billion-scale pretraining Mannat Singh Quentin Duval Kalyan Vasudev Alwala Haoqi Fan Vaibhav Aggarwal ... Piotr Dollár Christoph Feichtenhofer Ross B. Girshick Rohit Girdhar Ishan Misra LRM 99 62 0 23 Mar 2023
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation Ofir Press Noah A. Smith M. Lewis 234 690 0 27 Aug 2021
ImageNet-21K Pretraining for the Masses T. Ridnik Emanuel Ben-Baruch Asaf Noy Lihi Zelnik-Manor SSeg VLM CLIP 154 676 0 22 Apr 2021
Curriculum Learning: A Survey Petru Soviany Radu Tudor Ionescu Paolo Rota N. Sebe ODL 63 251 0 25 Jan 2021
Scaling Laws for Neural Language Models Jared Kaplan Sam McCandlish T. Henighan Tom B. Brown B. Chess R. Child Scott Gray Alec Radford Jeff Wu Dario Amodei 220 3,054 0 23 Jan 2020