Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
24 May 2024
Wenyu Du, Tongxu Luo, Zihan Qiu, Zeyu Huang, Yikang Shen, Reynold Cheng, Yike Guo, Jie Fu
arXiv: 2405.15319

Papers citing "Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training" (4 of 4 papers shown)

Masked Structural Growth for 2x Faster Language Model Pre-training
Yiqun Yao, Zheng-Wei Zhang, Jing Li, Yequan Wang
Tags: OffRL, AI4CE, LRM
04 May 2023

Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks
Lemeng Wu, Bo Liu, Peter Stone, Qiang Liu
17 Feb 2021

The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
Tags: AIMat
31 Dec 2020

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020