The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models
Conglong Li, Minjia Zhang, Yuxiong He
arXiv:2108.06084 · 13 August 2021
Papers citing "The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models" (8 of 8 papers shown)

Attention Condensation via Sparsity Induced Regularized Training
Eli Sason, Darya Frolova, Boris Nazarov, Felix Goldberd
03 Mar 2025 · 0 citations

Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar
30 Dec 2024 · 0 citations

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao
07 Apr 2023 · 39 citations

The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
31 Dec 2020 · 1,986 citations

Shortformer: Better Language Modeling using Shorter Inputs
Ofir Press, Noah A. Smith, M. Lewis
31 Dec 2020 · 88 citations

How Can We Accelerate Progress Towards Human-like Linguistic Generalization?
Tal Linzen
03 May 2020 · 188 citations

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020 · 4,453 citations

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019 · 1,817 citations