ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise
Minjia Zhang, U. Niranjan, Yuxiong He
arXiv:2201.12469, 29 January 2022
Papers citing "ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise" (6 of 6 papers shown)
1. Scaling Laws for Neural Language Models
   Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
   23 Jan 2020

2. FreeLB: Enhanced Adversarial Training for Natural Language Understanding
   Chen Zhu, Yu Cheng, Zhe Gan, S. Sun, Tom Goldstein, Jingjing Liu
   25 Sep 2019

3. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
   M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
   17 Sep 2019

4. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
   Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
   20 Apr 2018

5. A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
   L. Smith
   26 Mar 2018

6. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
   N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
   15 Sep 2016