arXiv:1909.08053
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
17 September 2019
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
Papers citing "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism" (2 of 2 papers shown)
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Omer Levy, Felix Hill, Samuel R. Bowman
20 Apr 2018
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
15 Sep 2016