arXiv:2106.05449
Investigating Alternatives to the Root Mean Square for Adaptive Gradient Methods
10 June 2021
Brett Daley
Chris Amato
Papers citing
"Investigating Alternatives to the Root Mean Square for Adaptive Gradient Methods"
1 / 1 papers shown
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
245
1,833
0
17 Sep 2019