2305.02790
BranchNorm: Robustly Scaling Extremely Deep Transformers
4 May 2023
Yijin Liu
Xianfeng Zeng
Fandong Meng
Jie Zhou
Papers citing "BranchNorm: Robustly Scaling Extremely Deep Transformers"
3 / 3 papers shown
M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining
Junyang Lin
An Yang
Jinze Bai
Chang Zhou
Le Jiang
...
Jie M. Zhang
Yong Li
Wei Lin
Jingren Zhou
Hongxia Yang
MoE
84
43
0
08 Oct 2021
Larger-Scale Transformers for Multilingual Masked Language Modeling
Naman Goyal
Jingfei Du
Myle Ott
Giridhar Anantharaman
Alexis Conneau
88
126
0
02 May 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,424
0
23 Jan 2020