arXiv: 2211.13184
TorchScale: Transformers at Scale
23 November 2022
Authors: Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei
Papers citing "TorchScale: Transformers at Scale" (4 papers shown):
Retentive Network: A Successor to Transformer for Large Language Models (17 Jul 2023)
Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
LongNet: Scaling Transformers to 1,000,000,000 Tokens (05 Jul 2023)
Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei
A Length-Extrapolatable Transformer (20 Dec 2022)
Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, Furu Wei
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (17 Sep 2019)
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro