ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Mesh-TensorFlow: Deep Learning for Supercomputers (arXiv:1811.02084)

5 November 2018
Noam M. Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Gregory J. Puleo, Peter Hawkins, HyoukJoong Lee, O. Milenkovic, C. Young, Ryan Sepassi, Blake Hechtman
Topics: GNN, MoE, AI4CE

Papers citing "Mesh-TensorFlow: Deep Learning for Supercomputers"

Showing 25 of 75 citing papers.

ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He
Topics: GNN
38 · 367 · 0
16 Apr 2021

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Deepak Narayanan, M. Shoeybi, Jared Casper, P. LeGresley, M. Patwary, ..., Prethvi Kashinkunti, J. Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei A. Zaharia
Topics: MoE
37 · 646 · 0
09 Apr 2021

FastMoE: A Fast Mixture-of-Expert Training System
Jiaao He, J. Qiu, Aohan Zeng, Zhilin Yang, Jidong Zhai, Jie Tang
Topics: ALM, MoE
24 · 94 · 0
24 Mar 2021

ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He
Topics: MoE
177 · 414 · 0
18 Jan 2021

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
W. Fedus, Barret Zoph, Noam M. Shazeer
Topics: MoE
11 · 2,075 · 0
11 Jan 2021

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction
Anzhu Yu, Wenyue Guo, Bing Liu, Xin Chen, Xin Wang, Xuefeng Cao, Bingchuan Jiang
Topics: 3DV
21 · 64 · 0
25 Nov 2020

Integrating Deep Learning in Domain Sciences at Exascale
Rick Archibald, E. Chow, E. D'Azevedo, Jack J. Dongarra, M. Eisenbach, ..., Florent Lopez, Daniel Nichols, S. Tomov, Kwai Wong, Junqi Yin
Topics: PINN
23 · 5 · 0
23 Nov 2020

Learning from Task Descriptions
Orion Weller, Nicholas Lourie, Matt Gardner, Matthew E. Peters
45 · 89 · 0
16 Nov 2020

Exploring the limits of Concurrency in ML Training on Google TPUs
Sameer Kumar, James Bradbury, C. Young, Yu Emma Wang, Anselm Levskaya, ..., Tao Wang, Tayo Oguntebi, Yazhou Zu, Yuanzhong Xu, Andy Swing
Topics: BDL, AIMat, MoE, LRM
17 · 27 · 0
07 Nov 2020

Understanding Capacity-Driven Scale-Out Neural Recommendation Inference
Michael Lui, Yavuz Yetim, Özgür Özkan, Zhuoran Zhao, Shin-Yeh Tsai, Carole-Jean Wu, Mark Hempstead
Topics: GNN, BDL, LRM
22 · 51 · 0
04 Nov 2020

Rethinking embedding coupling in pre-trained language models
Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder
95 · 142 · 0
24 Oct 2020

Towards a Scalable and Distributed Infrastructure for Deep Learning Applications
Bita Hasheminezhad, S. Shirzad, Nanmiao Wu, Patrick Diehl, Hannes Schulz, Hartmut Kaiser
Topics: GNN, AI4CE
27 · 4 · 0
06 Oct 2020

VirtualFlow: Decoupling Deep Learning Models from the Underlying Hardware
Andrew Or, Haoyu Zhang, M. Freedman
12 · 9 · 0
20 Sep 2020

The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism
Yosuke Oyama, N. Maruyama, Nikoli Dryden, Erin McCarthy, P. Harrington, J. Balewski, Satoshi Matsuoka, Peter Nugent, B. Van Essen
Topics: 3DV, AI4CE
32 · 37 · 0
25 Jul 2020

Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler
36 · 131 · 0
30 Jun 2020

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, M. Krikun, Noam M. Shazeer, Z. Chen
Topics: MoE
22 · 1,106 · 0
30 Jun 2020

LAMP: Large Deep Nets with Automated Model Parallelism for Image Segmentation
Wentao Zhu, Can Zhao, Wenqi Li, H. Roth, Ziyue Xu, Daguang Xu
Topics: 3DV
32 · 18 · 0
22 Jun 2020

Bayesian Neural Networks at Scale: A Performance Analysis and Pruning Study
Himanshu Sharma, Elise Jennings
Topics: BDL
27 · 3 · 0
23 May 2020

Reducing Communication in Graph Neural Network Training
Alok Tripathy, Katherine Yelick, A. Buluç
Topics: GNN
24 · 104 · 0
07 May 2020

Sparse Sinkhorn Attention
Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan
17 · 330 · 0
26 Feb 2020

Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base
William W. Cohen, Haitian Sun, R. A. Hofer, M. Siegler
30 · 61 · 0
14 Feb 2020

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
Topics: AIMat
88 · 19,440 · 0
23 Oct 2019

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
Topics: SSL, AIMat
79 · 6,375 · 0
26 Sep 2019

Exascale Deep Learning for Scientific Inverse Problems
N. Laanait, Josh Romero, Junqi Yin, M. T. Young, Sean Treichler, V. Starchenko, A. Borisevich, Alexander Sergeev, Michael A. Matheson
Topics: FedML, BDL
29 · 29 · 0
24 Sep 2019

Simple, Scalable Adaptation for Neural Machine Translation
Ankur Bapna, N. Arivazhagan, Orhan Firat
Topics: AI4CE
39 · 407 · 0
18 Sep 2019