2011.03641
Cited By
Exploring the limits of Concurrency in ML Training on Google TPUs
7 November 2020
Sameer Kumar
James Bradbury
C. Young
Yu Emma Wang
Anselm Levskaya
Blake A. Hechtman
Dehao Chen
HyoukJoong Lee
Mehmet Deveci
Naveen Kumar
Pankaj Kanwar
Shibo Wang
Skye Wanderman-Milne
Steve Lacy
Tao Wang
Tayo Oguntebi
Yazhou Zu
Yuanzhong Xu
Andy Swing
BDL
AIMat
MoE
LRM
Papers citing
"Exploring the limits of Concurrency in ML Training on Google TPUs"
11 / 11 papers shown
Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
Guo-qing Jiang
Jinlong Liu
Zixiang Ding
Lin Guo
W. Lin
AI4CE
19
1
0
24 Sep 2023
RecShard: Statistical Feature-Based Memory Optimization for Industry-Scale Neural Recommendation
Geet Sethi
Bilge Acun
Niket Agarwal
Christos Kozyrakis
Caroline Trippel
Carole-Jean Wu
47
66
0
25 Jan 2022
Efficient Strong Scaling Through Burst Parallel Training
S. Park
Joshua Fried
Sunghyun Kim
Mohammad Alizadeh
Adam Belay
GNN
LRM
18
10
0
19 Dec 2021
OneFlow: Redesign the Distributed Deep Learning Framework from Scratch
Jinhui Yuan
Xinqi Li
Cheng Cheng
Juncheng Liu
Ran Guo
...
Fei Yang
Xiaodong Yi
Chuan Wu
Haoran Zhang
Jie Zhao
21
36
0
28 Oct 2021
Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training
Mark Zhao
Niket Agarwal
Aarti Basant
B. Gedik
Satadru Pan
...
Kevin Wilfong
Harsha Rastogi
Carole-Jean Wu
Christos Kozyrakis
Parikshit Pol
GNN
15
70
0
20 Aug 2021
Concurrent Adversarial Learning for Large-Batch Training
Yong Liu
Xiangning Chen
Minhao Cheng
Cho-Jui Hsieh
Yang You
ODL
28
13
0
01 Jun 2021
Demystifying BERT: Implications for Accelerator Design
Suchita Pati
Shaizeen Aga
Nuwan Jayasena
Matthew D. Sinclair
LLMAG
24
17
0
14 Apr 2021
An Efficient 2D Method for Training Super-Large Deep Learning Models
Qifan Xu
Shenggui Li
Chaoyu Gong
Yang You
17
0
0
12 Apr 2021
Srifty: Swift and Thrifty Distributed Training on the Cloud
Liangchen Luo
Peter West
Arvind Krishnamurthy
Luis Ceze
22
11
0
29 Nov 2020
Progressive Compressed Records: Taking a Byte out of Deep Learning Data
Michael Kuchnik
George Amvrosiadis
Virginia Smith
11
9
0
01 Nov 2019
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,743
0
26 Sep 2016