Scale MLPerf-0.6 models on Google TPU-v3 Pods

21 September 2019
Sameer Kumar
Victor Bitorff
Dehao Chen
Chi-Heng Chou
Blake A. Hechtman
HyoukJoong Lee
Naveen Kumar
Peter Mattson
Shibo Wang
Tao Wang
Yuanzhong Xu
Zongwei Zhou

Papers citing "Scale MLPerf-0.6 models on Google TPU-v3 Pods"

26 / 26 papers shown
Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Hou-I Liu
Marco Galindo
Hongxia Xie
Lai-Kuan Wong
Hong-Han Shuai
Yung-Hui Li
Wen-Huang Cheng
130
65
0
08 Apr 2024
Self-Influence Guided Data Reweighting for Language Model Pre-training
Megh Thakkar
Tolga Bolukbasi
Sriram Ganapathy
Shikhar Vashishth
Sarath Chandar
Partha P. Talukdar
MILM
109
26
0
02 Nov 2023
TPU as Cryptographic Accelerator
Rabimba Karanjai
Sangwon Shin
Xinxin Fan
Lin Chen
Tianwei Zhang
Taeweon Suh
W. Shi
Lei Xu
66
1
0
13 Jul 2023
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages
Sebastian Ruder
J. Clark
Alexander Gutkin
Mihir Kale
Min Ma
...
Dan Garrette
R. Ingle
Melvin Johnson
Dmitry Panteleev
Partha P. Talukdar
ELM
85
40
0
19 May 2023
RF-Photonic Deep Learning Processor with Shannon-Limited Data Movement
R. Davis
Zaijun Chen
R. Hamerly
Dirk Englund
MQ
72
6
0
08 Jul 2022
CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU
Zangwei Zheng
Peng Xu
Xuan Zou
Da Tang
Zhen Li
...
Xiangzhuo Ding
Fuzhao Xue
Ziheng Qing
Youlong Cheng
Yang You
VLM
80
7
0
13 Apr 2022
Loss Landscape Dependent Self-Adjusting Learning Rates in Decentralized Stochastic Gradient Descent
Wei Zhang
Mingrui Liu
Yu Feng
Xiaodong Cui
Brian Kingsbury
Yuhai Tu
50
3
0
02 Dec 2021
Asynchronous Decentralized Distributed Training of Acoustic Models
Xiaodong Cui
Wei Zhang
Abdullah Kayi
Mingrui Liu
Ulrich Finkler
Brian Kingsbury
G. Saon
David S. Kung
54
3
0
21 Oct 2021
Improving Robustness using Generated Data
Sven Gowal
Sylvestre-Alvise Rebuffi
Olivia Wiles
Florian Stimberg
D. A. Calian
Timothy A. Mann
110
302
0
18 Oct 2021
KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks
J. G. Pauloski
Qi Huang
Lei Huang
Shivaram Venkataraman
Kyle Chard
Ian Foster
Zhao-jie Zhang
86
29
0
04 Jul 2021
A Comparative Study on Neural Architectures and Training Methods for Japanese Speech Recognition
Shigeki Karita
Yotaro Kubo
M. Bacchiani
Llion Jones
46
13
0
09 Jun 2021
Tesseract: Parallelize the Tensor Parallelism Efficiently
Boxiang Wang
Qifan Xu
Zhengda Bian
Yang You
VLM, GNN
33
34
0
30 May 2021
Demystifying BERT: Implications for Accelerator Design
Suchita Pati
Shaizeen Aga
Nuwan Jayasena
Matthew D. Sinclair
LLMAG
88
17
0
14 Apr 2021
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Deepak Narayanan
Mohammad Shoeybi
Jared Casper
P. LeGresley
M. Patwary
...
Prethvi Kashinkunti
J. Bernauer
Bryan Catanzaro
Amar Phanishayee
Matei A. Zaharia
MoE
163
711
0
09 Apr 2021
A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes
Zachary Nado
Justin M. Gilmer
Christopher J. Shallue
Rohan Anil
George E. Dahl
ODL
88
27
0
12 Feb 2021
Exploring the limits of Concurrency in ML Training on Google TPUs
Sameer Kumar
James Bradbury
C. Young
Yu Emma Wang
Anselm Levskaya
...
Tao Wang
Tayo Oguntebi
Yazhou Zu
Yuanzhong Xu
Andy Swing
BDL, AIMat, MoE, LRM
64
27
0
07 Nov 2020
Highly Available Data Parallel ML training on Mesh Networks
Sameer Kumar
N. Jouppi
MoE, AI4CE
37
11
0
06 Nov 2020
Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour
Arissa Wongpanich
Hieu H. Pham
J. Demmel
Mingxing Tan
Quoc V. Le
Yang You
Sameer Kumar
78
8
0
30 Oct 2020
PERF-Net: Pose Empowered RGB-Flow Net
Yinxiao Li
Zhichao Lu
Xuehan Xiong
Jonathan Huang
3DH
81
17
0
28 Sep 2020
The Limit of the Batch Size
Yang You
Yuhui Wang
Huan Zhang
Zhao-jie Zhang
J. Demmel
Cho-Jui Hsieh
121
15
0
15 Jun 2020
Automatic Cross-Replica Sharding of Weight Update in Data-Parallel Training
Yuanzhong Xu
HyoukJoong Lee
Dehao Chen
Hongjun Choi
Blake A. Hechtman
Shibo Wang
74
42
0
28 Apr 2020
Improving 3D Object Detection through Progressive Population Based Augmentation
Shuyang Cheng
Zhaoqi Leng
E. D. Cubuk
Barret Zoph
Chunyan Bai
...
Vijay Vasudevan
Congcong Li
Quoc V. Le
Jonathon Shlens
Dragomir Anguelov
3DPC
70
75
0
02 Apr 2020
RetinaTrack: Online Single Stage Joint Detection and Tracking
Zhichao Lu
V. Rathod
Ronny Votel
Jonathan Huang
VOT
93
190
0
30 Mar 2020
Learning to Simulate Complex Physics with Graph Networks
Alvaro Sanchez-Gonzalez
Jonathan Godwin
Tobias Pfaff
Rex Ying
J. Leskovec
Peter W. Battaglia
PINN, AI4CE
150
1,107
0
21 Feb 2020
Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection
Sara Beery
Guanhang Wu
V. Rathod
Ronny Votel
Jonathan Huang
ObjD
109
116
0
07 Dec 2019
Progressive Compressed Records: Taking a Byte out of Deep Learning Data
Michael Kuchnik
George Amvrosiadis
Virginia Smith
75
9
0
01 Nov 2019