An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks
arXiv:2104.09075 · 19 April 2021
A. Kahira, Truong Thao Nguyen, L. Bautista-Gomez, Ryousei Takano, Rosa M. Badia, M. Wahib
Papers citing "An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks" (4 of 4 papers shown)

1. Optimizing CNN Using HPC Tools
   Shahrin Rahman
   07 Mar 2024

2. Monitoring Collective Communication Among GPUs
   Muhammet Abdullah Soytürk, Palwisha Akhtar, Erhan Tezcan, D. Unat
   Topic: GNN
   20 Oct 2021

3. Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA
   M. Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka
   Topic: OODD
   26 Aug 2020

4. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
   M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
   Topic: MoE
   17 Sep 2019