Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2006.16423
Cited By
Efficient Algorithms for Device Placement of DNN Graph Operators
29 June 2020
Jakub Tarnawski
Amar Phanishayee
Nikhil R. Devanur
Divya Mahajan
Fanny Nina Paravecino
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Efficient Algorithms for Device Placement of DNN Graph Operators"
16 / 16 papers shown
Title
Benchmarking Ultra-Low-Power
μ
μ
μ
NPUs
Josh Millar
Yushan Huang
Sarab Sethi
Hamed Haddadi
Anil Madhavapeddy
BDL
56
0
0
28 Mar 2025
Optimizing Large Model Training through Overlapped Activation Recomputation
Ping Chen
Wenjie Zhang
Shuibing He
Yingjie Gu
Zhuwei Peng
...
Yi Zheng
Zhefeng Wang
Yanlong Yin
Gang Chen
Gang Chen
35
5
0
13 Jun 2024
Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training
Muhammad Adnan
Amar Phanishayee
Janardhan Kulkarni
Prashant J. Nair
Divyat Mahajan
37
0
0
23 Apr 2024
MOPAR: A Model Partitioning Framework for Deep Learning Inference Services on Serverless Platforms
Jiaang Duan
Shiyou Qian
Dingyu Yang
Hanwen Hu
Jian Cao
Guangtao Xue
MoE
31
1
0
03 Apr 2024
Moirai: Towards Optimal Placement for Distributed Inference on Heterogeneous Devices
Beibei Zhang
Hongwei Zhu
Feng Gao
Zhihui Yang
Xiaoyang Sean Wang
29
1
0
07 Dec 2023
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
Hao Lin
Ke Wu
Jie Li
Jun Yu Li
Wu-Jun Li
26
1
0
31 Jul 2023
DreamShard: Generalizable Embedding Table Placement for Recommender Systems
Daochen Zha
Louis Feng
Qiaoyu Tan
Zirui Liu
Kwei-Herng Lai
Bhargav Bhushanam
Yuandong Tian
A. Kejariwal
Xia Hu
LMTD
OffRL
20
28
0
05 Oct 2022
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation
Seongmin Hong
Seungjae Moon
Junsoo Kim
Sungjae Lee
Minsub Kim
Dongsoo Lee
Joo-Young Kim
66
76
0
22 Sep 2022
PICO: Pipeline Inference Framework for Versatile CNNs on Diverse Mobile Devices
Xiang Yang
Zikang Xu
Q. Qi
Jingyu Wang
Haifeng Sun
J. Liao
Song Guo
16
11
0
17 Jun 2022
Decentralized Training of Foundation Models in Heterogeneous Environments
Binhang Yuan
Yongjun He
Jared Davis
Tianyi Zhang
Tri Dao
Beidi Chen
Percy Liang
Christopher Ré
Ce Zhang
25
90
0
02 Jun 2022
FuncPipe: A Pipelined Serverless Framework for Fast and Cost-efficient Training of Deep Learning Models
Yunzhuo Liu
Bo Jiang
Tian Guo
Zimeng Huang
Wen-ping Ma
Xinbing Wang
Chenghu Zhou
19
9
0
28 Apr 2022
Towards Efficient Post-training Quantization of Pre-trained Language Models
Haoli Bai
Lu Hou
Lifeng Shang
Xin Jiang
Irwin King
M. Lyu
MQ
76
47
0
30 Sep 2021
GSPMD: General and Scalable Parallelization for ML Computation Graphs
Yuanzhong Xu
HyoukJoong Lee
Dehao Chen
Blake A. Hechtman
Yanping Huang
...
Noam M. Shazeer
Shibo Wang
Tao Wang
Yonghui Wu
Zhifeng Chen
MoE
28
127
0
10 May 2021
PanGu-
α
α
α
: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
Wei Zeng
Xiaozhe Ren
Teng Su
Hui Wang
Yi-Lun Liao
...
Gaojun Fan
Yaowei Wang
Xuefeng Jin
Qun Liu
Yonghong Tian
ALM
MoE
AI4CE
27
212
0
26 Apr 2021
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Deepak Narayanan
M. Shoeybi
Jared Casper
P. LeGresley
M. Patwary
...
Prethvi Kashinkunti
J. Bernauer
Bryan Catanzaro
Amar Phanishayee
Matei A. Zaharia
MoE
11
645
0
09 Apr 2021
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,743
0
26 Sep 2016
1