1910.01500
Cited By
MLPerf Training Benchmark
2 October 2019
Peter Mattson
Christine Cheng
Cody Coleman
Greg Diamos
Paulius Micikevicius
David Patterson
Hanlin Tang
Gu-Yeon Wei
Peter Bailis
Victor Bittorf
David Brooks
Dehao Chen
Debojyoti Dutta
Udit Gupta
K. Hazelwood
Andrew Hock
Xinyuan Huang
Atsushi Ike
Bill Jia
Daniel Kang
David Kanter
Naveen Kumar
Jeffery Liao
Guokai Ma
Deepak Narayanan
Tayo Oguntebi
Gennady Pekhimenko
Lillian Pentecost
Vijay Janapa Reddi
Taylor Robie
Tom St. John
Tsuguchika Tabaru
Carole-Jean Wu
Lingjie Xu
Masafumi Yamazaki
C. Young
Matei A. Zaharia
Papers citing
"MLPerf Training Benchmark"
50 / 128 papers shown
Title
Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training
Hiroki Naganuma
Xinzhi Zhang
Man-Chung Yue
Ioannis Mitliagkas
Philipp A. Witte
Russell J. Hewett
Yin Tat Lee
63
0
0
25 Apr 2025
Trends in AI Supercomputers
Konstantin Pilz
James Sanders
Robi Rahman
Lennart Heim
GNN
ELM
29
0
0
22 Apr 2025
Model Lakes
Koyena Pal
David Bau
Renée J. Miller
63
0
0
24 Feb 2025
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Maria Eriksson
Erasmo Purificato
Arman Noroozian
Joao Vinagre
Guillaume Chaslot
Emilia Gomez
David Fernandez Llorca
ELM
130
1
0
10 Feb 2025
Adaptive Consensus Gradients Aggregation for Scaled Distributed Training
Yoni Choukroun
Shlomi Azoulay
P. Kisilev
29
0
0
06 Nov 2024
MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI
Arya Tschand
Arun Tejusve Raghunath Rajan
S. Idgunji
Anirban Ghosh
J. Holleman
...
Rowan Taubitz
Sean Zhan
Scott Wasson
David Kanter
Vijay Janapa Reddi
62
3
0
15 Oct 2024
Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML
Chelsea Maria John
Stepan Nassyr
Carolin Penke
A. Herten
28
0
0
19 Sep 2024
TaPS: A Performance Evaluation Suite for Task-based Execution Frameworks
J. G. Pauloski
Valérie Hayot-Sasson
Maxime Gonthier
Nathaniel Hudson
Haochen Pan
Sicheng Zhou
Ian T. Foster
Kyle Chard
28
5
0
13 Aug 2024
Rina: Enhancing Ring-AllReduce with In-network Aggregation in Distributed Model Training
Zixuan Chen
Xuandong Liu
Minglin Li
Yinfan Hu
Hao Mei
Huifeng Xing
Hao Wang
Wanxin Shi
Sen Liu
Yang Xu
16
0
0
29 Jul 2024
On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers
Zhengxian Lu
Fangyu Wang
Zhiwei Xu
Fei Yang
Tao Li
29
1
0
02 Jul 2024
Fast Optimizer Benchmark
Simon Blauth
Tobias Bürger
Zacharias Häringer
Jörg Franke
Frank Hutter
36
0
0
26 Jun 2024
AI-coupled HPC Workflow Applications, Middleware and Performance
Wes Brewer
Ana Gainaru
Frédéric Suter
Feiyi Wang
M. Emani
S. Jha
30
10
0
20 Jun 2024
Benchmarking Machine Learning Applications on Heterogeneous Architecture using Reframe
Christopher Rae
Joseph K. L. Lee
James Richings
Michele Weiland
25
1
0
16 Apr 2024
GNNBENCH: Fair and Productive Benchmarking for Single-GPU GNN System
Yidong Gong
Pradeep Kumar
GNN
35
3
0
05 Apr 2024
Partial Rankings of Optimizers
Julian Rodemann
Hannah Blocher
28
4
0
26 Feb 2024
Benchmarking multi-component signal processing methods in the time-frequency plane
J. M. Miramont
Rémi Bardenet
P. Chainais
François Auger
11
1
0
13 Feb 2024
Breaking MLPerf Training: A Case Study on Optimizing BERT
Yongdeok Kim
Jaehyung Ahn
Myeongwoo Kim
Changin Choi
Heejae Kim
...
Xiongzhan Linghu
Jingkun Ma
Lin Chen
Yuehua Dai
Sungjoo Yoo
17
0
0
04 Feb 2024
GPU Cluster Scheduling for Network-Sensitive Deep Learning
Aakash Sharma
Vivek M. Bhasi
Sonali Singh
G. Kesidis
M. Kandemir
Chita R. Das
18
3
0
29 Jan 2024
Ravnest: Decentralized Asynchronous Training on Heterogeneous Devices
A. Menon
Unnikrishnan Menon
Kailash Ahirwar
16
1
0
03 Jan 2024
CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs
Hanpeng Hu
Junwei Su
Juntao Zhao
Yanghua Peng
Yibo Zhu
Haibin Lin
Chuan Wu
16
1
0
16 Nov 2023
A Machine Learning-oriented Survey on Tiny Machine Learning
Luigi Capogrosso
Federico Cunico
D. Cheng
Franco Fummi
Marco Cristani
SyDa
MU
24
33
0
21 Sep 2023
TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs
P. Phothilimthana
Sami Abu-El-Haija
Kaidi Cao
Bahare Fatemi
Mike Burrows
Charith Mendis
Bryan Perozzi
GNN
AI4TS
25
17
0
25 Aug 2023
Towards Robust and Efficient Continual Language Learning
Adam Fisch
Amal Rannen-Triki
Razvan Pascanu
J. Bornschein
Angeliki Lazaridou
E. Gribovskaya
Marc'Aurelio Ranzato
CLL
24
1
0
11 Jul 2023
FFCV: Accelerating Training by Removing Data Bottlenecks
Guillaume Leclerc
Andrew Ilyas
Logan Engstrom
Sung Min Park
Hadi Salman
A. Madry
18
67
0
21 Jun 2023
DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Niv Giladi
Shahar Gottlieb
Moran Shkolnik
A. Karnieli
Ron Banner
Elad Hoffer
Kfir Y. Levy
Daniel Soudry
25
2
0
18 Jun 2023
Evaluating the Potential of Disaggregated Memory Systems for HPC applications
Nan Ding
Pieter Maris
H. Nam
Taylor L. Groves
M. Awan
...
C. Daley
Oguz Selvitopi
L. Oliker
N. Wright
Samuel Williams
11
5
0
06 Jun 2023
How Can We Train Deep Learning Models Across Clouds and Continents? An Experimental Study
Alexander Isenko
R. Mayer
Hans-Arno Jacobsen
17
7
0
05 Jun 2023
Proteus: Simulating the Performance of Distributed DNN Training
Jiangfei Duan
Xiuhong Li
Ping Xu
Xingcheng Zhang
Shengen Yan
Yun Liang
Dahua Lin
72
10
0
04 Jun 2023
MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks
Seah Kim
Hasan Genç
Vadim Nikiforov
Krste Asanović
B. Nikolić
Y. Shao
19
18
0
10 May 2023
TorchBench: Benchmarking PyTorch with High API Surface Coverage
Yueming Hao
Xu Zhao
Bin Bao
David Berard
William Constable
Adnan Aziz
Xu Liu
25
5
0
27 Apr 2023
NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems
Jason Yik
Korneel Van den Berghe
Douwe den Blanken
Younes Bouhadjar
Maxime Fabre
...
Fatima Tuz Zohora
Charlotte Frenkel
Vijay Janapa Reddi
23
17
0
10 Apr 2023
RAF: Holistic Compilation for Deep Learning Model Training
Cody Hao Yu
Haozheng Fan
Guangtai Huang
Zhen Jia
Yizhi Liu
...
Yuan Zhou
Haichen Shen
Junru Shao
Mu Li
Yida Wang
15
3
0
08 Mar 2023
Computation vs. Communication Scaling for Future Transformers on Future Hardware
Suchita Pati
Shaizeen Aga
Mahzabeen Islam
Nuwan Jayasena
Matthew D. Sinclair
20
9
0
06 Feb 2023
Stream-K: Work-centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU
Muhammad Osama
D. Merrill
C. Cecka
M. Garland
John Douglas Owens
9
26
0
09 Jan 2023
Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks
Mingyu Liang
Wenyin Fu
Louis Feng
Zhongyi Lin
P. Panakanti
Shengbao Zheng
Srinivas Sridharan
Christina Delimitrou
16
12
0
16 Dec 2022
SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems
Jiangsu Du
Dongsheng Li
Yingpeng Wen
Jiazhi Jiang
Dan Huang
Xia Liao
Yutong Lu
14
0
0
07 Dec 2022
Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking
Keshav Santhanam
Jon Saad-Falcon
M. Franz
Omar Khattab
Avirup Sil
Radu Florian
Md Arafat Sultan
Salim Roukos
Matei A. Zaharia
Christopher Potts
OffRL
24
10
0
02 Dec 2022
VeLO: Training Versatile Learned Optimizers by Scaling Up
Luke Metz
James Harrison
C. Freeman
Amil Merchant
Lucas Beyer
...
Naman Agrawal
Ben Poole
Igor Mordatch
Adam Roberts
Jascha Narain Sohl-Dickstein
24
60
0
17 Nov 2022
Distributed Graph Neural Network Training: A Survey
Yingxia Shao
Hongzheng Li
Xizhi Gu
Hongbo Yin
Yawen Li
Xupeng Miao
Wentao Zhang
Bin Cui
Lei Chen
GNN
AI4CE
11
55
0
01 Nov 2022
An Overview of the Data-Loader Landscape: Comparative Performance Analysis
Iason Ofeidis
Diego Kiedanski
Leandros Tassiulas
13
7
0
27 Sep 2022
Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems
Prasoon Sinha
Akhil Guliani
Rutwik Jain
Brandon Tran
Matthew D. Sinclair
Shivaram Venkataraman
11
17
0
23 Aug 2022
Boosting Distributed Training Performance of the Unpadded BERT Model
Jinle Zeng
Min Li
Zhihua Wu
Jiaqi Liu
Yuang Liu
Dianhai Yu
Yanjun Ma
17
11
0
17 Aug 2022
DataPerf: Benchmarks for Data-Centric AI Development
Mark Mazumder
Colby R. Banbury
Xiaozhe Yao
Bojan Karlaš
W. G. Rojas
...
Carole-Jean Wu
Cody Coleman
Andrew Y. Ng
Peter Mattson
Vijay Janapa Reddi
VLM
33
101
0
20 Jul 2022
Metadata Representations for Queryable ML Model Zoos
Ziyu Li
Rihan Hai
A. Bozzon
Asterios Katsifodimos
6
2
0
19 Jul 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
58
2,020
0
27 May 2022
Preparing for the Future -- Rethinking Proxy Apps
Satoshi Matsuoka
Jens Domke
M. Wahib
Aleksandr Drozd
R. Bair
Andrew A. Chien
Jeffrey S. Vetter
J. Shalf
12
2
0
15 Apr 2022
Characterizing the Efficiency vs. Accuracy Trade-off for Long-Context NLP Models
Phyllis Ang
Bhuwan Dhingra
Lisa Wu Wills
25
6
0
15 Apr 2022
CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU
Zangwei Zheng
Peng Xu
Xuan Zou
Da Tang
Zhen Li
...
Xiangzhuo Ding
Fuzhao Xue
Ziheng Qing
Youlong Cheng
Yang You
VLM
37
7
0
13 Apr 2022
Monarch: Expressive Structured Matrices for Efficient and Accurate Training
Tri Dao
Beidi Chen
N. Sohoni
Arjun D Desai
Michael Poli
Jessica Grogan
Alexander Liu
Aniruddh Rao
Atri Rudra
Christopher Ré
22
87
0
01 Apr 2022
BagPipe: Accelerating Deep Recommendation Model Training
Saurabh Agarwal
Chengpo Yan
Ziyi Zhang
Shivaram Venkataraman
16
17
0
24 Feb 2022