2310.02784
MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems
4 October 2023
Samuel Hsia
Alicia Golden
Bilge Acun
Newsha Ardalani
Zach DeVito
Gu-Yeon Wei
David Brooks
Carole-Jean Wu
MoE
Papers citing
"MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems"
7 / 7 papers shown
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez
Luca Wehrstedt
Leonid Shamis
Mostafa Elhoushi
Kalyan Saladi
Yonatan Bisk
Emma Strubell
Jacob Kahn
139
3
0
20 Nov 2024
Revisiting Reliability in Large-Scale Machine Learning Research Clusters
Apostolos Kokolis
Michael Kuchnik
John Hoffman
Adithya Kumar
Parth Malani
Faye Ma
Zachary DeVito
S.
Kalyan Saladi
Carole-Jean Wu
98
7
0
29 Oct 2024
MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI
Arya Tschand
Arun Tejusve Raghunath Rajan
S. Idgunji
Anirban Ghosh
J. Holleman
...
Rowan Taubitz
Sean Zhan
Scott Wasson
David Kanter
Vijay Janapa Reddi
62
3
0
15 Oct 2024
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
William Won
Taekyung Heo
Saeed Rashidi
Srinivas Sridharan
S. Srinivasan
T. Krishna
36
43
0
24 Mar 2023
RecShard: Statistical Feature-Based Memory Optimization for Industry-Scale Neural Recommendation
Geet Sethi
Bilge Acun
Niket Agarwal
Christos Kozyrakis
Caroline Trippel
Carole-Jean Wu
39
66
0
25 Jan 2022
RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing
Liu Ke
Udit Gupta
Carole-Jean Wu
B. Cho
Mark Hempstead
...
Dheevatsa Mudigere
Maxim Naumov
Martin D. Schatz
M. Smelyanskiy
Xiaodong Wang
36
212
0
30 Dec 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,815
0
17 Sep 2019