ResearchTrend.AI
arXiv: 2104.05158
Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models

12 April 2021
Dheevatsa Mudigere
Y. Hao
Jianyu Huang
Zhihao Jia
Andrew Tulloch
Srinivas Sridharan
Xing Liu
Mustafa Ozdal
Jade Nie
Jongsoo Park
Liangchen Luo
J. Yang
Leon Gao
Dmytro Ivchenko
Aarti Basant
Yuxi Hu
Jiyan Yang
E. K. Ardestani
Xiaodong Wang
Rakesh Komuravelli
Ching-Hsiang Chu
Serhat Yilmaz
Huayu Li
Jiyuan Qian
Zhuobo Feng
Yi-An Ma
Junjie Yang
Ellie Wen
Hong Yu Li
Lin Yang
Chonglin Sun
Whitney Zhao
Dimitry Melts
Krishnaveni Dhulipala
Kranthi G. Kishore
Tyler N. Graf
Assaf Eisenman
Kiran Kumar Matam
Adi Gangidi
Guoqiang Jerry Chen
M. Krishnan
A. Nayak
Krishnakumar Nair
Bharath Muthiah
Mahmoud Khorashadi
P. Bhattacharya
Petr Lapukhov
Maxim Naumov
Ajit Mathews
Lin Qiao
M. Smelyanskiy
Bill Jia
Vijay Rao

Papers citing "Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models"

20 / 20 papers shown
Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation
Jianxing Qin
Jingrong Chen
Xinhao Kong
Yongji Wu
Liang Luo
Z. Wang
Ying Zhang
Tingjun Chen
Alvin R. Lebeck
Danyang Zhuo
122
0
0
02 May 2025
An Efficient Large Recommendation Model: Towards a Resource-Optimal Scaling Law
Songpei Xu
Shijia Wang
Da Guo
Xianwen Guo
Qiang Xiao
Fangjian Li
Chuanjiang Luo
78
0
0
17 Feb 2025
A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems
Hung Vinh Tran
Tong Chen
Quoc Viet Hung Nguyen
Zi-Rui Huang
Lizhen Cui
Hongzhi Yin
41
1
0
25 Jun 2024
ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models
Yujeong Choi
Jiin Kim
Minsoo Rhu
32
1
0
11 Jun 2024
PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices
Si Ung Noh
Junguk Hong
Chaemin Lim
Seong-Yeol Park
Jeehyun Kim
Hanjun Kim
Youngsok Kim
Jinho Lee
34
6
0
13 Apr 2024
Efficient All-to-All Collective Communication Schedules for Direct-Connect Topologies
P. Basu
Liangyu Zhao
Jason Fantl
Siddharth Pal
Arvind Krishnamurthy
J. Khoury
25
7
0
24 Sep 2023
MTrainS: Improving DLRM training efficiency using heterogeneous memories
H. Kassa
Paul Johnson
Jason B. Akers
Mrinmoy Ghosh
Andrew Tulloch
Dheevatsa Mudigere
Jongsoo Park
Xing Liu
R. Dreslinski
E. K. Ardestani
20
1
0
19 Apr 2023
RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems
Alessandro Ottino
Joshua L. Benjamin
G. Zervas
22
7
0
28 Nov 2022
RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure
Mark Zhao
Dhruv Choudhary
Devashish Tyagi
A. Somani
Max Kaplan
...
Jongsoo Park
Aarti Basant
Niket Agarwal
Carole-Jean Wu
Christos Kozyrakis
VLM
18
6
0
09 Nov 2022
HammingMesh: A Network Topology for Large-Scale Deep Learning
Torsten Hoefler
Tommaso Bonato
Daniele De Sensi
Salvatore Di Girolamo
Shigang Li
Marco Heddes
Jon Belk
Deepak Goel
Miguel Castro
Steve Scott
3DH
GNN
AI4CE
21
20
0
03 Sep 2022
A Frequency-aware Software Cache for Large Recommendation System Embeddings
Jiarui Fang
Geng Zhang
Jiatong Han
Shenggui Li
Zhengda Bian
Yongbin Li
Jin Liu
Yang You
14
3
0
08 Aug 2022
Impact of RoCE Congestion Control Policies on Distributed Training of DNNs
Tarannum Khan
Saeed Rashidi
Srinivas Sridharan
Pallavi Shurpali
Aditya Akella
T. Krishna
OOD
15
11
0
22 Jul 2022
Machine Learning Model Sizes and the Parameter Gap
Pablo Villalobos
J. Sevilla
T. Besiroglu
Lennart Heim
A. Ho
Marius Hobbhahn
ALM
ELM
AI4CE
22
56
0
05 Jul 2022
Training Personalized Recommendation Systems from (GPU) Scratch: Look Forward not Backwards
Youngeun Kwon
Minsoo Rhu
16
27
0
10 May 2022
CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU
Zangwei Zheng
Peng Xu
Xuan Zou
Da Tang
Zhen Li
...
Xiangzhuo Ding
Fuzhao Xue
Ziheng Qing
Youlong Cheng
Yang You
VLM
37
7
0
13 Apr 2022
BagPipe: Accelerating Deep Recommendation Model Training
Saurabh Agarwal
Chengpo Yan
Ziyi Zhang
Shivaram Venkataraman
21
17
0
24 Feb 2022
Compute Trends Across Three Eras of Machine Learning
J. Sevilla
Lennart Heim
A. Ho
T. Besiroglu
Marius Hobbhahn
Pablo Villalobos
20
269
0
11 Feb 2022
Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models
Saeed Rashidi
William Won
S. Srinivasan
Srinivas Sridharan
T. Krishna
GNN
17
29
0
09 Oct 2021
Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training
Mark Zhao
Niket Agarwal
Aarti Basant
B. Gedik
Satadru Pan
...
Kevin Wilfong
Harsha Rastogi
Carole-Jean Wu
Christos Kozyrakis
Parikshit Pol
GNN
15
70
0
20 Aug 2021
Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems
Maxim Naumov
John Kim
Dheevatsa Mudigere
Srinivas Sridharan
Xiaodong Wang
...
Krishnakumar Nair
Isabel Gao
Bor-Yiing Su
Jiyan Yang
M. Smelyanskiy
GNN
41
83
0
20 Mar 2020