Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2005.04680
Cited By
Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures
10 May 2020
Dhiraj D. Kalamkar
E. Georganas
S. Srinivasan
Jianping Chen
Mikhail Shiryaev
A. Heinecke
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures"
21 / 21 papers shown
Title
PIFS-Rec: Process-In-Fabric-Switch for Large-Scale Recommendation System Inferences
Pingyi Huo
Anusha Devulapally
Hasan Al Maruf
Minseo Park
Krishnakumar Nair
Meena Arunachalam
Gulsum Gudukbay Akbulut
M. Kandemir
Vijaykrishnan Narayanan
24
0
0
25 Sep 2024
Efficient Tabular Data Preprocessing of ML Pipelines
Yu Zhu
Wenqi Jiang
Gustavo Alonso
LMTD
18
1
0
23 Sep 2024
Accelerating Recommender Model Training by Dynamically Skipping Stale Embeddings
Yassaman Ebrahimzadeh Maboud
Muhammad Adnan
Divyat Mahajan
Prashant J. Nair
AI4TS
30
0
0
22 Mar 2024
[Experiments & Analysis] Evaluating the Feasibility of Sampling-Based Techniques for Training Multilayer Perceptrons
Sana Ebrahimi
Rishi Advani
Abolfazl Asudeh
14
0
0
15 Jun 2023
Mem-Rec: Memory Efficient Recommendation System using Alternative Representation
Gopu Krishna Jha
Anthony Thomas
Nilesh Jain
Sameh Gobriel
Tajana Rosing
Ravi Iyer
40
2
0
12 May 2023
Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations
Yujeong Choi
John Kim
Minsoo Rhu
13
1
0
23 Feb 2023
KAIROS: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources
Baolin Li
S. Samsi
V. Gadepally
Devesh Tiwari
17
11
0
12 Oct 2022
RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances
Baolin Li
Rohan Basu Roy
Tirthak Patel
V. Gadepally
K. Gettings
Devesh Tiwari
22
25
0
23 Jul 2022
The trade-offs of model size in large recommendation models : A 10000
×
\times
×
compressed criteo-tb DLRM model (100 GB parameters to mere 10MB)
Aditya Desai
Anshumali Shrivastava
AI4CE
15
3
0
21 Jul 2022
FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure
Y. F. Arthanto
David Ojika
Joo-Young Kim
FedML
56
2
0
11 Jul 2022
Deep Learning Models on CPUs: A Methodology for Efficient Training
Quchen Fu
Ramesh Chukka
Keith Achorn
Thomas Atta-fosu
Deepak R. Canchi
Zhongwei Teng
Jules White
Douglas C. Schmidt
16
1
0
20 Jun 2022
FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems
Rui Ma
E. Georganas
A. Heinecke
Andrew Boutros
Eriko Nurvitadhi
GNN
22
12
0
22 Apr 2022
Heterogeneous Acceleration Pipeline for Recommendation System Training
Muhammad Adnan
Yassaman Ebrahimzadeh Maboud
Divyat Mahajan
Prashant J. Nair
21
18
0
11 Apr 2022
Building a Performance Model for Deep Learning Recommendation Model Training on GPUs
Zhongyi Lin
Louis Feng
E. K. Ardestani
Jaewon Lee
J. Lundell
Changkyu Kim
A. Kejariwal
John Douglas Owens
22
19
0
19 Jan 2022
Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning & HPC Workloads
E. Georganas
Dhiraj D. Kalamkar
Sasikanth Avancha
Menachem Adelman
Deepti Aggarwal
...
Ramanarayan Mohanty
Hans Pabst
Brian Retford
Barukh Ziv
A. Heinecke
26
17
0
12 Apr 2021
Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models
Dheevatsa Mudigere
Y. Hao
Jianyu Huang
Zhihao Jia
Andrew Tulloch
...
Ajit Mathews
Lin Qiao
M. Smelyanskiy
Bill Jia
Vijay Rao
19
149
0
12 Apr 2021
ECRM: Efficient Fault Tolerance for Recommendation Model Training via Erasure Coding
Kaige Liu
J. Kosaian
K. V. Rashmi
12
4
0
05 Apr 2021
CPR: Understanding and Improving Failure Tolerant Training for Deep Learning Recommendation with Partial Recovery
Kiwan Maeng
Shivam Bharuka
Isabel Gao
M. C. Jeffrey
V. Saraph
...
Caroline Trippel
Jiyan Yang
Michael G. Rabbat
Brandon Lucia
Carole-Jean Wu
OffRL
11
31
0
05 Nov 2020
Cross-Stack Workload Characterization of Deep Recommendation Systems
Samuel Hsia
Udit Gupta
Mark Wilkening
Carole-Jean Wu
Gu-Yeon Wei
David Brooks
BDL
GNN
HAI
18
31
0
10 Oct 2020
Accelerating Recommender Systems via Hardware "scale-in"
S. Krishna
Ravi Krishna
GNN
LRM
14
6
0
11 Sep 2020
Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems
Maxim Naumov
John Kim
Dheevatsa Mudigere
Srinivas Sridharan
Xiaodong Wang
...
Krishnakumar Nair
Isabel Gao
Bor-Yiing Su
Jiyan Yang
M. Smelyanskiy
GNN
38
83
0
20 Mar 2020
1