1811.09886
Cited By
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications
24 November 2018
Jongsoo Park
Maxim Naumov
Protonu Basu
Summer Deng
Aravind Kalaiah
D. Khudia
James Law
Parth Malani
Andrey Malevich
N. Satish
J. Pino
Martin D. Schatz
Alexander Sidorov
V. Sivakumar
Andrew Tulloch
Xiaodong Wang
Yiming Wu
Hector Yuen
Utku Diril
Dmytro Dzhulgakov
K. Hazelwood
Bill Jia
Yangqing Jia
Lin Qiao
Vijay Rao
Nadav Rotem
S. Yoo
M. Smelyanskiy
FedML
GNN
BDL
Papers citing
"Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications"
21 / 21 papers shown
ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models
Yujeong Choi
Jiin Kim
Minsoo Rhu
32
1
0
11 Jun 2024
Arithmetic Intensity Balancing Convolution for Hardware-aware Efficient Block Design
Shinkook Choi
Junkyeong Choi
14
1
0
08 Apr 2023
AutoTSMM: An Auto-tuning Framework for Building High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on CPUs
Chendi Li
Haipeng Jia
Hang Cao
Jianyu Yao
Boqian Shi
Chunyang Xiang
Jinbo Sun
Pengqi Lu
Yunquan Zhang
6
7
0
17 Aug 2022
RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances
Baolin Li
Rohan Basu Roy
Tirthak Patel
V. Gadepally
K. Gettings
Devesh Tiwari
27
25
0
23 Jul 2022
Adaptive Block Floating-Point for Analog Deep Learning Hardware
Ayon Basumallik
D. Bunandar
Nicholas Dronen
Nicholas Harris
Ludmila Levkova
Calvin McCarter
Lakshmi Nair
David Walter
David Widemann
9
6
0
12 May 2022
Learning to Collide: Recommendation System Model Compression with Learned Hash Functions
Benjamin Ghaemmaghami
Mustafa Ozdal
Rakesh Komuravelli
D. Korchev
Dheevatsa Mudigere
Krishnakumar Nair
Maxim Naumov
23
6
0
28 Mar 2022
Memory Planning for Deep Neural Networks
Maksim Levental
23
4
0
23 Feb 2022
Supporting Massive DLRM Inference Through Software Defined Memory
E. K. Ardestani
Changkyu Kim
Seung Jae Lee
Luoshang Pan
Valmiki Rampersad
...
Krishnakumar Nair
Maxim Naumov
Christopher Peterson
M. Smelyanskiy
Vijay Rao
BDL
31
20
0
21 Oct 2021
Compute and Energy Consumption Trends in Deep Learning Inference
Radosvet Desislavov
Fernando Martínez-Plumed
José Hernández Orallo
35
113
0
12 Sep 2021
JIZHI: A Fast and Cost-Effective Model-As-A-Service System for Web-Scale Online Inference at Baidu
Hao Liu
Qian Gao
Jiang Li
X. Liao
Hao Xiong
...
Guobao Yang
Zhiwei Zha
Daxiang Dong
Dejing Dou
Haoyi Xiong
VLM
22
22
0
03 Jun 2021
Demonstrating Analog Inference on the BrainScaleS-2 Mobile System
Yannik Stradmann
Sebastian Billaudelle
O. Breitwieser
F. Ebert
Arne Emmel
...
Joscha Ilmberger
Eric Müller
Philipp Spilger
Johannes Weis
Johannes Schemmel
20
12
0
29 Mar 2021
Mixed-Precision Embedding Using a Cache
J. Yang
Jianyu Huang
Jongsoo Park
P. T. P. Tang
Andrew Tulloch
16
36
0
21 Oct 2020
Time-based Sequence Model for Personalization and Recommendation Systems
T. Ishkhanov
Maxim Naumov
Xianjie Chen
Yan Zhu
Yuan Zhong
A. Azzolini
Chonglin Sun
Frank Jiang
Andrey Malevich
Liang Xiong
19
16
0
27 Aug 2020
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning
Shauharda Khadka
Estelle Aflalo
Mattias Marder
Avrech Ben-David
Santiago Miret
Shie Mannor
Tamir Hazan
Hanlin Tang
Somdeb Majumdar
GNN
21
11
0
14 Jul 2020
Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights
Shail Dave
Riyadh Baghdadi
Tony Nowatzki
Sasikanth Avancha
Aviral Shrivastava
Baoxin Li
46
81
0
02 Jul 2020
Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures
Dhiraj D. Kalamkar
E. Georganas
S. Srinivasan
Jianping Chen
Mikhail Shiryaev
A. Heinecke
48
47
0
10 May 2020
Post-Training 4-bit Quantization on Embedding Tables
Hui Guan
Andrey Malevich
Jiyan Yang
Jongsoo Park
Hector Yuen
MQ
11
31
0
05 Nov 2019
Characterizing Deep Learning Training Workloads on Alibaba-PAI
Mengdi Wang
Chen Meng
Guoping Long
Chuan Wu
Jun Yang
Wei Lin
Yangqing Jia
17
53
0
14 Oct 2019
The Architectural Implications of Facebook's DNN-based Personalized Recommendation
Udit Gupta
Carole-Jean Wu
Xiaodong Wang
Maxim Naumov
Brandon Reagen
...
Andrey Malevich
Dheevatsa Mudigere
M. Smelyanskiy
Liang Xiong
Xuan Zhang
GNN
30
290
0
06 Jun 2019
Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization
Eldad Meller
Alexander Finkelstein
Uri Almog
Mark Grobman
MQ
13
85
0
05 Feb 2019
The OoO VLIW JIT Compiler for GPU Inference
Paras Jain
Xiangxi Mo
Ajay Jain
Alexey Tumanov
Joseph E. Gonzalez
Ion Stoica
28
17
0
28 Jan 2019