Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

24 November 2018
Jongsoo Park
Maxim Naumov
Protonu Basu
Summer Deng
Aravind Kalaiah
D. Khudia
James Law
Parth Malani
Andrey Malevich
N. Satish
J. Pino
Martin D. Schatz
Alexander Sidorov
V. Sivakumar
Andrew Tulloch
Xiaodong Wang
Yiming Wu
Hector Yuen
Utku Diril
Dmytro Dzhulgakov
K. Hazelwood
Bill Jia
Yangqing Jia
Lin Qiao
Vijay Rao
Nadav Rotem
S. Yoo
M. Smelyanskiy
    FedML
    GNN
    BDL

Papers citing "Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications"

21 / 21 papers shown
ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models
Yujeong Choi
Jiin Kim
Minsoo Rhu
32
1
0
11 Jun 2024
Arithmetic Intensity Balancing Convolution for Hardware-aware Efficient Block Design
Shinkook Choi
Junkyeong Choi
14
1
0
08 Apr 2023
AutoTSMM: An Auto-tuning Framework for Building High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on CPUs
Chendi Li
Haipeng Jia
Hang Cao
Jianyu Yao
Boqian Shi
Chunyang Xiang
Jinbo Sun
Pengqi Lu
Yunquan Zhang
6
7
0
17 Aug 2022
RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances
Baolin Li
Rohan Basu Roy
Tirthak Patel
V. Gadepally
K. Gettings
Devesh Tiwari
27
25
0
23 Jul 2022
Adaptive Block Floating-Point for Analog Deep Learning Hardware
Ayon Basumallik
D. Bunandar
Nicholas Dronen
Nicholas Harris
Ludmila Levkova
Calvin McCarter
Lakshmi Nair
David Walter
David Widemann
9
6
0
12 May 2022
Learning to Collide: Recommendation System Model Compression with Learned Hash Functions
Benjamin Ghaemmaghami
Mustafa Ozdal
Rakesh Komuravelli
D. Korchev
Dheevatsa Mudigere
Krishnakumar Nair
Maxim Naumov
23
6
0
28 Mar 2022
Memory Planning for Deep Neural Networks
Maksim Levental
23
4
0
23 Feb 2022
Supporting Massive DLRM Inference Through Software Defined Memory
E. K. Ardestani
Changkyu Kim
Seung Jae Lee
Luoshang Pan
Valmiki Rampersad
...
Krishnakumar Nair
Maxim Naumov
Christopher Peterson
M. Smelyanskiy
Vijay Rao
BDL
31
20
0
21 Oct 2021
Compute and Energy Consumption Trends in Deep Learning Inference
Radosvet Desislavov
Fernando Martínez-Plumed
José Hernández Orallo
35
113
0
12 Sep 2021
JIZHI: A Fast and Cost-Effective Model-As-A-Service System for Web-Scale Online Inference at Baidu
Hao Liu
Qian Gao
Jiang Li
X. Liao
Hao Xiong
...
Guobao Yang
Zhiwei Zha
Daxiang Dong
Dejing Dou
Haoyi Xiong
VLM
22
22
0
03 Jun 2021
Demonstrating Analog Inference on the BrainScaleS-2 Mobile System
Yannik Stradmann
Sebastian Billaudelle
O. Breitwieser
F. Ebert
Arne Emmel
...
Joscha Ilmberger
Eric Müller
Philipp Spilger
Johannes Weis
Johannes Schemmel
20
12
0
29 Mar 2021
Mixed-Precision Embedding Using a Cache
J. Yang
Jianyu Huang
Jongsoo Park
P. T. P. Tang
Andrew Tulloch
16
36
0
21 Oct 2020
Time-based Sequence Model for Personalization and Recommendation Systems
T. Ishkhanov
Maxim Naumov
Xianjie Chen
Yan Zhu
Yuan Zhong
A. Azzolini
Chonglin Sun
Frank Jiang
Andrey Malevich
Liang Xiong
19
16
0
27 Aug 2020
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning
Shauharda Khadka
Estelle Aflalo
Mattias Marder
Avrech Ben-David
Santiago Miret
Shie Mannor
Tamir Hazan
Hanlin Tang
Somdeb Majumdar
GNN
21
11
0
14 Jul 2020
Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights
Shail Dave
Riyadh Baghdadi
Tony Nowatzki
Sasikanth Avancha
Aviral Shrivastava
Baoxin Li
46
81
0
02 Jul 2020
Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures
Dhiraj D. Kalamkar
E. Georganas
S. Srinivasan
Jianping Chen
Mikhail Shiryaev
A. Heinecke
48
47
0
10 May 2020
Post-Training 4-bit Quantization on Embedding Tables
Hui Guan
Andrey Malevich
Jiyan Yang
Jongsoo Park
Hector Yuen
MQ
11
31
0
05 Nov 2019
Characterizing Deep Learning Training Workloads on Alibaba-PAI
Mengdi Wang
Chen Meng
Guoping Long
Chuan Wu
Jun Yang
Wei Lin
Yangqing Jia
17
53
0
14 Oct 2019
The Architectural Implications of Facebook's DNN-based Personalized Recommendation
Udit Gupta
Carole-Jean Wu
Xiaodong Wang
Maxim Naumov
Brandon Reagen
...
Andrey Malevich
Dheevatsa Mudigere
M. Smelyanskiy
Liang Xiong
Xuan Zhang
GNN
30
290
0
06 Jun 2019
Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization
Eldad Meller
Alexander Finkelstein
Uri Almog
Mark Grobman
MQ
13
85
0
05 Feb 2019
The OoO VLIW JIT Compiler for GPU Inference
Paras Jain
Xiangxi Mo
Ajay Jain
Alexey Tumanov
Joseph E. Gonzalez
Ion Stoica
28
17
0
28 Jan 2019