1909.04548
Cited By
PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units
6 September 2019
Yujeong Choi
Minsoo Rhu
Papers citing
"PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units"
12 / 12 papers shown
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning
Xupeng Miao
Gabriele Oliaro
Xinhao Cheng
Vineeth Kada
Ruohan Gao
...
April Yang
Yingcheng Wang
Mengdi Wu
Colin Unger
Zhihao Jia
MoE
94
9
0
29 Feb 2024
MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks
Seah Kim
Hasan Genç
Vadim Nikiforov
Krste Asanović
B. Nikolić
Y. Shao
27
20
0
10 May 2023
Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations
Yujeong Choi
John Kim
Minsoo Rhu
21
1
0
23 Feb 2023
Multi-DNN Accelerators for Next-Generation AI Systems
Stylianos I. Venieris
C. Bouganis
Nicholas D. Lane
40
7
0
19 May 2022
A Mixed Quantization Network for Computationally Efficient Mobile Inverse Tone Mapping
Juan Borrego-Carazo
Mete Ozay
Frederik Laboyrie
Paul Wisbey
MQ
26
0
0
12 Mar 2022
EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators
Lois Orosa
Skanda Koppula
Yaman Umuroglu
Konstantinos Kanellopoulos
Juan Gómez Luna
Michaela Blott
K. Vissers
O. Mutlu
48
4
0
04 Feb 2022
Bandwidth Utilization Side-Channel on ML Inference Accelerators
Sarbartha Banerjee
Shijia Wei
Prakash Ramrakhyani
Mohit Tiwari
31
3
0
14 Oct 2021
Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning
S. Choi
Sunho Lee
Yeonjae Kim
Jongse Park
Youngjin Kwon
Jaehyuk Huh
30
21
0
01 Sep 2021
Auto-Split: A General Framework of Collaborative Edge-Cloud AI
Amin Banitalebi-Dehkordi
Naveen Vedula
J. Pei
Fei Xia
Lanjun Wang
Yong Zhang
22
89
0
30 Aug 2021
LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference
Yujeong Choi
Yunseong Kim
Minsoo Rhu
24
66
0
25 Oct 2020
Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training
Youngeun Kwon
Yunjae Lee
Minsoo Rhu
27
40
0
25 Oct 2020
DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference
Udit Gupta
Samuel Hsia
V. Saraph
Xiaodong Wang
Brandon Reagen
Gu-Yeon Wei
Hsien-Hsin S. Lee
David Brooks
Carole-Jean Wu
GNN
38
188
0
08 Jan 2020