ResearchTrend.AI

Clipper: A Low-Latency Online Prediction Serving System
arXiv:1612.03079, 9 December 2016
D. Crankshaw, Xin Wang, Giulio Zhou, Michael Franklin, Joseph E. Gonzalez, Ion Stoica

Papers citing "Clipper: A Low-Latency Online Prediction Serving System"

50 / 76 papers shown
  • ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor (14 May 2025)
    Seungbeom Choi, Jeonghoe Goo, Eunjoo Jeon, Mingyu Yang, Minsung Jang
  • SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models (12 May 2025)
    Hang Wu, Jianian Zhu, Yong Li, Haojie Wang, Biao Hou, Jidong Zhai
  • Patchwork: A Unified Framework for RAG Serving (01 May 2025)
    Bodun Hu, Luis Pabon, Saurabh Agarwal, Aditya Akella
  • LithOS: An Operating System for Efficient Machine Learning on GPUs (21 Apr 2025)
    Patrick H. Coppock, Brian Zhang, Eliot H. Solomon, Vasilis Kypriotis, Leon Yang, Bikash Sharma, Dan Schatzberg, Todd C. Mowry, Dimitrios Skarlatos
  • Cyber for AI at SemEval-2025 Task 4: Forgotten but Not Lost: The Balancing Act of Selective Unlearning in Large Language Models (02 Mar 2025) [MU]
    Dinesh Srivasthav P, Bala Mallikarjunarao Garlapati
  • iServe: An Intent-based Serving System for LLMs (08 Jan 2025) [VLM]
    Dimitrios Liakopoulos, Tianrui Hu, Prasoon Sinha, N. Yadwadkar
  • Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs (23 Oct 2024)
    Ferdi Kossmann, Bruce Fontaine, Daya Khudia, Michael Cafarella, Samuel Madden
  • Erasure Coded Neural Network Inference via Fisher Averaging (02 Sep 2024) [MoMe, FedML]
    Divyansh Jhunjhunwala, Neharika Jali, Gauri Joshi, Shiqiang Wang
  • Loki: A System for Serving ML Inference Pipelines with Hardware and Accuracy Scaling (04 Jul 2024)
    Sohaib Ahmad, Hui Guan, Ramesh K. Sitaraman
  • Teola: Towards End-to-End Optimization of LLM-based Applications (29 Jun 2024)
    Xin Tan, Yimin Jiang, Yitao Yang, Hong-Yu Xu
  • SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators (01 May 2024)
    Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque
  • Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey (09 Apr 2024) [GNN]
    Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xiping Hu
  • Sponge: Inference Serving with Dynamic SLOs Using In-Place Vertical Scaling (31 Mar 2024)
    Kamran Razavi, Saeid Ghafouri, Max Mühlhäuser, Pooyan Jamshidi, Lin Wang
  • Genie: Smart ROS-based Caching for Connected Autonomous Robots (29 Feb 2024)
    Zexin Li, Soroush Bateni, Cong Liu
  • FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning (29 Feb 2024) [MoE]
    Xupeng Miao, Gabriele Oliaro, Xinhao Cheng, Vineeth Kada, Ruohan Gao, ..., April Yang, Yingcheng Wang, Mengdi Wu, Colin Unger, Zhihao Jia
  • Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows (27 Feb 2024)
    Yuting Yang, Andrea Merlina, Weijia Song, Tiancheng Yuan, Ken Birman, Roman Vitenberg
  • Combining Cloud and Mobile Computing for Machine Learning (20 Jan 2024)
    Ruiqi Xu, Tianchi Zhang
  • Graft: Efficient Inference Serving for Hybrid Deep Learning with SLO Guarantees via DNN Re-alignment (17 Dec 2023)
    Jing Wu, Lin Wang, Qirui Jin, Fangming Liu
  • Synergy: Towards On-Body AI via Tiny AI Accelerator Collaboration on Wearables (11 Dec 2023)
    Taesik Gong, S. Jang, Utku Günay Acer, F. Kawsar, Chulhong Min
  • Punica: Multi-Tenant LoRA Serving (28 Oct 2023)
    Lequn Chen, Zihao Ye, Yongji Wu, Danyang Zhuo, Luis Ceze, Arvind Krishnamurthy
  • Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems (03 Jul 2023)
    Debopam Sanyal, Jui-Tse Hung, Manavi Agrawal, Prahlad Jasti, Shahab Nikkhoo, S. Jha, Tianhao Wang, Sibin Mohan, Alexey Tumanov
  • S^3: Increasing GPU Utilization during Generative Inference for Higher Throughput (09 Jun 2023)
    Yunho Jin, Chun-Feng Wu, David Brooks, Gu-Yeon Wei
  • FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping (06 Jun 2023)
    Minchen Yu, Ao Wang, Dong-dong Chen, Haoxuan Yu, Xiaonan Luo, Zhuohao Li, Wei Wang, Ruichuan Chen, Dapeng Nie, Haoran Yang
  • Clover: Toward Sustainable AI with Carbon-Aware Machine Learning Inference Service (19 Apr 2023)
    Baolin Li, S. Samsi, V. Gadepally, Devesh Tiwari
  • MadEye: Boosting Live Video Analytics Accuracy with Adaptive Camera Configurations (04 Apr 2023)
    M. Wong, M. Ramanujam, Guha Balakrishnan, Ravi Netravali
  • MuxFlow: Efficient and Safe GPU Sharing in Large-Scale Production Deep Learning Clusters (24 Mar 2023)
    Yihao Zhao, Xin Liu, Shufan Liu, Xiang Li, Yibo Zhu, Gang Huang, Xuanzhe Liu, Xin Jin
  • Scheduling Inference Workloads on Distributed Edge Clusters with Reinforcement Learning (31 Jan 2023)
    Gabriele Castellano, J. Nieto, Jordi Luque, Ferran Diego, Carlos Segura, Diego Perino, Flavio Esposito, Fulvio Risso, Aravindh Raman
  • Improving Inference Performance of Machine Learning with the Divide-and-Conquer Principle (12 Jan 2023) [LRM]
    Alex Kogan
  • Kernel-as-a-Service: A Serverless Interface to GPUs (15 Dec 2022)
    Nathan Pemberton, Anton Zabreyko, Zhoujie Ding, R. Katz, Joseph E. Gonzalez
  • iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud (03 Nov 2022) [GNN]
    Fei Xu, Jianian Xu, Jiabin Chen, Li Chen, Ruitao Shang, Zhi Zhou, Fengyuan Liu
  • Management of Machine Learning Lifecycle Artifacts: A Survey (21 Oct 2022)
    Marius Schlegel, K. Sattler
  • Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference (17 Oct 2022) [GNN, LRM]
    Zehuan Wang, Yingcan Wei, Minseok Lee, Matthias Langer, F. Yu, ..., Daniel G. Abel, Xu Guo, Jianbing Dong, Ji Shi, Kunlun Li
  • KAIROS: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources (12 Oct 2022)
    Baolin Li, S. Samsi, V. Gadepally, Devesh Tiwari
  • Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs (27 Sep 2022)
    Alexandros Kouris, Stylianos I. Venieris, Stefanos Laskaridis, Nicholas D. Lane
  • Improving the Performance of DNN-based Software Services using Automated Layer Caching (18 Sep 2022)
    M. Abedi, Yanni Iouannou, Pooyan Jamshidi, Hadi Hemmati
  • Operationalizing Machine Learning: An Interview Study (16 Sep 2022)
    Shreya Shankar, Rolando Garcia, J. M. Hellerstein, Aditya G. Parameswaran
  • An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks (30 Aug 2022)
    Pierrick Pochelu, S. Petiton, B. Conche
  • RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances (23 Jul 2022)
    Baolin Li, Rohan Basu Roy, Tirthak Patel, V. Gadepally, K. Gettings, Devesh Tiwari
  • On Efficient Approximate Queries over Machine Learning Models (06 Jun 2022)
    Dujian Ding, S. Amer-Yahia, L. Lakshmanan
  • Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures (10 May 2022)
    Yongji Wu, Matthew Lentz, Danyang Zhuo, Yao Lu
  • Pathways: Asynchronous Distributed Dataflow for ML (23 Mar 2022) [GNN, MoE]
    P. Barham, Aakanksha Chowdhery, J. Dean, Sanjay Ghemawat, Steven Hand, ..., Parker Schuh, Ryan Sepassi, Laurent El Shafey, C. A. Thekkath, Yonghui Wu
  • GEMEL: Model Merging for Memory-Efficient, Real-Time Video Analytics at the Edge (19 Jan 2022)
    Arthi Padmanabhan, Neil Agarwal, Anand Iyer, Ganesh Ananthanarayanan, Yuanchao Shu, Nikolaos Karianakis, G. Xu, Ravi Netravali
  • Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching (13 Dec 2021)
    A. Finamore, James W. Roberts, Massimo Gallo, Dario Rossi
  • Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem (18 Sep 2021)
    Cheng Tan, Zhichao Li, Jian Zhang, Yunyin Cao, Sikai Qi, Zherui Liu, Yibo Zhu, Chuanxiong Guo
  • SensiX++: Bringing MLOps and Multi-tenant Model Serving to Sensory Edge Devices (08 Sep 2021)
    Chulhong Min, Akhil Mathur, Utku Günay Acer, A. Montanari, F. Kawsar
  • Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning (01 Sep 2021)
    S. Choi, Sunho Lee, Yeonjae Kim, Jongse Park, Youngjin Kwon, Jaehyuk Huh
  • Computation and Communication Co-Design for Real-Time Monitoring and Control in Multi-Agent Systems (06 Aug 2021)
    Vishrant Tripathi, Luca Ballotta, Luca Carlone, E. Modiano
  • Concept for a Technical Infrastructure for Management of Predictive Models in Industrial Applications (29 Jul 2021)
    F. Bachinger, G. Kronberger
  • Productivity, Portability, Performance: Data-Centric Python (01 Jul 2021)
    Yiheng Wang, Yao Zhang, Yanzhang Wang, Yan Wan, Jiao Wang, Zhongyuan Wu, Yuhao Yang, Bowen She
  • ModelPS: An Interactive and Collaborative Platform for Editing Pre-trained Models at Scale (18 May 2021)
    Yuanming Li, Huaizheng Zhang, Shanshan Jiang, Fan Yang, Yonggang Wen, Yong Luo