ResearchTrend.AI · Papers · 1901.00041 · Cited By
Dynamic Space-Time Scheduling for GPU Inference

31 December 2018
Paras Jain
Xiangxi Mo
Ajay Jain
Harikaran Subbaraj
Rehana Durrani
Alexey Tumanov
Joseph E. Gonzalez
Ion Stoica

Papers citing "Dynamic Space-Time Scheduling for GPU Inference"

A Study of Skews, Imbalances, and Pathological Conditions in LLM Inference Deployment on GPU Clusters detectable from DPU
Javed I. Khan
Henry Uwabor Moye
09 Sep 2025
CascadeServe: Unlocking Model Cascades for Inference Serving
Ferdi Kossmann
Ziniu Wu
Alex Turk
Nesime Tatbul
Lei Cao
Samuel Madden
20 Jun 2024
Hydro: Adaptive Query Processing of ML Queries
Gaurav Tarlok Kakkar
Jiashen Cao
Aubhro Sengupta
Joy Arulraj
Hyesoon Kim
22 Mar 2024
A Survey of Serverless Machine Learning Model Inference
Kamil Kojs
22 Nov 2023
Throughput Maximization of DNN Inference: Batching or Multi-Tenancy?
Seyed Morteza Nabavinejad
M. Ebrahimi
Sherief Reda
26 Aug 2023
Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU
ACM International Conference on Embedded Networked Sensor Systems (SenSys), 2023
Zhihe Zhao
Neiwen Ling
Nan Guan
Guoliang Xing
10 Jul 2023
D-STACK: High Throughput DNN Inference by Effective Multiplexing and Spatio-Temporal Scheduling of GPUs
IEEE Transactions on Cloud Computing (IEEE TCC), 2023
Aditya Dhakal
Sameer G. Kulkarni
K. Ramakrishnan
31 Mar 2023
A Study on the Intersection of GPU Utilization and CNN Inference
J. Kosaian
Amar Phanishayee
15 Dec 2022
iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2022
Fei Xu
Jianian Xu
Jiabin Chen
Li Chen
Ruitao Shang
Zhi Zhou
Fengyuan Liu
03 Nov 2022
Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision
Wei Gao
Qi Hu
Zhisheng Ye
Yang Liu
Xiaolin Wang
Yingwei Luo
Tianwei Zhang
Yonggang Wen
24 May 2022
Batched matrix operations on distributed GPUs with application in theoretical physics
International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2022
Nenad Mijić
Davor Davidović
17 Mar 2022
Characterizing Concurrency Mechanisms for NVIDIA GPUs under Deep Learning Workloads
Guin Gilman
R. Walls
01 Oct 2021
Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning
S. Choi
Sunho Lee
Yeonjae Kim
Jongse Park
Youngjin Kwon
Jaehyuk Huh
01 Sep 2021
Boggart: Towards General-Purpose Acceleration of Retrospective Video Analytics
Symposium on Networked Systems Design and Implementation (NSDI), 2021
Neil Agarwal
Ravi Netravali
21 Jun 2021
Contention-Aware GPU Partitioning and Task-to-Partition Allocation for Real-Time Workloads
International Conference on Real-Time and Network Systems (RTNS), 2021
Houssam-Eddine Zahaf
Ignacio Sañudo Olmedo
Jayati Singh
Nicola Capodieci
Sébastien Faucou
21 May 2021
Accelerating Multi-Model Inference by Merging DNNs of Different Weights
Joo Seong Jeong
Soojeong Kim
Gyeong-In Yu
Yunseong Lee
Byung-Gon Chun
28 Sep 2020
Spatial Sharing of GPU for Autotuning DNN models
Aditya Dhakal
Junguk Cho
Sameer G. Kulkarni
K. Ramakrishnan
P. Sharma
08 Aug 2020
Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models
Matthew LeMay
Shijian Li
Tian Guo
05 Dec 2019
INFaaS: A Model-less and Managed Inference Serving System
Francisco Romero
Qian Li
N. Yadwadkar
Christos Kozyrakis
30 May 2019
The OoO VLIW JIT Compiler for GPU Inference
Paras Jain
Xiangxi Mo
Ajay Jain
Alexey Tumanov
Joseph E. Gonzalez
Ion Stoica
28 Jan 2019