ResearchTrend.AI · Papers · 1901.00041 · Cited By
Dynamic Space-Time Scheduling for GPU Inference

31 December 2018
Paras Jain
Xiangxi Mo
Ajay Jain
Harikaran Subbaraj
Rehana Durrani
Alexey Tumanov
Joseph E. Gonzalez
Ion Stoica

Papers citing "Dynamic Space-Time Scheduling for GPU Inference"

A Study of Skews, Imbalances, and Pathological Conditions in LLM Inference Deployment on GPU Clusters detectable from DPU
Javed I. Khan
Henry Uwabor Moye
09 Sep 2025
CascadeServe: Unlocking Model Cascades for Inference Serving
Ferdi Kossmann
Ziniu Wu
Alex Turk
Nesime Tatbul
Lei Cao
Samuel Madden
20 Jun 2024
Hydro: Adaptive Query Processing of ML Queries
Gaurav Tarlok Kakkar
Jiashen Cao
Aubhro Sengupta
Joy Arulraj
Hyesoon Kim
22 Mar 2024
A Survey of Serverless Machine Learning Model Inference
Kamil Kojs
22 Nov 2023
Throughput Maximization of DNN Inference: Batching or Multi-Tenancy?
Seyed Morteza Nabavinejad
M. Ebrahimi
Sherief Reda
26 Aug 2023
Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU
ACM International Conference on Embedded Networked Sensor Systems (SenSys), 2023
Zhihe Zhao
Neiwen Ling
Nan Guan
Guoliang Xing
10 Jul 2023
D-STACK: High Throughput DNN Inference by Effective Multiplexing and Spatio-Temporal Scheduling of GPUs
IEEE Transactions on Cloud Computing (IEEE TCC), 2023
Aditya Dhakal
Sameer G. Kulkarni
K. Ramakrishnan
31 Mar 2023
A Study on the Intersection of GPU Utilization and CNN Inference
J. Kosaian
Amar Phanishayee
15 Dec 2022
iGniter: Interference-Aware GPU Resource Provisioning for Predictable DNN Inference in the Cloud
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2022
Fei Xu
Jianian Xu
Jiabin Chen
Li Chen
Ruitao Shang
Zhi Zhou
Fengyuan Liu
03 Nov 2022
Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision
Wei Gao
Qi Hu
Zhisheng Ye
Yang Liu
Xiaolin Wang
Yingwei Luo
Tianwei Zhang
Yonggang Wen
24 May 2022
Batched matrix operations on distributed GPUs with application in theoretical physics
International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2022
Nenad Mijić
Davor Davidović
17 Mar 2022
Characterizing Concurrency Mechanisms for NVIDIA GPUs under Deep Learning Workloads
Guin Gilman
R. Walls
01 Oct 2021
Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning
S. Choi
Sunho Lee
Yeonjae Kim
Jongse Park
Youngjin Kwon
Jaehyuk Huh
01 Sep 2021
Boggart: Towards General-Purpose Acceleration of Retrospective Video Analytics
Symposium on Networked Systems Design and Implementation (NSDI), 2021
Neil Agarwal
Ravi Netravali
21 Jun 2021
Contention-Aware GPU Partitioning and Task-to-Partition Allocation for Real-Time Workloads
International Conference on Real-Time and Network Systems (RTNS), 2021
Houssam-Eddine Zahaf
Ignacio Sañudo Olmedo
Jayati Singh
Nicola Capodieci
Sébastien Faucou
21 May 2021
Accelerating Multi-Model Inference by Merging DNNs of Different Weights
Joo Seong Jeong
Soojeong Kim
Gyeong-In Yu
Yunseong Lee
Byung-Gon Chun
28 Sep 2020
Spatial Sharing of GPU for Autotuning DNN models
Aditya Dhakal
Junguk Cho
Sameer G. Kulkarni
K. Ramakrishnan
P. Sharma
08 Aug 2020
Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models
Matthew LeMay
Shijian Li
Tian Guo
05 Dec 2019
INFaaS: A Model-less and Managed Inference Serving System
Francisco Romero
Qian Li
N. Yadwadkar
Christos Kozyrakis
30 May 2019
The OoO VLIW JIT Compiler for GPU Inference
Paras Jain
Xiangxi Mo
Ajay Jain
Alexey Tumanov
Joseph E. Gonzalez
Ion Stoica
28 Jan 2019