Symphony: Optimized DNN Model Serving using Deferred Batch Scheduling

14 August 2023

Papers citing "Symphony: Optimized DNN Model Serving using Deferred Batch Scheduling"

Efficiently Serving LLM Reasoning Programs with Certaindex. Yichao Fu, Junda Chen, Siqi Zhu, Zheyu Fu, Zhongdongming Dai, Aurick Qiao, Hao Zhang. 31 Dec 2024.
Approximate Caching for Efficiently Serving Diffusion Models. Shubham Agarwal, Subrata Mitra, Sarthak Chakraborty, Srikrishna Karanam, Koyel Mukherjee, S. Saini. 07 Dec 2023.
Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines. Francisco Romero, Mark Zhao, N. Yadwadkar, Christos Kozyrakis. 03 Feb 2021.
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro. 17 Sep 2019.
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, M. Andreetto, Hartwig Adam. 17 Apr 2017.
Xception: Deep Learning with Depthwise Separable Convolutions. François Chollet. 07 Oct 2016.
Densely Connected Convolutional Networks. Gao Huang, Zhuang Liu, L. van der Maaten, Kilian Q. Weinberger. 25 Aug 2016.