arXiv: 2308.07470
Symphony: Optimized DNN Model Serving using Deferred Batch Scheduling
14 August 2023
Lequn Chen
Weixin Deng
Anirudh Canumalla
Yu Xin
Danyang Zhuo
Matthai Philipose
Arvind Krishnamurthy
Papers citing
"Symphony: Optimized DNN Model Serving using Deferred Batch Scheduling"
7 / 7 papers shown
Efficiently Serving LLM Reasoning Programs with Certaindex
Yichao Fu
Junda Chen
Siqi Zhu
Zheyu Fu
Zhongdongming Dai
Aurick Qiao
Hao Zhang
LRM
57
13
0
31 Dec 2024
Approximate Caching for Efficiently Serving Diffusion Models
Shubham Agarwal
Subrata Mitra
Sarthak Chakraborty
Srikrishna Karanam
Koyel Mukherjee
S. Saini
DiffM
33
4
0
07 Dec 2023
Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines
Francisco Romero
Mark Zhao
N. Yadwadkar
Christos Kozyrakis
33
101
0
03 Feb 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,826
0
17 Sep 2019
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
950
20,572
0
17 Apr 2017
Xception: Deep Learning with Depthwise Separable Convolutions
François Chollet
MDE
BDL
PINN
206
14,376
0
07 Oct 2016
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
L. V. D. van der Maaten
Kilian Q. Weinberger
PINN
3DV
312
36,381
0
25 Aug 2016