arXiv:2011.01302
IOS: Inter-Operator Scheduler for CNN Acceleration
Conference on Machine Learning and Systems (MLSys), 2020
2 November 2020
Yaoyao Ding
Ligeng Zhu
Zhihao Jia
Gennady Pekhimenko
Song Han
Papers citing "IOS: Inter-Operator Scheduler for CNN Acceleration"
24 / 24 papers shown
Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment
Deokjae Lee
Hyun Oh Song
MQ
265
0
0
24 Sep 2025
Your Compiler is Backdooring Your Model: Understanding and Exploiting Compilation Inconsistency Vulnerabilities in Deep Learning Compilers
Simin Chen
Jinjun Peng
Yixin He
Junfeng Yang
Baishakhi Ray
SILM
ELM
335
3
0
14 Sep 2025
REASONING COMPILER: LLM-Guided Optimizations for Efficient Model Serving
Sujun Tang
Christopher Priebe
R. Mahapatra
Lianhui Qin
H. Esmaeilzadeh
LRM
311
1
0
02 Jun 2025
Tilus: A Tile-Level GPGPU Programming Language for Low-Precision Computation
Yaoyao Ding
Bohan Hou
Xinyu Zhang
Allan Lin
Tianqi Chen
Cody Hao Yu
Yida Wang
Gennady Pekhimenko
375
0
0
17 Apr 2025
Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey
IEEE Communications Surveys and Tutorials (COMST), 2023
Sicong Liu
Bin Guo
Cheng Fang
Ziqi Wang
Shiyan Luo
Zimu Zhou
Zhiwen Yu
AI4CE
345
39
0
27 Sep 2023
Automatic Task Parallelization of Dataflow Graphs in ML/DL models
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2023
Srinjoy Das
Lawrence Rauchwerger
212
0
0
22 Aug 2023
Chrion: Optimizing Recurrent Neural Network Inference by Collaboratively Utilizing CPUs and GPUs
Zinuo Cai
Hao Wang
Tao Song
Yang Hua
Ruhui Ma
Haibing Guan
GNN
123
1
0
21 Jul 2023
DyCL: Dynamic Neural Network Compilation Via Program Rewriting and Graph Optimization
International Symposium on Software Testing and Analysis (ISSTA), 2023
Simin Chen
Shiyi Wei
Cong Liu
Wei Yang
205
12
0
11 Jul 2023
Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU
ACM International Conference on Embedded Networked Sensor Systems (SenSys), 2023
Zhihe Zhao
Neiwen Ling
Nan Guan
Guoliang Xing
180
18
0
10 Jul 2023
Proteus: Simulating the Performance of Distributed DNN Training
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2023
Jiangfei Duan
Xiuhong Li
Ping Xu
Xingcheng Zhang
Shengen Yan
Yun Liang
Dahua Lin
269
14
0
04 Jun 2023
Canvas: End-to-End Kernel Architecture Search in Neural Networks
Chenggang Zhao
Genghan Zhang
Mingyu Gao
244
2
0
16 Apr 2023
AGO: Boosting Mobile AI Inference Performance by Removing Constraints on Graph Optimization
IEEE Conference on Computer Communications (INFOCOM), 2022
Zhiying Xu
H. Peng
Wei Wang
GNN
315
3
0
02 Dec 2022
Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Zhekai Zhang
Ji Lin
Chenlin Meng
Stefano Ermon
Song Han
Jun-Yan Zhu
DiffM
588
62
0
03 Nov 2022
ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations
Zhiying Xu
Jiafan Xu
H. Peng
Wei Wang
Xiaoliang Wang
...
Haipeng Dai
Yixu Xu
Hao Cheng
Kun Wang
Guihai Chen
340
1
0
22 Oct 2022
Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022
Yaoyao Ding
Cody Hao Yu
Bojian Zheng
Yizhi Liu
Yida Wang
Gennady Pekhimenko
289
45
0
18 Oct 2022
Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision
Wei Gao
Qi Hu
Zhisheng Ye
Yang Liu
Xiaolin Wang
Yingwei Luo
Tianwei Zhang
Yonggang Wen
374
36
0
24 May 2022
Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
Han Cai
Ji Lin
Chengyue Wu
Zhijian Liu
Haotian Tang
Hanrui Wang
Ligeng Zhu
Song Han
285
134
0
25 Apr 2022
A Survey of Multi-Tenant Deep Learning Inference on GPU
Fuxun Yu
Di Wang
Longfei Shangguan
Minjia Zhang
Chenchen Liu
Xiang Chen
BDL
AI4CE
350
39
0
17 Mar 2022
Efficient Strong Scaling Through Burst Parallel Training
Conference on Machine Learning and Systems (MLSys), 2021
S. Park
Joshua Fried
Sunghyun Kim
Mohammad Alizadeh
Adam Belay
GNN
LRM
293
12
0
19 Dec 2021
A Transferable Approach for Partitioning Machine Learning Models on Multi-Chip-Modules
Xinfeng Xie
Prakash Prabhu
Ulysse Beaugnon
P. Phothilimthana
Sudip Roy
Azalia Mirhoseini
E. Brevdo
James Laudon
Yanqi Zhou
166
7
0
07 Dec 2021
Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU
Fuxun Yu
Shawn Bray
Di Wang
Longfei Shangguan
Xulong Tang
Chenchen Liu
Xiang Chen
144
50
0
28 Nov 2021
A Survey of Large-Scale Deep Learning Serving System Optimization: Challenges and Opportunities
Fuxun Yu
Di Wang
Longfei Shangguan
Minjia Zhang
Xulong Tang
Chenchen Liu
Xiang Chen
382
13
0
28 Nov 2021
The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding
Pratik Fegade
Tianqi Chen
Phillip B. Gibbons
T. Mowry
477
36
0
19 Oct 2021
Third ArchEdge Workshop: Exploring the Design Space of Efficient Deep Neural Networks
Fuxun Yu
Dimitrios Stamoulis
Di Wang
Dimitrios Lymberopoulos
Xiang Chen
3DV
187
1
0
22 Nov 2020