IOS: Inter-Operator Scheduler for CNN Acceleration

Conference on Machine Learning and Systems (MLSys), 2020

2 November 2020

Song Han

Papers citing "IOS: Inter-Operator Scheduler for CNN Acceleration"

Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment

Deokjae Lee

Hyun Oh Song

265

24 Sep 2025

Your Compiler is Backdooring Your Model: Understanding and Exploiting Compilation Inconsistency Vulnerabilities in Deep Learning Compilers

335

14 Sep 2025

REASONING COMPILER: LLM-Guided Optimizations for Efficient Model Serving

311

02 Jun 2025

Tilus: A Tile-Level GPGPU Programming Language for Low-Precision Computation

375

17 Apr 2025

Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey

IEEE Communications Surveys and Tutorials (COMST), 2023

345

27 Sep 2023

Automatic Task Parallelization of Dataflow Graphs in ML/DL models

IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2023

Srinjoy Das

Lawrence Rauchwerger

212

22 Aug 2023

Chrion: Optimizing Recurrent Neural Network Inference by Collaboratively Utilizing CPUs and GPUs

Zinuo Cai

123

21 Jul 2023

DyCL: Dynamic Neural Network Compilation Via Program Rewriting and Graph Optimization

International Symposium on Software Testing and Analysis (ISSTA), 2023

205

11 Jul 2023

Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU

ACM International Conference on Embedded Networked Sensor Systems (SenSys), 2023

180

10 Jul 2023

Proteus: Simulating the Performance of Distributed DNN Training

IEEE Transactions on Parallel and Distributed Systems (TPDS), 2023

269

04 Jun 2023

Canvas: End-to-End Kernel Architecture Search in Neural Networks

Chenggang Zhao

Genghan Zhang

Mingyu Gao

244

16 Apr 2023

AGO: Boosting Mobile AI Inference Performance by Removing Constraints on Graph Optimization

IEEE Conference on Computer Communications (INFOCOM), 2022

315

02 Dec 2022

Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

Song Han

Jun-Yan Zhu

DiffM

588

03 Nov 2022

ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations

...

340

22 Oct 2022

Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs

International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022

289

18 Oct 2022

Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision

Xiaolin Wang

Yingwei Luo

Tianwei Zhang

Yonggang Wen

374

24 May 2022

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Zhijian Liu

Song Han

285

134

25 Apr 2022

A Survey of Multi-Tenant Deep Learning Inference on GPU

350

17 Mar 2022

Efficient Strong Scaling Through Burst Parallel Training

Conference on Machine Learning and Systems (MLSys), 2021

293

19 Dec 2021

A Transferable Approach for Partitioning Machine Learning Models on Multi-Chip-Modules

166

07 Dec 2021

Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU

144

28 Nov 2021

A Survey of Large-Scale Deep Learning Serving System Optimization: Challenges and Opportunities

382

28 Nov 2021

The CoRa Tensor Compiler: Compilation for Ragged Tensors with Minimal Padding

477

19 Oct 2021

Third ArchEdge Workshop: Exploring the Design Space of Efficient Deep Neural Networks

Fuxun Yu

Dimitrios Stamoulis

Di Wang

Dimitrios Lymberopoulos

Xiang Chen

3DV

187

22 Nov 2020