ResearchTrend.AI
Ansor: Generating High-Performance Tensor Programs for Deep Learning


11 June 2020
Lianmin Zheng
Chengfan Jia
Minmin Sun
Zhao Wu
Cody Hao Yu
Ameer Haj-Ali
Yida Wang
Jun Yang
Danyang Zhuo
Koushik Sen
Joseph E. Gonzalez
Ion Stoica

Papers citing "Ansor: Generating High-Performance Tensor Programs for Deep Learning"

43 / 43 papers shown
QiMeng-TensorOp: Automatically Generating High-Performance Tensor Operators with Hardware Primitives
X. Zhang
Shaohui Peng
Qirui Zhou
Yuanbo Wen
Qi Guo
...
Ke Gao
Chen Zhao
Yanjun Wu
Yunji Chen
Ling Li
VLM
39
0
0
08 May 2025
Phantora: Live GPU Cluster Simulation for Machine Learning System Performance Estimation
Jianxing Qin
Jingrong Chen
Xinhao Kong
Yongji Wu
Liang Luo
Z. Wang
Ying Zhang
Tingjun Chen
Alvin R. Lebeck
Danyang Zhuo
119
0
0
02 May 2025
TileLang: A Composable Tiled Programming Model for AI Systems
Lei Wang
Yu Cheng
Yining Shi
Zhengju Tang
Zhiwen Mo
...
Lingxiao Ma
Yuqing Xia
Jilong Xue
Fan Yang
Z. Yang
63
1
0
24 Apr 2025
Hexcute: A Tile-based Programming Language with Automatic Layout and Task-Mapping Synthesis
X. Zhang
Yaoyao Ding
Yang Hu
Gennady Pekhimenko
41
0
0
22 Apr 2025
Tilus: A Virtual Machine for Arbitrary Low-Precision GPGPU Computation in LLM Serving
Yaoyao Ding
Bohan Hou
X. Zhang
Allan Lin
Tianqi Chen
Cody Yu Hao
Yida Wang
Gennady Pekhimenko
43
0
0
17 Apr 2025
AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms
Feiyang Chen
Yu Cheng
Lei Wang
Yuqing Xia
Ziming Miao
...
Fan Yang
J. Xue
Zhi Yang
M. Yang
H. Chen
71
1
0
24 Feb 2025
Data-efficient Performance Modeling via Pre-training
Chunting Liu
Riyadh Baghdadi
41
0
0
24 Jan 2025
FastCHGNet: Training one Universal Interatomic Potential to 1.5 Hours with 32 GPUs
Yuanchang Zhou
Siyu Hu
Chen Wang
Lin-Wang Wang
Guangming Tan
Weile Jia
AI4CE
GNN
50
0
0
30 Dec 2024
LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation
Mufei Li
Viraj Shitole
Eli Chien
Changhai Man
Zhaodong Wang
Srinivas Sridharan
Ying Zhang
Tushar Krishna
P. Li
37
0
0
04 Nov 2024
Explore as a Storm, Exploit as a Raindrop: On the Benefit of Fine-Tuning Kernel Schedulers with Coordinate Descent
Michael Canesche
Gaurav Verma
Fernando Magno Quintao Pereira
16
1
0
28 Jun 2024
Scorch: A Library for Sparse Deep Learning
Bobby Yan
Alexander J. Root
Trevor Gale
David Broman
Fredrik Kjolstad
25
0
0
27 May 2024
Allo: A Programming Model for Composable Accelerator Design
Hongzheng Chen
Niansong Zhang
Shaojie Xiang
Zhichen Zeng
Mengjia Dai
Zhiru Zhang
41
14
0
07 Apr 2024
LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers
Massinissa Merouani
Khaled Afif Boudaoud
Iheb Nassim Aouadj
Nassim Tchoulak
Islam Kara Bernou
Hamza Benyamina
F. B. Tayeb
K. Benatchba
Hugh Leather
Riyadh Baghdadi
37
2
0
18 Mar 2024
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Ruihang Lai
Junru Shao
Siyuan Feng
Steven Lyubomirsky
Bohan Hou
...
Sunghyun Park
Prakalp Srivastava
Jared Roesch
T. Mowry
Tianqi Chen
45
9
0
01 Nov 2023
Target-independent XLA optimization using Reinforcement Learning
Milan Ganai
Haichen Li
Theodore Enns
Yida Wang
Randy Huang
32
0
0
28 Aug 2023
PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR
Zixuan Ma
Haojie Wang
Jingze Xing
Liyan Zheng
Chen Zhang
Huanqi Cao
Kezhao Huang
Shizhi Tang
Penghan Wang
Jidong Zhai
GNN
29
1
0
11 Jul 2023
Operator Fusion in XLA: Analysis and Evaluation
Danielle Snider
Ruofan Liang
18
4
0
30 Jan 2023
AGO: Boosting Mobile AI Inference Performance by Removing Constraints on Graph Optimization
Zhiying Xu
H. Peng
Wei Wang
GNN
24
3
0
02 Dec 2022
HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks
Zining Zhang
Bingsheng He
Zhenjie Zhang
11
4
0
21 Nov 2022
ParCNetV2: Oversized Kernel with Enhanced Attention
Ruihan Xu
Haokui Zhang
Wenze Hu
Shiliang Zhang
Xiaoyu Wang
ViT
25
6
0
14 Nov 2022
ALT: Boosting Deep Learning Performance by Breaking the Wall between Graph and Operator Level Optimizations
Zhiying Xu
Jiafan Xu
H. Peng
Wei Wang
Xiaoliang Wang
...
Haipeng Dai
Yixu Xu
Hao Cheng
Kun Wang
Guihai Chen
18
0
0
22 Oct 2022
Decompiling x86 Deep Neural Network Executables
Zhibo Liu
Yuanyuan Yuan
Shuai Wang
Xiaofei Xie
L. Ma
AAML
39
13
0
03 Oct 2022
Optimizing DNN Compilation for Distributed Training with Joint OP and Tensor Fusion
Xiaodong Yi
Shiwei Zhang
Lansong Diao
Chuan Wu
Zhen Zheng
Shiqing Fan
Siyu Wang
Jun Yang
W. Lin
20
4
0
26 Sep 2022
SONAR: Joint Architecture and System Optimization Search
Elias Jääsaari
Michelle Ma
Ameet Talwalkar
Tianqi Chen
36
1
0
25 Aug 2022
OLLIE: Derivation-based Tensor Program Optimizer
Liyan Zheng
Haojie Wang
Jidong Zhai
Muyan Hu
Zixuan Ma
Tuowei Wang
Shizhi Tang
Lei Xie
Kezhao Huang
Zhihao Jia
38
3
0
02 Aug 2022
NNSmith: Generating Diverse and Valid Test Cases for Deep Learning Compilers
Jiawei Liu
Jinkun Lin
Fabian Ruffy
Cheng Tan
Jinyang Li
Aurojit Panda
Lingming Zhang
65
57
0
26 Jul 2022
Productive Reproducible Workflows for DNNs: A Case Study for Industrial Defect Detection
Perry Gibson
José Cano
AI4CE
30
1
0
19 Jun 2022
HW-Aware Initialization of DNN Auto-Tuning to Improve Exploration Time and Robustness
D. Rieber
Moritz Reiber
Oliver Bringmann
Holger Fröning
16
4
0
31 May 2022
Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN Accelerators
Axel Stjerngren
Perry Gibson
José Cano
20
4
0
26 Apr 2022
Shisha: Online scheduling of CNN pipelines on heterogeneous architectures
Pirah Noor Soomro
M. Abduljabbar
J. Castrillón
Miquel Pericàs
12
1
0
23 Feb 2022
Benchmarking of DL Libraries and Models on Mobile Devices
Qiyang Zhang
Xiang Li
Xiangying Che
Xiao Ma
Ao Zhou
Mengwei Xu
Shangguang Wang
Yun Ma
Xuanzhe Liu
25
48
0
14 Feb 2022
Learning from distinctive candidates to optimize reduced-precision convolution program on tensor cores
Junkyeong Choi
Hyucksung Kwon
W. Lee
Jungwook Choi
Jieun Lim
19
0
0
11 Feb 2022
Moses: Efficient Exploitation of Cross-device Transferable Features for Tensor Program Optimization
Zhihe Zhao
Xian Shuai
Yang Bai
Neiwen Ling
Nan Guan
Zhenyu Yan
Guoliang Xing
17
6
0
15 Jan 2022
Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation
Perry Gibson
José Cano
15
12
0
14 Jan 2022
FamilySeer: Towards Optimized Tensor Codes by Exploiting Computation Subgraph Similarity
Shanjun Zhang
Mingzhen Li
Hailong Yang
Yi Liu
Zhongzhi Luan
D. Qian
21
0
0
01 Jan 2022
Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance
Jiarong Xing
Leyuan Wang
Shang Zhang
Jack H Chen
Ang Chen
Yibo Zhu
25
43
0
25 Oct 2021
CompilerGym: Robust, Performant Compiler Optimization Environments for AI Research
Chris Cummins
Bram Wasti
Jiadong Guo
Brandon Cui
Jason Ansel
...
Jia-Wei Liu
O. Teytaud
Benoit Steiner
Yuandong Tian
Hugh Leather
28
68
0
17 Sep 2021
1xN Pattern for Pruning Convolutional Neural Networks
Mingbao Lin
Yu-xin Zhang
Yuchao Li
Bohong Chen
Fei Chao
Mengdi Wang
Shen Li
Yonghong Tian
Rongrong Ji
3DPC
31
40
0
31 May 2021
Tuna: A Static Analysis Approach to Optimizing Deep Neural Networks
Yao Wang
Xingyu Zhou
Yanming Wang
Rui Li
Yong Wu
Vin Sharma
16
8
0
29 Apr 2021
Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning & HPC Workloads
E. Georganas
Dhiraj D. Kalamkar
Sasikanth Avancha
Menachem Adelman
Deepti Aggarwal
...
Ramanarayan Mohanty
Hans Pabst
Brian Retford
Barukh Ziv
A. Heinecke
26
17
0
12 Apr 2021
DISC: A Dynamic Shape Compiler for Machine Learning Workloads
Kai Zhu
Wenyi Zhao
Zhen Zheng
Tianyou Guo
Pengzhan Zhao
...
Junjie Bai
Jun Yang
Xiaoyong Liu
Lansong Diao
Wei Lin
19
27
0
09 Mar 2021
DynaComm: Accelerating Distributed CNN Training between Edges and Clouds through Dynamic Communication Scheduling
Shangming Cai
Dongsheng Wang
Haixia Wang
Yongqiang Lyu
Guangquan Xu
Xi Zheng
A. Vasilakos
24
6
0
20 Jan 2021
A model-driven approach for a new generation of adaptive libraries
Marco Cianfriglia
Damiano Perri
C. Nugteren
Anton Lokhmotov
G. Fursin
14
14
0
19 Jun 2018