1811.08886
HAQ: Hardware-Aware Automated Quantization with Mixed Precision
Computer Vision and Pattern Recognition (CVPR), 2018
21 November 2018
Kuan Wang
Zhijian Liu
Yujun Lin
Ji Lin
Song Han
MQ
Papers citing "HAQ: Hardware-Aware Automated Quantization with Mixed Precision"
50 / 464 papers shown
Tiny Machine Learning: Progress and Futures
Ji Lin
Ligeng Zhu
Wei-Ming Chen
Wei-Chen Wang
Song Han
262
117
0
28 Mar 2024
AffineQuant: Affine Transformation Quantization for Large Language Models
Yuexiao Ma
Huixia Li
Xiawu Zheng
Feng Ling
Xuefeng Xiao
Rui Wang
Shilei Wen
Jiayi Ji
Rongrong Ji
MQ
251
42
0
19 Mar 2024
QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning
Jiun-Man Chen
Yu-Hsuan Chao
Yu-Jie Wang
Ming-Der Shieh
Chih-Chung Hsu
Wei-Fen Lin
MQ
263
2
0
11 Mar 2024
Better Schedules for Low Precision Training of Deep Neural Networks
Cameron R. Wolfe
Anastasios Kyrillidis
182
2
0
04 Mar 2024
Adaptive quantization with mixed-precision based on low-cost proxy
Jing Chen
Qiao Yang
Senmao Tian
Shunli Zhang
MQ
167
3
0
27 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
291
88
0
15 Feb 2024
TransAxx: Efficient Transformers with Approximate Computing
Dimitrios Danopoulos
Georgios Zervakis
Dimitrios Soudris
Jörg Henkel
ViT
304
6
0
12 Feb 2024
Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy
Seyedarmin Azizi
M. Nazemi
Massoud Pedram
ViT
MQ
251
5
0
08 Feb 2024
Value-Driven Mixed-Precision Quantization for Patch-Based Inference on Microcontrollers
Design, Automation and Test in Europe (DATE), 2024
Wei Tao
Shenglin He
Kai Lu
Xiaoyang Qu
Guokuan Li
Jiguang Wan
Jianzong Wang
Jing Xiao
MQ
143
1
0
24 Jan 2024
LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation
Navin Ranjan
Andreas E. Savakis
MQ
226
13
0
20 Jan 2024
Retraining-free Model Quantization via One-Shot Weight-Coupling Learning
Computer Vision and Pattern Recognition (CVPR), 2024
Chen Tang
Yuan Meng
Jiacheng Jiang
Shuzhao Xie
Rongwei Lu
Cheng Wang
Zhi Wang
Wenwu Zhu
MQ
225
17
0
03 Jan 2024
Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization
K. Balaskas
Andreas Karatzas
Christos Sad
K. Siozios
Iraklis Anagnostopoulos
Georgios Zervakis
Jörg Henkel
MQ
195
23
0
23 Dec 2023
Efficient Quantization Strategies for Latent Diffusion Models
Yuewei Yang
Xiaoliang Dai
Jialiang Wang
Peizhao Zhang
Hongbo Zhang
DiffM
MQ
283
16
0
09 Dec 2023
Green Edge AI: A Contemporary Survey
Proceedings of the IEEE (Proc. IEEE), 2023
Yuyi Mao
X. Yu
Kaibin Huang
Ying-Jun Angela Zhang
Jun Zhang
398
57
0
01 Dec 2023
Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices
Computer Vision and Pattern Recognition (CVPR), 2023
Huancheng Chen
H. Vikalo
FedML
MQ
288
18
0
29 Nov 2023
MetaMix: Meta-state Precision Searcher for Mixed-precision Activation Quantization
AAAI Conference on Artificial Intelligence (AAAI), 2023
Han-Byul Kim
Joo Hyung Lee
Sungjoo Yoo
Hong-Seok Kim
MQ
241
9
0
12 Nov 2023
Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing
European Conference on Computer Vision (ECCV), 2023
Siao Tang
Xin Wang
Hong Chen
Chaoyu Guan
Zewen Wu
Yansong Tang
Wenwu Zhu
MQ
235
21
0
10 Nov 2023
Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Cheng Zhang
Jianyi Cheng
Ilia Shumailov
George A. Constantinides
Yiren Zhao
MQ
242
13
0
08 Oct 2023
Quantized Transformer Language Model Implementations on Edge Devices
International Conference on Machine Learning and Applications (ICMLA), 2023
Mohammad Wali Ur Rahman
Murad Mehrab Abrar
Hunter Gibbons Copening
Salim Hariri
Sicong Shao
Pratik Satam
Soheil Salehi
MQ
168
25
0
06 Oct 2023
MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search
Yichen Xie
Wei Le
MQ
159
5
0
29 Sep 2023
AdaEvo: Edge-Assisted Continuous and Timely DNN Model Evolution for Mobile Devices
IEEE Transactions on Mobile Computing (IEEE TMC), 2023
Lehao Wang
Zhiwen Yu
Haoyi Yu
Sicong Liu
Yaxiong Xie
Bin Guo
Yunxin Liu
235
6
0
27 Sep 2023
SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization
Jinjie Zhang
Rayan Saab
171
0
0
20 Sep 2023
Real-Time Semantic Segmentation: A Brief Survey & Comparative Study in Remote Sensing
IEEE Geoscience and Remote Sensing Magazine (GRSM), 2023
Clifford Broni-Bediako
Junshi Xia
Xiangwei Zhu
276
15
0
12 Sep 2023
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Wenhua Cheng
Weiwei Zhang
Haihao Shen
Yiyang Cai
Xin He
Kaokao Lv
Yi Liu
MQ
507
33
0
11 Sep 2023
Bandwidth-efficient Inference for Neural Image Compression
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shanzhi Yin
Tongda Xu
Yongsheng Liang
Yuanyuan Wang
Yanghao Li
Yan Wang
Jingjing Liu
169
1
0
06 Sep 2023
On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks
Wei Huang
Haotong Qin
Yangdong Liu
Jingzhuo Liang
Yifu Ding
Ying Li
Xianglong Liu
MQ
407
2
0
05 Sep 2023
eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models
IEEE computer architecture letters (CAL), 2023
Minsik Cho
Keivan Alizadeh Vahid
Qichen Fu
Saurabh N. Adya
C. C. D. Mundo
Mohammad Rastegari
Devang Naik
Peter Zatloukal
MQ
236
9
0
02 Sep 2023
Generative Model for Models: Rapid DNN Customization for Diverse Tasks and Resource Constraints
Wenxing Xu
Yuanchun Li
Jiacheng Liu
Yiyou Sun
Zhengyang Cao
Shouqing Yang
Hao Wen
Yunxin Liu
244
2
0
29 Aug 2023
A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance
IEEE International Conference on Computer Vision (ICCV), 2023
Ian Colbert
Alessandro Pappalardo
Jakoba Petri-Koenig
MQ
219
15
0
25 Aug 2023
HyperSNN: A new efficient and robust deep learning model for resource constrained control applications
Zhanglu Yan
Shida Wang
Kaiwen Tang
Weng-Fai Wong
150
2
0
16 Aug 2023
Gradient-Based Post-Training Quantization: Challenging the Status Quo
Edouard Yvinec
Arnaud Dapogny
Kévin Bailly
MQ
223
1
0
15 Aug 2023
EQ-Net: Elastic Quantization Neural Networks
IEEE International Conference on Computer Vision (ICCV), 2023
Ke Xu
Lei Han
Ye Tian
Shangshang Yang
Xingyi Zhang
MQ
347
18
0
15 Aug 2023
Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation
Seyedarmin Azizi
M. Nazemi
A. Fayyazi
Massoud Pedram
MQ
114
5
0
12 Aug 2023
SAfER: Layer-Level Sensitivity Assessment for Efficient and Robust Neural Network Inference
Edouard Yvinec
Arnaud Dapogny
Kévin Bailly
Xavier Fischer
AAML
216
4
0
09 Aug 2023
FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search
Jordan Dotzel
Gang Wu
Andrew Li
M. Umar
Yun Ni
...
Liqun Cheng
Martin G. Dixon
N. Jouppi
Quoc V. Le
Sheng Li
MQ
290
5
0
07 Aug 2023
Dynamic Token-Pass Transformers for Semantic Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Yuang Liu
Qiang Zhou
Jing Wang
Fan Wang
Jun Wang
Wei Zhang
ViT
88
11
0
03 Aug 2023
To Adapt or Not to Adapt? Real-Time Adaptation for Semantic Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Marc Botet Colomer
Pier Luigi Dovesi
Theodoros Panagiotakopoulos
J. Carvalho
Linus Harenstam-Nielsen
Hossein Azizpour
Hedvig Kjellström
Zorah Lähner
Matteo Poggi
TTA
186
18
0
27 Jul 2023
Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks
European Conference on Computer Vision (ECCV), 2023
Chee Hong
Kyoung Mu Lee
SupR
MQ
240
2
0
25 Jul 2023
EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
IEEE International Conference on Computer Vision (ICCV), 2023
Peijie Dong
Lujun Li
Zimian Wei
Xin-Yi Niu
Zhiliang Tian
H. Pan
MQ
231
47
0
20 Jul 2023
Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications
ACM Computing Surveys (ACM Comput. Surv.), 2023
Vasileios Leon
Muhammad Abdullah Hanif
Giorgos Armeniakos
Xun Jiao
Mohamed Bennai
K. Pekmestzi
Dimitrios Soudris
245
17
0
20 Jul 2023
PLiNIO: A User-Friendly Library of Gradient-based Methods for Complexity-aware DNN Optimization
Forum on Specification and Design Languages (FDL), 2023
Daniele Jahier Pagliari
Matteo Risso
Beatrice Alessandra Motetti
Luca Bompani
266
10
0
18 Jul 2023
A Survey of Techniques for Optimizing Transformer Inference
Journal of systems architecture (JSA), 2023
Krishna Teja Chitty-Venkata
Sparsh Mittal
M. Emani
V. Vishwanath
Arun Somani
263
120
0
16 Jul 2023
QBitOpt: Fast and Accurate Bitwidth Reallocation during Training
Jorn W. T. Peters
Marios Fournarakis
Markus Nagel
M. V. Baalen
Tijmen Blankevoort
MQ
140
7
0
10 Jul 2023
DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN Inference
International Conference on High Performance Computing (HiPC), 2023
Bahareh Khabbazan
Marc Riera
Antonio González
MQ
195
3
0
28 Jun 2023
Precision-aware Latency and Energy Balancing on Multi-Accelerator Platforms for DNN Inference
International Symposium on Low Power Electronics and Design (ISLPED), 2023
Matteo Risso
Luca Bompani
G. M. Sarda
Luca Benini
Enrico Macii
Massimo Poncino
Marian Verhelst
Daniele Jahier Pagliari
180
8
0
08 Jun 2023
Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision Post-Training Quantization
Clemens J. S. Schaefer
Navid Lambert-Shirzad
Xiaofan Zhang
Chia-Wei Chou
T. Jablin
Jian Li
Elfie Guo
Caitlin Stanton
S. Joshi
Yu Emma Wang
MQ
219
4
0
08 Jun 2023
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Conference on Machine Learning and Systems (MLSys), 2023
Ji Lin
Jiaming Tang
Haotian Tang
Shang Yang
Wei-Ming Chen
Wei-Chen Wang
Guangxuan Xiao
Xingyu Dang
Chuang Gan
Song Han
EDL
MQ
832
946
0
01 Jun 2023
DynaShare: Task and Instance Conditioned Parameter Sharing for Multi-Task Learning
E. Rahimian
Golara Javadi
Frederick Tung
Gabriel L. Oliveira
MoE
185
3
0
26 May 2023
MixFormerV2: Efficient Fully Transformer Tracking
Neural Information Processing Systems (NeurIPS), 2023
Yutao Cui
Tian-Shu Song
Gangshan Wu
Liming Wang
210
119
0
25 May 2023
PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration
ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2023
Ahmed F. AbouElhamayed
Angela Cui
Javier Fernandez-Marques
Nicholas D. Lane
Mohamed S. Abdelfattah
MQ
242
7
0
25 May 2023