1811.08886
HAQ: Hardware-Aware Automated Quantization with Mixed Precision
Computer Vision and Pattern Recognition (CVPR), 2018
21 November 2018
Kuan Wang
Zhijian Liu
Yujun Lin
Ji Lin
Song Han
MQ
Papers citing
"HAQ: Hardware-Aware Automated Quantization with Mixed Precision"
50 / 464 papers shown
Tiny Machine Learning: Progress and Futures
Ji Lin
Ligeng Zhu
Wei-Ming Chen
Wei-Chen Wang
Song Han
234
112
0
28 Mar 2024
AffineQuant: Affine Transformation Quantization for Large Language Models
Yuexiao Ma
Huixia Li
Xiawu Zheng
Feng Ling
Xuefeng Xiao
Rui Wang
Shilei Wen
Jiayi Ji
Rongrong Ji
MQ
236
42
0
19 Mar 2024
QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning
Jiun-Man Chen
Yu-Hsuan Chao
Yu-Jie Wang
Ming-Der Shieh
Chih-Chung Hsu
Wei-Fen Lin
MQ
258
2
0
11 Mar 2024
Better Schedules for Low Precision Training of Deep Neural Networks
Cameron R. Wolfe
Anastasios Kyrillidis
168
2
0
04 Mar 2024
Adaptive quantization with mixed-precision based on low-cost proxy
Jing Chen
Qiao Yang
Senmao Tian
Shunli Zhang
MQ
163
3
0
27 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
276
87
0
15 Feb 2024
TransAxx: Efficient Transformers with Approximate Computing
Dimitrios Danopoulos
Georgios Zervakis
Dimitrios Soudris
Jörg Henkel
ViT
281
6
0
12 Feb 2024
Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy
Seyedarmin Azizi
M. Nazemi
Massoud Pedram
ViT
MQ
235
4
0
08 Feb 2024
Value-Driven Mixed-Precision Quantization for Patch-Based Inference on Microcontrollers
Design, Automation and Test in Europe (DATE), 2024
Wei Tao
Shenglin He
Kai Lu
Xiaoyang Qu
Guokuan Li
Jiguang Wan
Jianzong Wang
Jing Xiao
MQ
135
1
0
24 Jan 2024
LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation
Navin Ranjan
Andreas E. Savakis
MQ
205
13
0
20 Jan 2024
Retraining-free Model Quantization via One-Shot Weight-Coupling Learning
Computer Vision and Pattern Recognition (CVPR), 2024
Chen Tang
Yuan Meng
Jiacheng Jiang
Shuzhao Xie
Rongwei Lu
Cheng Wang
Zhi Wang
Wenwu Zhu
MQ
216
17
0
03 Jan 2024
Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization
K. Balaskas
Andreas Karatzas
Christos Sad
K. Siozios
Iraklis Anagnostopoulos
Georgios Zervakis
Jörg Henkel
MQ
188
22
0
23 Dec 2023
Efficient Quantization Strategies for Latent Diffusion Models
Yuewei Yang
Xiaoliang Dai
Jialiang Wang
Peizhao Zhang
Hongbo Zhang
DiffM
MQ
274
16
0
09 Dec 2023
Green Edge AI: A Contemporary Survey
Proceedings of the IEEE (Proc. IEEE), 2023
Yuyi Mao
X. Yu
Kaibin Huang
Ying-Jun Angela Zhang
Jun Zhang
386
54
0
01 Dec 2023
Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices
Computer Vision and Pattern Recognition (CVPR), 2023
Huancheng Chen
H. Vikalo
FedML
MQ
276
17
0
29 Nov 2023
MetaMix: Meta-state Precision Searcher for Mixed-precision Activation Quantization
AAAI Conference on Artificial Intelligence (AAAI), 2023
Han-Byul Kim
Joo Hyung Lee
Sungjoo Yoo
Hong-Seok Kim
MQ
215
9
0
12 Nov 2023
Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing
European Conference on Computer Vision (ECCV), 2023
Siao Tang
Xin Wang
Hong Chen
Chaoyu Guan
Zewen Wu
Yansong Tang
Wenwu Zhu
MQ
224
21
0
10 Nov 2023
Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Cheng Zhang
Jianyi Cheng
Ilia Shumailov
George A. Constantinides
Yiren Zhao
MQ
213
13
0
08 Oct 2023
Quantized Transformer Language Model Implementations on Edge Devices
International Conference on Machine Learning and Applications (ICMLA), 2023
Mohammad Wali Ur Rahman
Murad Mehrab Abrar
Hunter Gibbons Copening
Salim Hariri
Sicong Shao
Pratik Satam
Soheil Salehi
MQ
157
24
0
06 Oct 2023
MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search
Yichen Xie
Wei Le
MQ
138
5
0
29 Sep 2023
AdaEvo: Edge-Assisted Continuous and Timely DNN Model Evolution for Mobile Devices
IEEE Transactions on Mobile Computing (IEEE TMC), 2023
Lehao Wang
Zhiwen Yu
Haoyi Yu
Sicong Liu
Yaxiong Xie
Bin Guo
Yunxin Liu
196
6
0
27 Sep 2023
SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization
Jinjie Zhang
Rayan Saab
163
0
0
20 Sep 2023
Real-Time Semantic Segmentation: A Brief Survey & Comparative Study in Remote Sensing
IEEE Geoscience and Remote Sensing Magazine (GRSM), 2023
Clifford Broni-Bediako
Junshi Xia
Xiangwei Zhu
253
15
0
12 Sep 2023
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Wenhua Cheng
Weiwei Zhang
Haihao Shen
Yiyang Cai
Xin He
Kaokao Lv
Yi Liu
MQ
496
33
0
11 Sep 2023
Bandwidth-efficient Inference for Neural Image Compression
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shanzhi Yin
Tongda Xu
Yongsheng Liang
Yuanyuan Wang
Yanghao Li
Yan Wang
Jingjing Liu
151
1
0
06 Sep 2023
On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks
Wei Huang
Haotong Qin
Yangdong Liu
Jingzhuo Liang
Yifu Ding
Ying Li
Xianglong Liu
MQ
387
2
0
05 Sep 2023
eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models
IEEE Computer Architecture Letters (CAL), 2023
Minsik Cho
Keivan Alizadeh Vahid
Qichen Fu
Saurabh N. Adya
C. C. D. Mundo
Mohammad Rastegari
Devang Naik
Peter Zatloukal
MQ
224
9
0
02 Sep 2023
Generative Model for Models: Rapid DNN Customization for Diverse Tasks and Resource Constraints
Wenxing Xu
Yuanchun Li
Jiacheng Liu
Yiyou Sun
Zhengyang Cao
Shouqing Yang
Hao Wen
Yunxin Liu
230
2
0
29 Aug 2023
A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance
IEEE International Conference on Computer Vision (ICCV), 2023
Ian Colbert
Alessandro Pappalardo
Jakoba Petri-Koenig
MQ
187
14
0
25 Aug 2023
HyperSNN: A new efficient and robust deep learning model for resource constrained control applications
Zhanglu Yan
Shida Wang
Kaiwen Tang
Weng-Fai Wong
146
2
0
16 Aug 2023
Gradient-Based Post-Training Quantization: Challenging the Status Quo
Edouard Yvinec
Arnaud Dapogny
Kévin Bailly
MQ
214
1
0
15 Aug 2023
EQ-Net: Elastic Quantization Neural Networks
IEEE International Conference on Computer Vision (ICCV), 2023
Ke Xu
Lei Han
Ye Tian
Shangshang Yang
Xingyi Zhang
MQ
347
17
0
15 Aug 2023
Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation
Seyedarmin Azizi
M. Nazemi
A. Fayyazi
Massoud Pedram
MQ
95
5
0
12 Aug 2023
SAfER: Layer-Level Sensitivity Assessment for Efficient and Robust Neural Network Inference
Edouard Yvinec
Arnaud Dapogny
Kévin Bailly
Xavier Fischer
AAML
198
4
0
09 Aug 2023
FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search
Jordan Dotzel
Gang Wu
Andrew Li
M. Umar
Yun Ni
...
Liqun Cheng
Martin G. Dixon
N. Jouppi
Quoc V. Le
Sheng Li
MQ
285
5
0
07 Aug 2023
Dynamic Token-Pass Transformers for Semantic Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Yuang Liu
Qiang Zhou
Jing Wang
Fan Wang
Jun Wang
Wei Zhang
ViT
80
11
0
03 Aug 2023
To Adapt or Not to Adapt? Real-Time Adaptation for Semantic Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Marc Botet Colomer
Pier Luigi Dovesi
Theodoros Panagiotakopoulos
J. Carvalho
Linus Harenstam-Nielsen
Hossein Azizpour
Hedvig Kjellström
Zorah Lähner
Matteo Poggi
TTA
181
17
0
27 Jul 2023
Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks
European Conference on Computer Vision (ECCV), 2023
Chee Hong
Kyoung Mu Lee
SupR
MQ
220
2
0
25 Jul 2023
EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
IEEE International Conference on Computer Vision (ICCV), 2023
Peijie Dong
Lujun Li
Zimian Wei
Xin-Yi Niu
Zhiliang Tian
H. Pan
MQ
219
47
0
20 Jul 2023
Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications
ACM Computing Surveys (ACM Comput. Surv.), 2023
Vasileios Leon
Muhammad Abdullah Hanif
Giorgos Armeniakos
Xun Jiao
Mohamed Bennai
K. Pekmestzi
Dimitrios Soudris
233
17
0
20 Jul 2023
PLiNIO: A User-Friendly Library of Gradient-based Methods for Complexity-aware DNN Optimization
Forum on Specification and Design Languages (FDL), 2023
Daniele Jahier Pagliari
Matteo Risso
Beatrice Alessandra Motetti
Luca Bompani
261
9
0
18 Jul 2023
A Survey of Techniques for Optimizing Transformer Inference
Journal of Systems Architecture (JSA), 2023
Krishna Teja Chitty-Venkata
Sparsh Mittal
M. Emani
V. Vishwanath
Arun Somani
255
118
0
16 Jul 2023
QBitOpt: Fast and Accurate Bitwidth Reallocation during Training
Jorn W. T. Peters
Marios Fournarakis
Markus Nagel
M. V. Baalen
Tijmen Blankevoort
MQ
136
7
0
10 Jul 2023
DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN Inference
International Conference on High Performance Computing (HiPC), 2023
Bahareh Khabbazan
Marc Riera
Antonio González
MQ
183
3
0
28 Jun 2023
Precision-aware Latency and Energy Balancing on Multi-Accelerator Platforms for DNN Inference
International Symposium on Low Power Electronics and Design (ISLPED), 2023
Matteo Risso
Luca Bompani
G. M. Sarda
Luca Benini
Enrico Macii
Massimo Poncino
Marian Verhelst
Daniele Jahier Pagliari
175
8
0
08 Jun 2023
Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision Post-Training Quantization
Clemens J. S. Schaefer
Navid Lambert-Shirzad
Xiaofan Zhang
Chia-Wei Chou
T. Jablin
Jian Li
Elfie Guo
Caitlin Stanton
S. Joshi
Yu Emma Wang
MQ
215
4
0
08 Jun 2023
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Conference on Machine Learning and Systems (MLSys), 2023
Ji Lin
Jiaming Tang
Haotian Tang
Shang Yang
Wei-Ming Chen
Wei-Chen Wang
Guangxuan Xiao
Xingyu Dang
Chuang Gan
Song Han
EDL
MQ
807
944
0
01 Jun 2023
DynaShare: Task and Instance Conditioned Parameter Sharing for Multi-Task Learning
E. Rahimian
Golara Javadi
Frederick Tung
Gabriel L. Oliveira
MoE
181
3
0
26 May 2023
MixFormerV2: Efficient Fully Transformer Tracking
Neural Information Processing Systems (NeurIPS), 2023
Yutao Cui
Tian-Shu Song
Gangshan Wu
Liming Wang
207
115
0
25 May 2023
PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration
ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2023
Ahmed F. AbouElhamayed
Angela Cui
Javier Fernandez-Marques
Nicholas D. Lane
Mohamed S. Abdelfattah
MQ
229
7
0
25 May 2023