HAQ: Hardware-Aware Automated Quantization with Mixed Precision
Computer Vision and Pattern Recognition (CVPR), 2019
21 November 2018
Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han
arXiv:1811.08886

Papers citing "HAQ: Hardware-Aware Automated Quantization with Mixed Precision"

Showing 50 of 464 citing papers.
Tiny Machine Learning: Progress and Futures
Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Song Han
28 Mar 2024
AffineQuant: Affine Transformation Quantization for Large Language Models
Yuexiao Ma, Huixia Li, Xiawu Zheng, Feng Ling, Xuefeng Xiao, Rui Wang, Shilei Wen, Jiayi Ji, Rongrong Ji
19 Mar 2024
QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning
Jiun-Man Chen, Yu-Hsuan Chao, Yu-Jie Wang, Ming-Der Shieh, Chih-Chung Hsu, Wei-Fen Lin
11 Mar 2024
Better Schedules for Low Precision Training of Deep Neural Networks
Cameron R. Wolfe, Anastasios Kyrillidis
04 Mar 2024
Adaptive quantization with mixed-precision based on low-cost proxy
Jing Chen, Qiao Yang, Senmao Tian, Shunli Zhang
27 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He
15 Feb 2024
TransAxx: Efficient Transformers with Approximate Computing
Dimitrios Danopoulos, Georgios Zervakis, Dimitrios Soudris, Jörg Henkel
12 Feb 2024
Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy
Seyedarmin Azizi, M. Nazemi, Massoud Pedram
08 Feb 2024
Value-Driven Mixed-Precision Quantization for Patch-Based Inference on Microcontrollers
Wei Tao, Shenglin He, Kai Lu, Xiaoyang Qu, Guokuan Li, Jiguang Wan, Jianzong Wang, Jing Xiao
Design, Automation and Test in Europe (DATE), 2024
24 Jan 2024
LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation
Navin Ranjan, Andreas E. Savakis
20 Jan 2024
Retraining-free Model Quantization via One-Shot Weight-Coupling Learning
Chen Tang, Yuan Meng, Jiacheng Jiang, Shuzhao Xie, Rongwei Lu, Cheng Wang, Zhi Wang, Wenwu Zhu
Computer Vision and Pattern Recognition (CVPR), 2024
03 Jan 2024
Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization
K. Balaskas, Andreas Karatzas, Christos Sad, K. Siozios, Iraklis Anagnostopoulos, Georgios Zervakis, Jörg Henkel
23 Dec 2023
Efficient Quantization Strategies for Latent Diffusion Models
Yuewei Yang, Xiaoliang Dai, Jialiang Wang, Peizhao Zhang, Hongbo Zhang
09 Dec 2023
Green Edge AI: A Contemporary Survey
Yuyi Mao, X. Yu, Kaibin Huang, Ying-Jun Angela Zhang, Jun Zhang
Proceedings of the IEEE (Proc. IEEE), 2023
01 Dec 2023
Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices
Huancheng Chen, H. Vikalo
Computer Vision and Pattern Recognition (CVPR), 2023
29 Nov 2023
MetaMix: Meta-state Precision Searcher for Mixed-precision Activation Quantization
Han-Byul Kim, Joo Hyung Lee, Sungjoo Yoo, Hong-Seok Kim
AAAI Conference on Artificial Intelligence (AAAI), 2023
12 Nov 2023
Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing
Siao Tang, Xin Wang, Hong Chen, Chaoyu Guan, Zewen Wu, Yansong Tang, Wenwu Zhu
European Conference on Computer Vision (ECCV), 2023
10 Nov 2023
Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
Cheng Zhang, Jianyi Cheng, Ilia Shumailov, George A. Constantinides, Yiren Zhao
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
08 Oct 2023
Quantized Transformer Language Model Implementations on Edge Devices
Mohammad Wali Ur Rahman, Murad Mehrab Abrar, Hunter Gibbons Copening, Salim Hariri, Sicong Shao, Pratik Satam, Soheil Salehi
International Conference on Machine Learning and Applications (ICMLA), 2023
06 Oct 2023
MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search
Yichen Xie, Wei Le
29 Sep 2023
AdaEvo: Edge-Assisted Continuous and Timely DNN Model Evolution for Mobile Devices
Lehao Wang, Zhiwen Yu, Haoyi Yu, Sicong Liu, Yaxiong Xie, Bin Guo, Yunxin Liu
IEEE Transactions on Mobile Computing (IEEE TMC), 2023
27 Sep 2023
SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization
Jinjie Zhang, Rayan Saab
20 Sep 2023
Real-Time Semantic Segmentation: A Brief Survey & Comparative Study in Remote Sensing
Clifford Broni-Bediako, Junshi Xia, Xiangwei Zhu
IEEE Geoscience and Remote Sensing Magazine (GRSM), 2023
12 Sep 2023
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Wenhua Cheng, Weiwei Zhang, Haihao Shen, Yiyang Cai, Xin He, Kaokao Lv, Yi. Liu
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
11 Sep 2023
Bandwidth-efficient Inference for Neural Image Compression
Shanzhi Yin, Tongda Xu, Yongsheng Liang, Yuanyuan Wang, Yanghao Li, Yan Wang, Jingjing Liu
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
06 Sep 2023
On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks
Wei Huang, Haotong Qin, Yangdong Liu, Jingzhuo Liang, Yifu Ding, Ying Li, Xianglong Liu
05 Sep 2023
eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models
Minsik Cho, Keivan Alizadeh Vahid, Qichen Fu, Saurabh N. Adya, C. C. D. Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal
IEEE Computer Architecture Letters (CAL), 2023
02 Sep 2023
Generative Model for Models: Rapid DNN Customization for Diverse Tasks and Resource Constraints
Wenxing Xu, Yuanchun Li, Jiacheng Liu, Yiyou Sun, Zhengyang Cao, Shouqing Yang, Hao Wen, Yunxin Liu
29 Aug 2023
A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance
Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig
IEEE International Conference on Computer Vision (ICCV), 2023
25 Aug 2023
HyperSNN: A new efficient and robust deep learning model for resource constrained control applications
Zhanglu Yan, Shida Wang, Kaiwen Tang, Wong-Fai Wong
16 Aug 2023
Gradient-Based Post-Training Quantization: Challenging the Status Quo
Edouard Yvinec, Arnaud Dapogny, Kévin Bailly
15 Aug 2023
EQ-Net: Elastic Quantization Neural Networks
Ke Xu, Lei Han, Ye Tian, Shangshang Yang, Xingyi Zhang
IEEE International Conference on Computer Vision (ICCV), 2023
15 Aug 2023
Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation
Seyedarmin Azizi, M. Nazemi, A. Fayyazi, Massoud Pedram
12 Aug 2023
SAfER: Layer-Level Sensitivity Assessment for Efficient and Robust Neural Network Inference
Edouard Yvinec, Arnaud Dapogny, Kévin Bailly, Xavier Fischer
09 Aug 2023
FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search
Jordan Dotzel, Gang Wu, Andrew Li, M. Umar, Yun Ni, ..., Liqun Cheng, Martin G. Dixon, N. Jouppi, Quoc V. Le, Sheng Li
07 Aug 2023
Dynamic Token-Pass Transformers for Semantic Segmentation
Yuang Liu, Qiang Zhou, Jing Wang, Fan Wang, Jun Wang, Wei Zhang
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
03 Aug 2023
To Adapt or Not to Adapt? Real-Time Adaptation for Semantic Segmentation
Marc Botet Colomer, Pier Luigi Dovesi, Theodoros Panagiotakopoulos, J. Carvalho, Linus Harenstam-Nielsen, Hossein Azizpour, Hedvig Kjellström, Zorah Lähner, Matteo Poggi
IEEE International Conference on Computer Vision (ICCV), 2023
27 Jul 2023
Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks
Chee Hong, Kyoung Mu Lee
European Conference on Computer Vision (ECCV), 2023
25 Jul 2023
EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
Peijie Dong, Lujun Li, Zimian Wei, Xin-Yi Niu, Zhiliang Tian, H. Pan
IEEE International Conference on Computer Vision (ICCV), 2023
20 Jul 2023
Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications
Vasileios Leon, Muhammad Abdullah Hanif, Giorgos Armeniakos, Xun Jiao, Mohamed Bennai, K. Pekmestzi, Dimitrios Soudris
ACM Computing Surveys (ACM Comput. Surv.), 2023
20 Jul 2023
PLiNIO: A User-Friendly Library of Gradient-based Methods for Complexity-aware DNN Optimization
Daniele Jahier Pagliari, Matteo Risso, Beatrice Alessandra Motetti, Luca Bompani
Forum on Specification and Design Languages (FDL), 2023
18 Jul 2023
A Survey of Techniques for Optimizing Transformer Inference
Krishna Teja Chitty-Venkata, Sparsh Mittal, M. Emani, V. Vishwanath, Arun Somani
Journal of Systems Architecture (JSA), 2023
16 Jul 2023
QBitOpt: Fast and Accurate Bitwidth Reallocation during Training
Jorn W. T. Peters, Marios Fournarakis, Markus Nagel, M. V. Baalen, Tijmen Blankevoort
10 Jul 2023
DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN Inference
Bahareh Khabbazan, Marc Riera, Antonio González
International Conference on High Performance Computing (HiPC), 2023
28 Jun 2023
Precision-aware Latency and Energy Balancing on Multi-Accelerator Platforms for DNN Inference
Matteo Risso, Luca Bompani, G. M. Sarda, Luca Benini, Enrico Macii, Massimo Poncino, Marian Verhelst, Daniele Jahier Pagliari
International Symposium on Low Power Electronics and Design (ISLPED), 2023
08 Jun 2023
Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision Post-Training Quantization
Clemens J. S. Schaefer, Navid Lambert-Shirzad, Xiaofan Zhang, Chia-Wei Chou, T. Jablin, Jian Li, Elfie Guo, Caitlin Stanton, S. Joshi, Yu Emma Wang
08 Jun 2023
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han
Conference on Machine Learning and Systems (MLSys), 2023
01 Jun 2023
DynaShare: Task and Instance Conditioned Parameter Sharing for Multi-Task Learning
E. Rahimian, Golara Javadi, Frederick Tung, Gabriel L. Oliveira
26 May 2023
MixFormerV2: Efficient Fully Transformer Tracking
Yutao Cui, Tian-Shu Song, Gangshan Wu, Liming Wang
Neural Information Processing Systems (NeurIPS), 2023
25 May 2023
PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration
Ahmed F. AbouElhamayed, Angela Cui, Javier Fernandez-Marques, Nicholas D. Lane, Mohamed S. Abdelfattah
ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2023
25 May 2023