ResearchTrend.AI

HAQ: Hardware-Aware Automated Quantization with Mixed Precision
arXiv: 1811.08886 · 21 November 2018
Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han
[MQ]
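HAQ's central idea is mixed precision: each layer of a network gets its own quantization bit-width, chosen automatically under hardware feedback (the paper uses a reinforcement-learning search). As a rough illustration of the setting being searched over — not HAQ's search algorithm itself — here is a minimal sketch of per-layer symmetric linear quantization; the layer names and bit-width choices are invented for the example.

```python
# Illustrative sketch only: per-layer ("mixed precision") symmetric linear
# quantization with hand-picked bit-widths. HAQ itself *searches* for these
# bit-widths automatically with hardware-in-the-loop feedback; the layer
# names and bit choices below are made up for the example.

def quantize(weights, bits):
    """Quantize a list of floats to signed `bits`-bit integers plus a scale."""
    qmax = 2 ** (bits - 1) - 1                # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate float weights."""
    return [v * scale for v in q]

# Mixed precision: each layer gets its own bit-width.
layer_bits = {"conv1": 8, "conv2": 4, "fc": 6}
layers = {
    "conv1": [0.12, -0.51, 0.33],
    "conv2": [0.92, -0.11],
    "fc":    [0.05, 0.71],
}

for name, w in layers.items():
    q, s = quantize(w, layer_bits[name])
    w_hat = dequantize(q, s)
    err = max(abs(a - b) for a, b in zip(w, w_hat))
    print(f"{name}: {layer_bits[name]}-bit, max reconstruction error {err:.4f}")
```

Allocating fewer bits to error-tolerant layers and more to sensitive ones is what lets a mixed-precision policy beat uniform quantization at the same model size — the papers listed below largely extend or apply this idea.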

Papers citing "HAQ: Hardware-Aware Automated Quantization with Mixed Precision"

50 / 435 papers shown

• Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks (01 Jul 2024)
  Beatrice Alessandra Motetti, Matteo Risso, Alessio Burrello, Enrico Macii, M. Poncino, Daniele Jahier Pagliari [MQ]
• OutlierTune: Efficient Channel-Wise Quantization for Large Language Models (27 Jun 2024)
  Jinguang Wang, Yuexi Yin, Haifeng Sun, Qi Qi, Jingyu Wang, Zirui Zhuang, Tingting Yang, Jianxin Liao
• Real-Time Spacecraft Pose Estimation Using Mixed-Precision Quantized Neural Network on COTS Reconfigurable MPSoC (06 Jun 2024)
  Julien Posso, Guy Bois, Yvon Savaria
• ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation (04 Jun 2024)
  Tianchen Zhao, Tongcheng Fang, Haofeng Huang, Enshu Liu, Widyadewi Soedarmadji, ..., Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang [MQ, VGen]
• MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization (28 May 2024)
  Tianchen Zhao, Xuefei Ning, Tongcheng Fang, En-hao Liu, Guyue Huang, Zinan Lin, Shengen Yan, Guohao Dai, Yu-Xiang Wang [MQ, DiffM]
• Extreme Compression of Adaptive Neural Images (27 May 2024)
  Leo Hoshikawa, Marcos V. Conde, Takeshi Ohashi, Atsushi Irie
• ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification (23 May 2024)
  Yefei He, Luoming Zhang, Weijia Wu, Jing Liu, Hong Zhou, Bohan Zhuang [MQ]
• From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks (09 May 2024)
  Xue Geng, Zhe Wang, Chunyun Chen, Qing Xu, Kaixin Xu, ..., Zhenghua Chen, M. Aly, Jie Lin, Min-man Wu, Xiaoli Li
• Acceleration Algorithms in GNNs: A Survey (07 May 2024)
  Lu Ma, Zeang Sheng, Xunkai Li, Xin Gao, Zhezheng Hao, Ling Yang, Wentao Zhang, Bin Cui [GNN]
• Deep Learning for Low-Latency, Quantum-Ready RF Sensing (27 Apr 2024)
  P. Gokhale, Caitlin Carnahan, William Clark, Teague Tomesh, Frederic T. Chong
• AdaQAT: Adaptive Bit-Width Quantization-Aware Training (22 Apr 2024)
  Cédric Gernigon, Silviu-Ioan Filip, Olivier Sentieys, Clément Coggiola, Mickael Bruno
• TMPQ-DM: Joint Timestep Reduction and Quantization Precision Selection for Efficient Diffusion Models (15 Apr 2024)
  Haojun Sun, Chen Tang, Zhi Wang, Yuan Meng, Jingyan Jiang, Xinzhu Ma, Wenwu Zhu [MQ]
• Differentiable Search for Finding Optimal Quantization Strategy (10 Apr 2024)
  Lianqiang Li, Chenqian Yan, Yefei Chen [MQ]
• Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models (08 Apr 2024)
  Bowen Pan, Yikang Shen, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Rameswar Panda [MoE]
• Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators (08 Apr 2024)
  Jan Klhufek, Miroslav Safar, Vojtěch Mrázek, Z. Vašíček, Lukás Sekanina [MQ]
• AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution (04 Apr 2024)
  Chee Hong, Kyoung Mu Lee [SupR, MQ]
• RefQSR: Reference-based Quantization for Image Super-Resolution Networks (02 Apr 2024)
  H. Lee, Jun-Sang Yoo, Seung-Won Jung [SupR]
• Mixed-precision Supernet Training from Vision Foundation Models using Low Rank Adapter (29 Mar 2024)
  Yuiko Sakuma, Masakazu Yoshimura, Junji Otsuka, Atsushi Irie, Takeshi Ohashi [MQ]
• Separate, Dynamic and Differentiable (SMART) Pruner for Block/Output Channel Pruning on Computer Vision Tasks (29 Mar 2024)
  Guanhua Ding, Zexi Ye, Zhen Zhong, Gang Li, David Shao
• Tiny Machine Learning: Progress and Futures (28 Mar 2024)
  Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Song Han
• AffineQuant: Affine Transformation Quantization for Large Language Models (19 Mar 2024)
  Yuexiao Ma, Huixia Li, Xiawu Zheng, Feng Ling, Xuefeng Xiao, Rui Wang, Shilei Wen, Fei Chao, Rongrong Ji [MQ]
• QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning (11 Mar 2024)
  Jiun-Man Chen, Yu-Hsuan Chao, Yu-Jie Wang, Ming-Der Shieh, Chih-Chung Hsu, Wei-Fen Lin [MQ]
• Better Schedules for Low Precision Training of Deep Neural Networks (04 Mar 2024)
  Cameron R. Wolfe, Anastasios Kyrillidis
• Adaptive quantization with mixed-precision based on low-cost proxy (27 Feb 2024)
  Jing Chen, Qiao Yang, Senmao Tian, Shunli Zhang [MQ]
• Model Compression and Efficient Inference for Large Language Models: A Survey (15 Feb 2024)
  Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He [MQ]
• TransAxx: Efficient Transformers with Approximate Computing (12 Feb 2024)
  Dimitrios Danopoulos, Georgios Zervakis, Dimitrios Soudris, Jörg Henkel [ViT]
• Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy (08 Feb 2024)
  Seyedarmin Azizi, M. Nazemi, Massoud Pedram [ViT, MQ]
• Value-Driven Mixed-Precision Quantization for Patch-Based Inference on Microcontrollers (24 Jan 2024)
  Wei Tao, Shenglin He, Kai Lu, Xiaoyang Qu, Guokuan Li, Jiguang Wan, Jianzong Wang, Jing Xiao [MQ]
• LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation (20 Jan 2024)
  Navin Ranjan, Andreas E. Savakis [MQ]
• Retraining-free Model Quantization via One-Shot Weight-Coupling Learning (03 Jan 2024)
  Chen Tang, Yuan Meng, Jiacheng Jiang, Shuzhao Xie, Rongwei Lu, Xinzhu Ma, Zhi Wang, Wenwu Zhu [MQ]
• Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization (23 Dec 2023)
  K. Balaskas, Andreas Karatzas, Christos Sad, K. Siozios, Iraklis Anagnostopoulos, Georgios Zervakis, Jörg Henkel [MQ]
• Efficient Quantization Strategies for Latent Diffusion Models (09 Dec 2023)
  Yuewei Yang, Xiaoliang Dai, Jialiang Wang, Peizhao Zhang, Hongbo Zhang [DiffM, MQ]
• Green Edge AI: A Contemporary Survey (01 Dec 2023)
  Yuyi Mao, X. Yu, Kaibin Huang, Ying-Jun Angela Zhang, Jun Zhang
• Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices (29 Nov 2023)
  Huancheng Chen, H. Vikalo [FedML, MQ]
• MetaMix: Meta-state Precision Searcher for Mixed-precision Activation Quantization (12 Nov 2023)
  Han-Byul Kim, Joo Hyung Lee, Sungjoo Yoo, Hong-Seok Kim [MQ]
• Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing (10 Nov 2023)
  Siao Tang, Xin Wang, Hong Chen, Chaoyu Guan, Zewen Wu, Yansong Tang, Wenwu Zhu [MQ]
• Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference? (08 Oct 2023)
  Cheng Zhang, Jianyi Cheng, Ilia Shumailov, G. Constantinides, Yiren Zhao [MQ]
• Quantized Transformer Language Model Implementations on Edge Devices (06 Oct 2023)
  Mohammad Wali Ur Rahman, Murad Mehrab Abrar, Hunter Gibbons Copening, Salim Hariri, Sicong Shao, Pratik Satam, Soheil Salehi [MQ]
• MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search (29 Sep 2023)
  Yichen Xie, Wei Le [MQ]
• AdaEvo: Edge-Assisted Continuous and Timely DNN Model Evolution for Mobile Devices (27 Sep 2023)
  Lehao Wang, Zhiwen Yu, Haoyi Yu, Sicong Liu, Yaxiong Xie, Bin Guo, Yunxin Liu
• SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization (20 Sep 2023)
  Jinjie Zhang, Rayan Saab
• Real-Time Semantic Segmentation: A Brief Survey & Comparative Study in Remote Sensing (12 Sep 2023)
  Clifford Broni-Bediako, Junshi Xia, Naoto Yokoya
• Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs (11 Sep 2023)
  Wenhua Cheng, Weiwei Zhang, Haihao Shen, Yiyang Cai, Xin He, Kaokao Lv, Yi. Liu [MQ]
• Bandwidth-efficient Inference for Neural Image Compression (06 Sep 2023)
  Shanzhi Yin, Tongda Xu, Yongsheng Liang, Yuanyuan Wang, Yanghao Li, Yan Wang, Jingjing Liu
• On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks (05 Sep 2023)
  Wei Huang, Haotong Qin, Yangdong Liu, Jingzhuo Liang, Yifu Ding, Ying Li, Xianglong Liu [MQ]
• eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models (02 Sep 2023)
  Minsik Cho, Keivan Alizadeh Vahid, Qichen Fu, Saurabh N. Adya, C. C. D. Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal [MQ]
• Generative Model for Models: Rapid DNN Customization for Diverse Tasks and Resource Constraints (29 Aug 2023)
  Wenxing Xu, Yuanchun Li, Jiacheng Liu, Yiyou Sun, Zhengyang Cao, Yixuan Li, Hao Wen, Yunxin Liu
• A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance (25 Aug 2023)
  Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig [MQ]
• HyperSNN: A new efficient and robust deep learning model for resource constrained control applications (16 Aug 2023)
  Zhanglu Yan, Shida Wang, Kaiwen Tang, Wong-Fai Wong
• Gradient-Based Post-Training Quantization: Challenging the Status Quo (15 Aug 2023)
  Edouard Yvinec, Arnaud Dapogny, Kévin Bailly [MQ]