1911.03852
HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
10 November 2019
Zhen Dong
Z. Yao
Yaohui Cai
Daiyaan Arfeen
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
Papers citing
"HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks"
44 / 44 papers shown
Title
Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model
Navin Ranjan
Andreas E. Savakis
MQ
VLM
63
0
0
08 May 2025
Pack-PTQ: Advancing Post-training Quantization of Neural Networks by Pack-wise Reconstruction
Changjun Li
Runqing Jiang
Zhuo Song
Pengpeng Yu
Ye Zhang
Yulan Guo
MQ
49
0
0
01 May 2025
eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin
Shohaib Mahmud
Haiying Shen
Anand Iyer
MoE
120
0
0
10 Mar 2025
Optimizing DNN Inference on Multi-Accelerator SoCs at Training-time
Matteo Risso
Alessio Burrello
Daniele Jahier Pagliari
41
0
0
24 Feb 2025
Taming Sensitive Weights: Noise Perturbation Fine-tuning for Robust LLM Quantization
Dongwei Wang
Huanrui Yang
MQ
85
1
0
08 Dec 2024
Progressive Mixed-Precision Decoding for Efficient LLM Inference
Hao Chen
Fuwen Tan
Alexandros Kouris
Royson Lee
Hongxiang Fan
Stylianos I. Venieris
MQ
23
1
0
17 Oct 2024
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang
Yue Liao
Jianhui Liu
Ruifei He
Haoru Tan
Shiming Zhang
Hongsheng Li
Si Liu
Xiaojuan Qi
MoE
39
3
0
08 Oct 2024
P^2-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer
Huihong Shi
Xin Cheng
Wendong Mao
Zhongfeng Wang
MQ
40
3
0
30 May 2024
QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources
Zhikai Li
Xiaoxuan Liu
Banghua Zhu
Zhen Dong
Qingyi Gu
Kurt Keutzer
MQ
27
7
0
11 Oct 2023
eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models
Minsik Cho
Keivan Alizadeh Vahid
Qichen Fu
Saurabh N. Adya
C. C. D. Mundo
Mohammad Rastegari
Devang Naik
Peter Zatloukal
MQ
21
6
0
02 Sep 2023
FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search
Jordan Dotzel
Gang Wu
Andrew Li
M. Umar
Yun Ni
...
Liqun Cheng
Martin G. Dixon
N. Jouppi
Quoc V. Le
Sheng R. Li
MQ
25
3
0
07 Aug 2023
QuIP: 2-Bit Quantization of Large Language Models With Guarantees
Jerry Chee
Yaohui Cai
Volodymyr Kuleshov
Chris De Sa
MQ
20
186
0
25 Jul 2023
Precision-aware Latency and Energy Balancing on Multi-Accelerator Platforms for DNN Inference
Matteo Risso
Alessio Burrello
G. M. Sarda
Luca Benini
Enrico Macii
M. Poncino
Marian Verhelst
Daniele Jahier Pagliari
28
4
0
08 Jun 2023
Patch-wise Mixed-Precision Quantization of Vision Transformer
Junrui Xiao
Zhikai Li
Lianwei Yang
Qingyi Gu
MQ
22
12
0
11 May 2023
Diversifying the High-level Features for better Adversarial Transferability
Zhiyuan Wang
Zeliang Zhang
Siyuan Liang
Xiaosen Wang
AAML
37
18
0
20 Apr 2023
End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs
Javier Campos
Zhen Dong
Javier Mauricio Duarte
A. Gholami
Michael W. Mahoney
Jovan Mitrevski
Nhan Tran
MQ
24
3
0
13 Apr 2023
CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input
Senmao Tian
Ming Lu
Jiaming Liu
Yandong Guo
Yurong Chen
Shunli Zhang
SupR
MQ
18
11
0
13 Apr 2023
AutoQNN: An End-to-End Framework for Automatically Quantizing Neural Networks
Cheng Gong
Ye Lu
Surong Dai
Deng Qian
Chenkun Du
Tao Li
MQ
24
0
0
07 Apr 2023
Q-Diffusion: Quantizing Diffusion Models
Xiuyu Li
Yijia Liu
Long Lian
Hua Yang
Zhen Dong
Daniel Kang
Shanghang Zhang
Kurt Keutzer
DiffM
MQ
34
152
0
08 Feb 2023
A^2Q: Aggregation-Aware Quantization for Graph Neural Networks
Zeyu Zhu
Fanrong Li
Zitao Mo
Qinghao Hu
Gang Li
Zejian Liu
Xiaoyao Liang
Jian Cheng
GNN
MQ
12
4
0
01 Feb 2023
Efficient and Effective Methods for Mixed Precision Neural Network Quantization for Faster, Energy-efficient Inference
Deepika Bablani
J. McKinstry
S. K. Esser
R. Appuswamy
D. Modha
MQ
8
4
0
30 Jan 2023
Redistribution of Weights and Activations for AdderNet Quantization
Ying Nie
Kai Han
Haikang Diao
Chuanjian Liu
Enhua Wu
Yunhe Wang
MQ
44
5
0
20 Dec 2022
NAWQ-SR: A Hybrid-Precision NPU Engine for Efficient On-Device Super-Resolution
Stylianos I. Venieris
Mario Almeida
Royson Lee
Nicholas D. Lane
SupR
10
4
0
15 Dec 2022
Towards Hardware-Specific Automatic Compression of Neural Networks
Torben Krieger
Bernhard Klein
Holger Fröning
MQ
19
2
0
15 Dec 2022
CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification
Lirui Xiao
Huanrui Yang
Zhen Dong
Kurt Keutzer
Li Du
Shanghang Zhang
MQ
24
10
0
06 Dec 2022
Mixed-Precision Neural Networks: A Survey
M. Rakka
M. Fouda
Pramod P. Khargonekar
Fadi J. Kurdahi
MQ
18
11
0
11 Aug 2022
Symmetry Regularization and Saturating Nonlinearity for Robust Quantization
Sein Park
Yeongsang Jang
Eunhyeok Park
MQ
14
1
0
31 Jul 2022
QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization
Xiuying Wei
Ruihao Gong
Yuhang Li
Xianglong Liu
F. Yu
MQ
VLM
19
165
0
11 Mar 2022
Structured Pruning is All You Need for Pruning CNNs at Initialization
Yaohui Cai
Weizhe Hua
Hongzheng Chen
G. E. Suh
Christopher De Sa
Zhiru Zhang
CVBM
33
14
0
04 Mar 2022
Quantization in Layer's Input is Matter
Daning Cheng
Wenguang Chen
MQ
11
0
0
10 Feb 2022
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Samyam Rajbhandari
Conglong Li
Z. Yao
Minjia Zhang
Reza Yazdani Aminabadi
A. A. Awan
Jeff Rasley
Yuxiong He
30
283
0
14 Jan 2022
Neural Network Quantization for Efficient Inference: A Survey
Olivia Weng
MQ
14
22
0
08 Dec 2021
Mixed Precision of Quantization of Transformer Language Models for Speech Recognition
Junhao Xu
Shoukang Hu
Jianwei Yu
Xunying Liu
Helen M. Meng
MQ
30
15
0
29 Nov 2021
Sharpness-aware Quantization for Deep Neural Networks
Jing Liu
Jianfei Cai
Bohan Zhuang
MQ
16
24
0
24 Nov 2021
Arch-Net: Model Distillation for Architecture Agnostic Model Deployment
Weixin Xu
Zipeng Feng
Shuangkang Fang
Song Yuan
Yi Yang
Shuchang Zhou
MQ
16
1
0
01 Nov 2021
Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes
Sanghyun Hong
Michael-Andrei Panaitescu-Liess
Yigitcan Kaya
Tudor Dumitras
MQ
47
13
0
26 Oct 2021
Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization
Weihan Chen
Peisong Wang
Jian Cheng
MQ
31
61
0
13 Oct 2021
Machine Learning Advances aiding Recognition and Classification of Indian Monuments and Landmarks
A. Paul
S. Ghose
K. Aggarwal
Niketha Nethaji
Shivam Pal
Arnab Dutta Purkayastha
15
9
0
29 Jul 2021
Post-Training Quantization for Vision Transformer
Zhenhua Liu
Yunhe Wang
Kai Han
Siwei Ma
Wen Gao
ViT
MQ
39
321
0
27 Jun 2021
Differentiable Model Compression via Pseudo Quantization Noise
Alexandre Défossez
Yossi Adi
Gabriel Synnaeve
DiffM
MQ
10
46
0
20 Apr 2021
Dynamic Precision Analog Computing for Neural Networks
Sahaj Garg
Joe Lou
Anirudh Jain
Mitchell Nahmias
34
32
0
12 Feb 2021
ZeroQ: A Novel Zero Shot Quantization Framework
Yaohui Cai
Z. Yao
Zhen Dong
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
27
389
0
01 Jan 2020
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen
Zhen Dong
Jiayu Ye
Linjian Ma
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
225
574
0
12 Sep 2019
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
Aojun Zhou
Anbang Yao
Yiwen Guo
Lin Xu
Yurong Chen
MQ
311
1,047
0
10 Feb 2017