1905.03696
Cited By
HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision
29 April 2019
Zhen Dong
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
Papers citing "HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision"
50 / 95 papers shown
QuantX: A Framework for Hardware-Aware Quantization of Generative AI Workloads
Khurram Mazher
Saad Bin Nasir
MQ
47
0
0
12 May 2025
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
Haojie Duanmu
Xiuhong Li
Zhihang Yuan
Size Zheng
Jiangfei Duan
Xingcheng Zhang
Dahua Lin
MQ
MoE
157
0
0
09 May 2025
Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning
Lianbo Ma
Jianlun Ma
Yuee Zhou
Guoyang Xie
Qiang He
Zhichao Lu
MQ
45
0
0
08 May 2025
Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model
Navin Ranjan
Andreas E. Savakis
MQ
VLM
63
0
0
08 May 2025
Radio: Rate-Distortion Optimization for Large Language Model Compression
Sean I. Young
MQ
21
0
0
05 May 2025
Pack-PTQ: Advancing Post-training Quantization of Neural Networks by Pack-wise Reconstruction
Changjun Li
Runqing Jiang
Zhuo Song
Pengpeng Yu
Ye Zhang
Yulan Guo
MQ
56
0
0
01 May 2025
Back to Fundamentals: Low-Level Visual Features Guided Progressive Token Pruning
Yuanbing Ouyang
Yizhuo Liang
Qingpeng Li
Xinfei Guo
Yiming Luo
Di Wu
Hao Wang
Yushan Pan
ViT
VLM
73
0
0
25 Apr 2025
eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin
Shohaib Mahmud
Haiying Shen
Anand Iyer
MoE
152
0
0
10 Mar 2025
GSQ-Tuning: Group-Shared Exponents Integer in Fully Quantized Training for LLMs On-Device Fine-tuning
Sifan Zhou
Shuo Wang
Zhihang Yuan
Mingjia Shi
Yuzhang Shang
Dawei Yang
ALM
MQ
87
0
0
18 Feb 2025
Taming Sensitive Weights: Noise Perturbation Fine-tuning for Robust LLM Quantization
Dongwei Wang
Huanrui Yang
MQ
85
1
0
08 Dec 2024
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao
Sidak Pal Singh
Aurélien Lucchi
AI4CE
43
0
0
04 Nov 2024
On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh
Bram Adams
Ahmed E. Hassan
VLM
41
0
0
01 Nov 2024
ARQ: A Mixed-Precision Quantization Framework for Accurate and Certifiably Robust DNNs
Yuchen Yang
Shubham Ugare
Yifan Zhao
Gagandeep Singh
Sasa Misailovic
MQ
26
0
0
31 Oct 2024
Progressive Mixed-Precision Decoding for Efficient LLM Inference
Hao Chen
Fuwen Tan
Alexandros Kouris
Royson Lee
Hongxiang Fan
Stylianos I. Venieris
MQ
25
1
0
17 Oct 2024
QT-DoG: Quantization-aware Training for Domain Generalization
Saqib Javed
Hieu Le
Mathieu Salzmann
OOD
MQ
28
1
0
08 Oct 2024
Foundations of Large Language Model Compression -- Part 1: Weight Quantization
Sean I. Young
MQ
40
1
0
03 Sep 2024
AdapMTL: Adaptive Pruning Framework for Multitask Learning Model
Mingcan Xiang
Steven Jiaxun Tang
Qizheng Yang
Hui Guan
Tongping Liu
VLM
34
0
0
07 Aug 2024
Real-Time Spacecraft Pose Estimation Using Mixed-Precision Quantized Neural Network on COTS Reconfigurable MPSoC
Julien Posso
Guy Bois
Yvon Savaria
25
0
0
06 Jun 2024
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
Tianchen Zhao
Tongcheng Fang
Haofeng Huang
Enshu Liu
Widyadewi Soedarmadji
...
Shengen Yan
Huazhong Yang
Xuefei Ning
Yu Wang
MQ
VGen
104
23
0
04 Jun 2024
SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems
Kailash Gogineni
Sai Santosh Dayapule
Juan Gómez Luna
Karthikeya Gogineni
Peng Wei
Tian-Shing Lan
Mohammad Sadrosadati
Onur Mutlu
Guru Venkataramani
50
10
0
07 May 2024
AdaQAT: Adaptive Bit-Width Quantization-Aware Training
Cédric Gernigon
Silviu-Ioan Filip
Olivier Sentieys
Clément Coggiola
Mickael Bruno
23
2
0
22 Apr 2024
RefQSR: Reference-based Quantization for Image Super-Resolution Networks
H. Lee
Jun-Sang Yoo
Seung-Won Jung
SupR
18
2
0
02 Apr 2024
SketchINR: A First Look into Sketches as Implicit Neural Representations
Hmrishav Bandyopadhyay
A. Bhunia
Pinaki Nath Chowdhury
Aneeshan Sain
Tao Xiang
Timothy M. Hospedales
Yi-Zhe Song
SSL
26
9
0
14 Mar 2024
A Heterogeneous RISC-V based SoC for Secure Nano-UAV Navigation
Luca Valente
Alessandro Nadalini
Asif Veeran
Mattia Sinigaglia
Bruno Sá
...
Baker Mohammad
Sandro Pinto
Daniele Palossi
Luca Benini
Davide Rossi
25
5
0
07 Jan 2024
Low latency optical-based mode tracking with machine learning deployed on FPGAs on a tokamak
Yumou Wei
Ryan F. Forelli
Chris Hansen
Jeffrey P. Levesque
Nhan Tran
Joshua C. Agar
G. D. Guglielmo
M. Mauel
G. Navratil
18
4
0
30 Nov 2023
QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources
Zhikai Li
Xiaoxuan Liu
Banghua Zhu
Zhen Dong
Qingyi Gu
Kurt Keutzer
MQ
32
7
0
11 Oct 2023
Low-bit Quantization for Deep Graph Neural Networks with Smoothness-aware Message Propagation
Shuang Wang
B. Eravcı
Rustam Guliyev
Hakan Ferhatosmanoglu
GNN
MQ
19
6
0
29 Aug 2023
FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search
Jordan Dotzel
Gang Wu
Andrew Li
M. Umar
Yun Ni
...
Liqun Cheng
Martin G. Dixon
N. Jouppi
Quoc V. Le
Sheng R. Li
MQ
25
3
0
07 Aug 2023
Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks
Chee Hong
Kyoung Mu Lee
SupR
MQ
19
1
0
25 Jul 2023
QuIP: 2-Bit Quantization of Large Language Models With Guarantees
Jerry Chee
Yaohui Cai
Volodymyr Kuleshov
Chris De Sa
MQ
20
187
0
25 Jul 2023
PTQD: Accurate Post-Training Quantization for Diffusion Models
Yefei He
Luping Liu
Jing Liu
Weijia Wu
Hong Zhou
Bohan Zhuang
DiffM
MQ
30
101
0
18 May 2023
Marsellus: A Heterogeneous RISC-V AI-IoT End-Node SoC with 2-to-8b DNN Acceleration and 30%-Boost Adaptive Body Biasing
Francesco Conti
G. Paulin
Angelo Garofalo
D. Rossi
Alfio Di Mauro
Georg Rutishauser
G. Ottavi
M. Eggiman
Hayate Okuhara
Luca Benini
20
14
0
15 May 2023
Patch-wise Mixed-Precision Quantization of Vision Transformer
Junrui Xiao
Zhikai Li
Lianwei Yang
Qingyi Gu
MQ
27
12
0
11 May 2023
Diversifying the High-level Features for better Adversarial Transferability
Zhiyuan Wang
Zeliang Zhang
Siyuan Liang
Xiaosen Wang
AAML
37
18
0
20 Apr 2023
End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs
Javier Campos
Zhen Dong
Javier Mauricio Duarte
A. Gholami
Michael W. Mahoney
Jovan Mitrevski
Nhan Tran
MQ
24
3
0
13 Apr 2023
CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input
Senmao Tian
Ming Lu
Jiaming Liu
Yandong Guo
Yurong Chen
Shunli Zhang
SupR
MQ
20
11
0
13 Apr 2023
AutoQNN: An End-to-End Framework for Automatically Quantizing Neural Networks
Cheng Gong
Ye Lu
Surong Dai
Deng Qian
Chenkun Du
Tao Li
MQ
27
0
0
07 Apr 2023
Q-Diffusion: Quantizing Diffusion Models
Xiuyu Li
Yijia Liu
Long Lian
Hua Yang
Zhen Dong
Daniel Kang
Shanghang Zhang
Kurt Keutzer
DiffM
MQ
34
152
0
08 Feb 2023
A²Q: Aggregation-Aware Quantization for Graph Neural Networks
Zeyu Zhu
Fanrong Li
Zitao Mo
Qinghao Hu
Gang Li
Zejian Liu
Xiaoyao Liang
Jian Cheng
GNN
MQ
24
4
0
01 Feb 2023
Efficient and Effective Methods for Mixed Precision Neural Network Quantization for Faster, Energy-efficient Inference
Deepika Bablani
J. McKinstry
S. K. Esser
R. Appuswamy
D. Modha
MQ
20
4
0
30 Jan 2023
Tailor: Altering Skip Connections for Resource-Efficient Inference
Olivia Weng
Gabriel Marcano
Vladimir Loncar
Alireza Khodamoradi
Nojan Sheybani
Andres Meza
F. Koushanfar
K. Denolf
Javier Mauricio Duarte
Ryan Kastner
31
11
0
18 Jan 2023
Hyperspherical Quantization: Toward Smaller and More Accurate Models
Dan Liu
X. Chen
Chen-li Ma
Xue Liu
MQ
24
3
0
24 Dec 2022
CSMPQ: Class Separability Based Mixed-Precision Quantization
Ming-Yu Wang
Taisong Jin
Miaohui Zhang
Zhengtao Yu
MQ
23
0
0
20 Dec 2022
NAWQ-SR: A Hybrid-Precision NPU Engine for Efficient On-Device Super-Resolution
Stylianos I. Venieris
Mario Almeida
Royson Lee
Nicholas D. Lane
SupR
13
4
0
15 Dec 2022
Towards Hardware-Specific Automatic Compression of Neural Networks
Torben Krieger
Bernhard Klein
Holger Fröning
MQ
19
2
0
15 Dec 2022
Vertical Layering of Quantized Neural Networks for Heterogeneous Inference
Hai Wu
Ruifei He
Hao Hao Tan
Xiaojuan Qi
Kaibin Huang
MQ
21
2
0
10 Dec 2022
CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification
Lirui Xiao
Huanrui Yang
Zhen Dong
Kurt Keutzer
Li Du
Shanghang Zhang
MQ
27
10
0
06 Dec 2022
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers
Yijiang Liu
Huanrui Yang
Zhen Dong
Kurt Keutzer
Li Du
Shanghang Zhang
MQ
29
45
0
29 Nov 2022
Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware Training
Yunshan Zhong
Gongrui Nan
Yu-xin Zhang
Fei Chao
Rongrong Ji
MQ
18
3
0
12 Nov 2022
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
Cong Guo
Chen Zhang
Jingwen Leng
Zihan Liu
Fan Yang
Yun-Bo Liu
Minyi Guo
Yuhao Zhu
MQ
16
55
0
30 Aug 2022