1811.08886
HAQ: Hardware-Aware Automated Quantization with Mixed Precision
21 November 2018
Kuan Wang
Zhijian Liu
Yujun Lin
Ji Lin
Song Han
MQ
Papers citing
"HAQ: Hardware-Aware Automated Quantization with Mixed Precision"
50 / 435 papers shown
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
Haojie Duanmu
Xiuhong Li
Zhihang Yuan
Size Zheng
Jiangfei Duan
Xingcheng Zhang
Dahua Lin
MQ
MoE
151
0
0
09 May 2025
Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning
Lianbo Ma
Jianlun Ma
Yuee Zhou
Guoyang Xie
Qiang He
Zhichao Lu
MQ
45
0
0
08 May 2025
Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model
Navin Ranjan
Andreas E. Savakis
MQ
VLM
63
0
0
08 May 2025
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques
Sanjay Surendranath Girija
Shashank Kapoor
Lakshit Arora
Dipen Pradhan
Aman Raj
Ankit Shetgaonkar
54
0
0
05 May 2025
Radio: Rate-Distortion Optimization for Large Language Model Compression
Sean I. Young
MQ
21
0
0
05 May 2025
BackSlash: Rate Constrained Optimized Training of Large Language Models
Jun Wu
Jiangtao Wen
Yuxing Han
34
0
0
23 Apr 2025
FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference
Coleman Hooper
Charbel Sakr
Ben Keller
Rangharajan Venkatesan
Kurt Keutzer
S.
Brucek Khailany
MQ
42
0
0
19 Apr 2025
Collaborative Learning of On-Device Small Model and Cloud-Based Large Model: Advances and Future Directions
Chaoyue Niu
Yucheng Ding
Junhui Lu
Zhengxiang Huang
Hang Zeng
Yutong Dai
Xuezhen Tu
Chengfei Lv
Fan Wu
Guihai Chen
27
1
0
17 Apr 2025
Tin-Tin: Towards Tiny Learning on Tiny Devices with Integer-based Neural Network Training
Yi Hu
Jinhang Zuo
Eddie Zhang
Bob Iannucci
Carlee Joe-Wong
24
0
0
13 Apr 2025
Generative Artificial Intelligence for Internet of Things Computing: A Systematic Survey
Fabrizio Mangione
Claudio Savaglio
Giancarlo Fortino
22
0
0
10 Apr 2025
Hyperflows: Pruning Reveals the Importance of Weights
Eugen Barbulescu
Antonio Alexoaie
21
0
0
06 Apr 2025
Model Hemorrhage and the Robustness Limits of Large Language Models
Ziyang Ma
Z. Li
L. Zhang
Gui-Song Xia
Bo Du
Liangpei Zhang
Dacheng Tao
56
0
0
31 Mar 2025
MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness
Zihao Zheng
Xiuping Cui
Size Zheng
Maoliang Li
Jiayu Chen
Liang
Xiang Chen
MQ
MoE
49
0
0
27 Mar 2025
Mixed precision accumulation for neural network inference guided by componentwise forward error analysis
El-Mehdi El Arar
Silviu-Ioan Filip
Theo Mary
Elisa Riccietti
52
0
0
19 Mar 2025
ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng
Shuaiting Li
Zeyu Wang
Kedong Xu
Hong Gu
Kejie Huang
MQ
60
0
0
12 Mar 2025
Empowering Edge Intelligence: A Comprehensive Survey on On-Device AI Models
Xubin Wang
Zhiqing Tang
Jianxiong Guo
Tianhui Meng
Chenhao Wang
Tian-sheng Wang
Weijia Jia
50
0
0
08 Mar 2025
MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration
Jinguang Wang
J. Wang
Haifeng Sun
Tingting Yang
Zirui Zhuang
Wanyi Ning
Yuexi Yin
Q. Qi
Jianxin Liao
MQ
MoMe
44
0
0
07 Mar 2025
Optimizing DNN Inference on Multi-Accelerator SoCs at Training-time
Matteo Risso
Alessio Burrello
Daniele Jahier Pagliari
41
0
0
24 Feb 2025
KVCrush: Key value cache size-reduction using similarity in head-behaviour
Gopi Krishna Jha
Sameh Gobriel
Liubov Talamanova
Alexander Kozlov
Nilesh Jain
MQ
34
0
0
24 Feb 2025
A General Error-Theoretical Analysis Framework for Constructing Compression Strategies
Boyang Zhang
Daning Cheng
Yunquan Zhang
Meiqi Tu
Fangmin Liu
Jiake Tian
31
1
0
19 Feb 2025
Nearly Lossless Adaptive Bit Switching
Haiduo Huang
Zhenhua Liu
Tian Xia
Wenzhe Zhao
Pengju Ren
MQ
58
0
0
03 Feb 2025
Hardware-Aware DNN Compression for Homogeneous Edge Devices
Kunlong Zhang
Guiying Li
Ning Lu
Peng Yang
K. Tang
46
0
0
28 Jan 2025
Mix-QViT: Mixed-Precision Vision Transformer Quantization Driven by Layer Importance and Quantization Sensitivity
Navin Ranjan
Andreas E. Savakis
MQ
47
1
0
10 Jan 2025
Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies
Xubin Wang
Weijia Jia
36
0
0
08 Jan 2025
A Novel Structure-Agnostic Multi-Objective Approach for Weight-Sharing Compression in Deep Neural Networks
Rasa Khosrowshahli
Shahryar Rahnamayan
Beatrice Ombuki-Berman
MQ
28
0
0
06 Jan 2025
DEX: Data Channel Extension for Efficient CNN Inference on Tiny AI Accelerators
Taesik Gong
F. Kawsar
Chulhong Min
64
0
0
09 Dec 2024
MPQ-Diff: Mixed Precision Quantization for Diffusion Models
Rocco Manz Maruzzelli
Basile Lewandowski
Lydia Y. Chen
DiffM
MQ
98
0
0
28 Nov 2024
FAMES: Fast Approximate Multiplier Substitution for Mixed-Precision Quantized DNNs--Down to 2 Bits!
Yi Ren
Ruge Xu
Xinfei Guo
Weikang Qian
MQ
69
0
0
27 Nov 2024
Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
Chao Fang
Man Shi
Robin Geens
Arne Symons
Zhongfeng Wang
Marian Verhelst
69
0
0
24 Nov 2024
SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism
Priyansh Bhatnagar
Linfeng Wen
Mingu Kang
34
0
0
15 Nov 2024
BF-IMNA: A Bit Fluid In-Memory Neural Architecture for Neural Network Acceleration
M. Rakka
Rachid Karami
A. Eltawil
M. Fouda
Fadi J. Kurdahi
MQ
37
1
0
03 Nov 2024
ARQ: A Mixed-Precision Quantization Framework for Accurate and Certifiably Robust DNNs
Yuchen Yang
Shubham Ugare
Yifan Zhao
Gagandeep Singh
Sasa Misailovic
MQ
26
0
0
31 Oct 2024
Data Generation for Hardware-Friendly Post-Training Quantization
Lior Dikstein
Ariel Lapid
Arnon Netzer
H. Habi
MQ
136
0
0
29 Oct 2024
Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization
W. Liu
Xue Xian Zheng
Jingyi Yu
Xin Lou
MQ
29
0
0
25 Oct 2024
Progressive Mixed-Precision Decoding for Efficient LLM Inference
Hao Chen
Fuwen Tan
Alexandros Kouris
Royson Lee
Hongxiang Fan
Stylianos I. Venieris
MQ
23
1
0
17 Oct 2024
Channel-Wise Mixed-Precision Quantization for Large Language Models
Zihan Chen
Bike Xie
Jundong Li
Cong Shen
MQ
27
2
0
16 Oct 2024
Reducing Data Bottlenecks in Distributed, Heterogeneous Neural Networks
Ruhai Lin
Rui-Jie Zhu
Jason Eshraghian
32
1
0
12 Oct 2024
MATCH: Model-Aware TVM-based Compilation for Heterogeneous Edge Devices
Mohamed Amine Hamdi
Francesco Daghero
G. M. Sarda
Josse Van Delm
Arne Symons
Luca Benini
Marian Verhelst
Daniele Jahier Pagliari
Alessio Burrello
29
1
0
11 Oct 2024
DeltaDQ: Ultra-High Delta Compression for Fine-Tuned LLMs via Group-wise Dropout and Separate Quantization
Yanfeng Jiang
Zelan Yang
B. Chen
Shen Li
Yong Li
Tao Li
MQ
34
0
0
11 Oct 2024
Constraint Guided Model Quantization of Neural Networks
Quinten Van Baelen
P. Karsmakers
MQ
21
0
0
30 Sep 2024
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
Shaobo Ma
Chao Fang
Haikuo Shao
Zhongfeng Wang
28
4
0
26 Sep 2024
UniLCD: Unified Local-Cloud Decision-Making via Reinforcement Learning
Kathakoli Sengupta
Zhongkai Shangguan
Sandesh Bharadwaj
Sanjay Arora
Eshed Ohn-Bar
Renato Mancuso
53
0
0
17 Sep 2024
Privacy-Preserving SAM Quantization for Efficient Edge Intelligence in Healthcare
Zhikai Li
Jing Zhang
Qingyi Gu
MedIm
36
1
0
14 Sep 2024
Robust Training of Neural Networks at Arbitrary Precision and Sparsity
Chengxi Ye
Grace Chu
Yanfeng Liu
Yichi Zhang
Lukasz Lew
Andrew G. Howard
MQ
27
2
0
14 Sep 2024
Foundations of Large Language Model Compression -- Part 1: Weight Quantization
Sean I. Young
MQ
40
1
0
03 Sep 2024
Computer Vision Model Compression Techniques for Embedded Systems: A Survey
Alexandre Lopes
Fernando Pereira dos Santos
D. Oliveira
Mauricio Schiezaro
Hélio Pedrini
28
5
0
15 Aug 2024
Mixed Non-linear Quantization for Vision Transformers
Gihwan Kim
Jemin Lee
Sihyeong Park
Yongin Kwon
Hyungshin Kim
MQ
35
0
0
26 Jul 2024
Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
Zhengang Li
Alec Lu
Yanyue Xie
Zhenglun Kong
Mengshu Sun
...
Peiyan Dong
Caiwen Ding
Yanzhi Wang
Xue Lin
Zhenman Fang
32
5
0
25 Jul 2024
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
Zhuguanyu Wu
Jiaxin Chen
Hanwen Zhong
Di Huang
Yun Wang
MQ
38
9
0
17 Jul 2024
ShiftAddAug: Augment Multiplication-Free Tiny Neural Network with Hybrid Computation
Yipin Guo
Zihao Li
Yilin Lang
Qinyuan Ren
63
0
0
03 Jul 2024