1811.08886
HAQ: Hardware-Aware Automated Quantization with Mixed Precision
Computer Vision and Pattern Recognition (CVPR), 2018
21 November 2018
Kuan Wang
Zhijian Liu
Yujun Lin
Ji Lin
Song Han
MQ
Papers citing
"HAQ: Hardware-Aware Automated Quantization with Mixed Precision"
50 / 463 papers shown
Title
KVCrush: Key value cache size-reduction using similarity in head-behaviour
Gopi Krishna Jha
Sameh Gobriel
Liubov Talamanova
Alexander Kozlov
Nilesh Jain
MQ
198
0
0
24 Feb 2025
Nearly Lossless Adaptive Bit Switching
Haiduo Huang
Zhenhua Liu
Tian Xia
Wenzhe Zhao
Pengju Ren
MQ
261
1
0
03 Feb 2025
Hardware-Aware DNN Compression for Homogeneous Edge Devices
Kunlong Zhang
Guiying Li
Ning Lu
Peng Yang
Shengcai Liu
225
1
0
25 Jan 2025
Mix-QViT: Mixed-Precision Vision Transformer Quantization Driven by Layer Importance and Quantization Sensitivity
Navin Ranjan
Andreas E. Savakis
MQ
195
5
0
10 Jan 2025
A Novel Structure-Agnostic Multi-Objective Approach for Weight-Sharing Compression in Deep Neural Networks
Rasa Khosrowshahli
Shahryar Rahnamayan
Beatrice Ombuki-Berman
MQ
243
1
0
06 Jan 2025
Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment
International Conference on Artificial Neural Networks (ICANN), 2025
Xubin Wang
Weijia Jia
441
21
0
04 Jan 2025
DEX: Data Channel Extension for Efficient CNN Inference on Tiny AI Accelerators
Neural Information Processing Systems (NeurIPS), 2024
Taesik Gong
F. Kawsar
Chulhong Min
268
4
0
09 Dec 2024
MPQ-Diff: Mixed Precision Quantization for Diffusion Models
Rocco Manz Maruzzelli
Basile Lewandowski
Lydia Y. Chen
DiffM
MQ
295
0
0
28 Nov 2024
FAMES: Fast Approximate Multiplier Substitution for Mixed-Precision Quantized DNNs--Down to 2 Bits!
Yi Ren
Ruge Xu
Xinfei Guo
Weikang Qian
MQ
391
1
0
27 Nov 2024
Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
International Symposium on High-Performance Computer Architecture (HPCA), 2024
Chao Fang
Man Shi
Robin Geens
Arne Symons
Zhongfeng Wang
Marian Verhelst
382
8
0
24 Nov 2024
SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism
Priyansh Bhatnagar
Linfeng Wen
Mingu Kang
138
0
0
15 Nov 2024
BF-IMNA: A Bit Fluid In-Memory Neural Architecture for Neural Network Acceleration
M. Rakka
Rachid Karami
A. Eltawil
M. Fouda
Fadi J. Kurdahi
MQ
196
2
0
03 Nov 2024
ARQ: A Mixed-Precision Quantization Framework for Accurate and Certifiably Robust DNNs
Yuchen Yang
Shubham Ugare
Yifan Zhao
Gagandeep Singh
Sasa Misailovic
MQ
303
1
0
31 Oct 2024
Data Generation for Hardware-Friendly Post-Training Quantization
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Lior Dikstein
Ariel Lapid
Arnon Netzer
H. Habi
MQ
927
1
0
29 Oct 2024
Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization
European Conference on Computer Vision (ECCV), 2024
Wen Liu
Xue Xian Zheng
Jingyi Yu
Xin Lou
MQ
218
7
0
25 Oct 2024
Progressive Mixed-Precision Decoding for Efficient LLM Inference
International Conference on Learning Representations (ICLR), 2024
Hao Mark Chen
Fuwen Tan
Alexandros Kouris
Royson Lee
Hongxiang Fan
Stylianos I. Venieris
MQ
265
8
0
17 Oct 2024
Channel-Wise Mixed-Precision Quantization for Large Language Models
Zihan Chen
Bike Xie
Jundong Li
Cong Shen
MQ
469
6
0
16 Oct 2024
Reducing Data Bottlenecks in Distributed, Heterogeneous Neural Networks
International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 2024
Ruhai Lin
Rui-Jie Zhu
Nhan Duy Truong
174
1
0
12 Oct 2024
MATCH: Model-Aware TVM-based Compilation for Heterogeneous Edge Devices
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2024
Mohamed Amine Hamdi
Francesco Daghero
G. M. Sarda
Josse Van Delm
Arne Symons
Luca Benini
Marian Verhelst
Daniele Jahier Pagliari
Luca Bompani
191
6
0
11 Oct 2024
DeltaDQ: Ultra-High Delta Compression for Fine-Tuned LLMs via Group-wise Dropout and Separate Quantization
Yanfeng Jiang
Zelan Yang
B. Chen
Shen Li
Tao Li
MQ
132
4
0
11 Oct 2024
Constraint Guided Model Quantization of Neural Networks
Quinten Van Baelen
P. Karsmakers
MQ
270
0
0
30 Sep 2024
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
Asia and South Pacific Design Automation Conference (ASP-DAC), 2024
Shaobo Ma
Chao Fang
Haikuo Shao
Zhongfeng Wang
285
5
0
26 Sep 2024
UniLCD: Unified Local-Cloud Decision-Making via Reinforcement Learning
European Conference on Computer Vision (ECCV), 2024
Kathakoli Sengupta
Zhongkai Shangguan
Sandesh Bharadwaj
Sanjay Arora
Eshed Ohn-Bar
Renato Mancuso
375
2
0
17 Sep 2024
Privacy-Preserving SAM Quantization for Efficient Edge Intelligence in Healthcare
Zhikai Li
Jing Zhang
Qingyi Gu
MedIm
247
3
0
14 Sep 2024
Robust Training of Neural Networks at Arbitrary Precision and Sparsity
Chengxi Ye
Grace Chu
Yanfeng Liu
Yichi Zhang
Lukasz Lew
Li Zhang
Mark Sandler
Andrew G. Howard
MQ
174
2
0
14 Sep 2024
Foundations of Large Language Model Compression -- Part 1: Weight Quantization
Sean I. Young
MQ
240
1
0
03 Sep 2024
Computer Vision Model Compression Techniques for Embedded Systems: A Survey
Computers & Graphics (CG), 2024
Alexandre Lopes
Fernando Pereira dos Santos
D. Oliveira
Mauricio Schiezaro
Hélio Pedrini
262
16
0
15 Aug 2024
Mixed Non-linear Quantization for Vision Transformers
Gihwan Kim
Jemin Lee
Sihyeong Park
Yongin Kwon
Hyungshin Kim
MQ
317
2
0
26 Jul 2024
Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
Zhengang Li
Alec Lu
Yanyue Xie
Zhenglun Kong
Mengshu Sun
...
Zhaoyang Han
Caiwen Ding
Yanzhi Wang
Xue Lin
Zhenman Fang
196
9
0
25 Jul 2024
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
Zhuguanyu Wu
Jiaxin Chen
Hanwen Zhong
Di Huang
Yun Wang
MQ
311
23
0
17 Jul 2024
ShiftAddAug: Augment Multiplication-Free Tiny Neural Network with Hybrid Computation
Yipin Guo
Zihao Li
Yilin Lang
Qinyuan Ren
217
0
0
03 Jul 2024
Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks
Beatrice Alessandra Motetti
Matteo Risso
Luca Bompani
Enrico Macii
Massimo Poncino
Daniele Jahier Pagliari
MQ
235
10
0
01 Jul 2024
OutlierTune: Efficient Channel-Wise Quantization for Large Language Models
Jinguang Wang
Yuexi Yin
Haifeng Sun
Qi Qi
Jingyu Wang
Zirui Zhuang
Tingting Yang
Jianxin Liao
179
2
0
27 Jun 2024
Real-Time Spacecraft Pose Estimation Using Mixed-Precision Quantized Neural Network on COTS Reconfigurable MPSoC
IEEE International New Circuits and Systems Conference (NEWCAS), 2024
Julien Posso
Guy Bois
Yvon Savaria
163
3
0
06 Jun 2024
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
Tianchen Zhao
Tongcheng Fang
Haofeng Huang
Enshu Liu
Widyadewi Soedarmadji
...
Shengen Yan
Huazhong Yang
Xuefei Ning
Yu Wang
MQ
VGen
440
60
0
04 Jun 2024
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
Tianchen Zhao
Xuefei Ning
Tongcheng Fang
Enshu Liu
Guyue Huang
Zinan Lin
Shengen Yan
Guohao Dai
Yu Wang
MQ
DiffM
271
35
0
28 May 2024
Extreme Compression of Adaptive Neural Images
Leo Hoshikawa
Marcos V. Conde
Takeshi Ohashi
Atsushi Irie
364
1
0
27 May 2024
ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
Neural Information Processing Systems (NeurIPS), 2024
Yefei He
Luoming Zhang
Weijia Wu
Jing Liu
Hong Zhou
Bohan Zhuang
MQ
291
49
0
23 May 2024
From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Xue Geng
Zhe Wang
Chunyun Chen
Qing Xu
Kaixin Xu
...
Zhenghua Chen
M. Aly
Jie Lin
Ruibing Jin
Xiaoli Li
299
8
0
09 May 2024
Deep Learning for Low-Latency, Quantum-Ready RF Sensing
P. Gokhale
Caitlin Carnahan
William Clark
Teague Tomesh
Frederic T. Chong
264
3
0
27 Apr 2024
AdaQAT: Adaptive Bit-Width Quantization-Aware Training
Cédric Gernigon
Silviu-Ioan Filip
Olivier Sentieys
Clément Coggiola
Mickael Bruno
137
6
0
22 Apr 2024
TMPQ-DM: Joint Timestep Reduction and Quantization Precision Selection for Efficient Diffusion Models
Haojun Sun
Chen Tang
Zhi Wang
Yuan Meng
Jingyan Jiang
Cheng Wang
Wenwu Zhu
MQ
258
7
0
15 Apr 2024
Differentiable Search for Finding Optimal Quantization Strategy
Lianqiang Li
Chenqian Yan
Yefei Chen
MQ
275
2
0
10 Apr 2024
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
Bowen Pan
Songlin Yang
Haokun Liu
Mayank Mishra
Gaoyuan Zhang
Aude Oliva
Colin Raffel
Yikang Shen
MoE
250
30
0
08 Apr 2024
Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators
Jan Klhufek
Miroslav Safar
Vojtěch Mrázek
Z. Vašíček
Lukáš Sekanina
MQ
215
2
0
08 Apr 2024
AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution
Computer Vision and Pattern Recognition (CVPR), 2024
Chee Hong
Kyoung Mu Lee
SupR
MQ
155
7
0
04 Apr 2024
RefQSR: Reference-based Quantization for Image Super-Resolution Networks
IEEE Transactions on Image Processing (TIP), 2024
H. Lee
Jun-Sang Yoo
Seung-Won Jung
SupR
196
9
0
02 Apr 2024
Mixed-precision Supernet Training from Vision Foundation Models using Low Rank Adapter
Yuiko Sakuma
Masakazu Yoshimura
Junji Otsuka
Atsushi Irie
Takeshi Ohashi
MQ
260
0
0
29 Mar 2024
Separate, Dynamic and Differentiable (SMART) Pruner for Block/Output Channel Pruning on Computer Vision Tasks
Guanhua Ding
Zexi Ye
Zhen Zhong
Gang Li
David Shao
175
0
0
29 Mar 2024
Tiny Machine Learning: Progress and Futures
Ji Lin
Ligeng Zhu
Wei-Ming Chen
Wei-Chen Wang
Song Han
218
110
0
28 Mar 2024