ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

HAQ: Hardware-Aware Automated Quantization with Mixed Precision (arXiv:1811.08886)

21 November 2018
Kuan Wang
Zhijian Liu
Yujun Lin
Ji Lin
Song Han
    MQ

Papers citing "HAQ: Hardware-Aware Automated Quantization with Mixed Precision"

50 / 435 papers shown
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
Haojie Duanmu, Xiuhong Li, Zhihang Yuan, Size Zheng, Jiangfei Duan, Xingcheng Zhang, Dahua Lin (09 May 2025) [MQ, MoE]

Learning from Loss Landscape: Generalizable Mixed-Precision Quantization via Adaptive Sharpness-Aware Gradient Aligning
Lianbo Ma, Jianlun Ma, Yuee Zhou, Guoyang Xie, Qiang He, Zhichao Lu (08 May 2025) [MQ]

Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model
Navin Ranjan, Andreas E. Savakis (08 May 2025) [MQ, VLM]

Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques
Sanjay Surendranath Girija, Shashank Kapoor, Lakshit Arora, Dipen Pradhan, Aman Raj, Ankit Shetgaonkar (05 May 2025)

Radio: Rate-Distortion Optimization for Large Language Model Compression
Sean I. Young (05 May 2025) [MQ]

BackSlash: Rate Constrained Optimized Training of Large Language Models
Jun Wu, Jiangtao Wen, Yuxing Han (23 Apr 2025)

FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference
Coleman Hooper, Charbel Sakr, Ben Keller, Rangharajan Venkatesan, Kurt Keutzer, S., Brucek Khailany (19 Apr 2025) [MQ]

Collaborative Learning of On-Device Small Model and Cloud-Based Large Model: Advances and Future Directions
Chaoyue Niu, Yucheng Ding, Junhui Lu, Zhengxiang Huang, Hang Zeng, Yutong Dai, Xuezhen Tu, Chengfei Lv, Fan Wu, Guihai Chen (17 Apr 2025)

Tin-Tin: Towards Tiny Learning on Tiny Devices with Integer-based Neural Network Training
Yi Hu, Jinhang Zuo, Eddie Zhang, Bob Iannucci, Carlee Joe-Wong (13 Apr 2025)

Generative Artificial Intelligence for Internet of Things Computing: A Systematic Survey
Fabrizio Mangione, Claudio Savaglio, Giancarlo Fortino (10 Apr 2025)

Hyperflows: Pruning Reveals the Importance of Weights
Eugen Barbulescu, Antonio Alexoaie (06 Apr 2025)

Model Hemorrhage and the Robustness Limits of Large Language Models
Ziyang Ma, Z. Li, L. Zhang, Gui-Song Xia, Bo Du, Liangpei Zhang, Dacheng Tao (31 Mar 2025)

MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness
Zihao Zheng, Xiuping Cui, Size Zheng, Maoliang Li, Jiayu Chen, Liang, Xiang Chen (27 Mar 2025) [MQ, MoE]

Mixed precision accumulation for neural network inference guided by componentwise forward error analysis
El-Mehdi El Arar, Silviu-Ioan Filip, Theo Mary, Elisa Riccietti (19 Mar 2025)

ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng, Shuaiting Li, Zeyu Wang, Kedong Xu, Hong Gu, Kejie Huang (12 Mar 2025) [MQ]

Empowering Edge Intelligence: A Comprehensive Survey on On-Device AI Models
Xubin Wang, Zhiqing Tang, Jianxiong Guo, Tianhui Meng, Chenhao Wang, Tian-sheng Wang, Weijia Jia (08 Mar 2025)

MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration
Jinguang Wang, J. Wang, Haifeng Sun, Tingting Yang, Zirui Zhuang, Wanyi Ning, Yuexi Yin, Q. Qi, Jianxin Liao (07 Mar 2025) [MQ, MoMe]

Optimizing DNN Inference on Multi-Accelerator SoCs at Training-time
Matteo Risso, Alessio Burrello, Daniele Jahier Pagliari (24 Feb 2025)

KVCrush: Key value cache size-reduction using similarity in head-behaviour
Gopi Krishna Jha, Sameh Gobriel, Liubov Talamanova, Alexander Kozlov, Nilesh Jain (24 Feb 2025) [MQ]

A General Error-Theoretical Analysis Framework for Constructing Compression Strategies
Boyang Zhang, Daning Cheng, Yunquan Zhang, Meiqi Tu, Fangmin Liu, Jiake Tian (19 Feb 2025)

Nearly Lossless Adaptive Bit Switching
Haiduo Huang, Zhenhua Liu, Tian Xia, Wenzhe Zhao, Pengju Ren (03 Feb 2025) [MQ]

Hardware-Aware DNN Compression for Homogeneous Edge Devices
Kunlong Zhang, Guiying Li, Ning Lu, Peng Yang, K. Tang (28 Jan 2025)

Mix-QViT: Mixed-Precision Vision Transformer Quantization Driven by Layer Importance and Quantization Sensitivity
Navin Ranjan, Andreas E. Savakis (10 Jan 2025) [MQ]

Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies
Xubin Wang, Weijia Jia (08 Jan 2025)

A Novel Structure-Agnostic Multi-Objective Approach for Weight-Sharing Compression in Deep Neural Networks
Rasa Khosrowshahli, Shahryar Rahnamayan, Beatrice Ombuki-Berman (06 Jan 2025) [MQ]

DEX: Data Channel Extension for Efficient CNN Inference on Tiny AI Accelerators
Taesik Gong, F. Kawsar, Chulhong Min (09 Dec 2024)

MPQ-Diff: Mixed Precision Quantization for Diffusion Models
Rocco Manz Maruzzelli, Basile Lewandowski, Lydia Y. Chen (28 Nov 2024) [DiffM, MQ]

FAMES: Fast Approximate Multiplier Substitution for Mixed-Precision Quantized DNNs--Down to 2 Bits!
Yi Ren, Ruge Xu, Xinfei Guo, Weikang Qian (27 Nov 2024) [MQ]

Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
Chao Fang, Man Shi, Robin Geens, Arne Symons, Zhongfeng Wang, Marian Verhelst (24 Nov 2024)

SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism
Priyansh Bhatnagar, Linfeng Wen, Mingu Kang (15 Nov 2024)

BF-IMNA: A Bit Fluid In-Memory Neural Architecture for Neural Network Acceleration
M. Rakka, Rachid Karami, A. Eltawil, M. Fouda, Fadi J. Kurdahi (03 Nov 2024) [MQ]

ARQ: A Mixed-Precision Quantization Framework for Accurate and Certifiably Robust DNNs
Yuchen Yang, Shubham Ugare, Yifan Zhao, Gagandeep Singh, Sasa Misailovic (31 Oct 2024) [MQ]

Data Generation for Hardware-Friendly Post-Training Quantization
Lior Dikstein, Ariel Lapid, Arnon Netzer, H. Habi (29 Oct 2024) [MQ]

Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization
W. Liu, Xue Xian Zheng, Jingyi Yu, Xin Lou (25 Oct 2024) [MQ]

Progressive Mixed-Precision Decoding for Efficient LLM Inference
Hao Chen, Fuwen Tan, Alexandros Kouris, Royson Lee, Hongxiang Fan, Stylianos I. Venieris (17 Oct 2024) [MQ]

Channel-Wise Mixed-Precision Quantization for Large Language Models
Zihan Chen, Bike Xie, Jundong Li, Cong Shen (16 Oct 2024) [MQ]

Reducing Data Bottlenecks in Distributed, Heterogeneous Neural Networks
Ruhai Lin, Rui-Jie Zhu, Jason Eshraghian (12 Oct 2024)

MATCH: Model-Aware TVM-based Compilation for Heterogeneous Edge Devices
Mohamed Amine Hamdi, Francesco Daghero, G. M. Sarda, Josse Van Delm, Arne Symons, Luca Benini, Marian Verhelst, Daniele Jahier Pagliari, Alessio Burrello (11 Oct 2024)

DeltaDQ: Ultra-High Delta Compression for Fine-Tuned LLMs via Group-wise Dropout and Separate Quantization
Yanfeng Jiang, Zelan Yang, B. Chen, Shen Li, Yong Li, Tao Li (11 Oct 2024) [MQ]

Constraint Guided Model Quantization of Neural Networks
Quinten Van Baelen, P. Karsmakers (30 Sep 2024) [MQ]

Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
Shaobo Ma, Chao Fang, Haikuo Shao, Zhongfeng Wang (26 Sep 2024)

UniLCD: Unified Local-Cloud Decision-Making via Reinforcement Learning
Kathakoli Sengupta, Zhongkai Shagguan, Sandesh Bharadwaj, Sanjay Arora, Eshed Ohn-Bar, Renato Mancuso (17 Sep 2024)

Privacy-Preserving SAM Quantization for Efficient Edge Intelligence in Healthcare
Zhikai Li, Jing Zhang, Qingyi Gu (14 Sep 2024) [MedIm]

Robust Training of Neural Networks at Arbitrary Precision and Sparsity
Chengxi Ye, Grace Chu, Yanfeng Liu, Yichi Zhang, Lukasz Lew, Andrew G. Howard (14 Sep 2024) [MQ]

Foundations of Large Language Model Compression -- Part 1: Weight Quantization
Sean I. Young (03 Sep 2024) [MQ]

Computer Vision Model Compression Techniques for Embedded Systems: A Survey
Alexandre Lopes, Fernando Pereira dos Santos, D. Oliveira, Mauricio Schiezaro, Hélio Pedrini (15 Aug 2024)

Mixed Non-linear Quantization for Vision Transformers
Gihwan Kim, Jemin Lee, Sihyeong Park, Yongin Kwon, Hyungshin Kim (26 Jul 2024) [MQ]

Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
Zhengang Li, Alec Lu, Yanyue Xie, Zhenglun Kong, Mengshu Sun, ..., Peiyan Dong, Caiwen Ding, Yanzhi Wang, Xue Lin, Zhenman Fang (25 Jul 2024)

AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
Zhuguanyu Wu, Jiaxin Chen, Hanwen Zhong, Di Huang, Yun Wang (17 Jul 2024) [MQ]

ShiftAddAug: Augment Multiplication-Free Tiny Neural Network with Hybrid Computation
Yipin Guo, Zihao Li, Yilin Lang, Qinyuan Ren (03 Jul 2024)