Low-bit Quantization of Neural Networks for Efficient Inference
arXiv:1902.06822 (v2, latest)
18 February 2019
Yoni Choukroun, Eli Kravchik, Fan Yang, P. Kisilev
[MQ]

Papers citing "Low-bit Quantization of Neural Networks for Efficient Inference"
50 of 186 papers shown
Towards Accurate Post-training Quantization for Reparameterized Models (25 Feb 2024) [MQ]
  Luoming Zhang, Yefei He, Wen Fei, Zhenyu Lou, Weijia Wu, YangWei Ying, Hong Zhou

Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward (02 Feb 2024)
  Arnav Chavan, Raghav Magazine, Shubham Kushwaha, M. Debbah, Deepak Gupta

Fourier Prompt Tuning for Modality-Incomplete Scene Segmentation (30 Jan 2024) [VLM]
  Ruiping Liu, Kailai Li, Kunyu Peng, Yufan Chen, Ke Cao, Junwei Zheng, M. Sarfraz, Kailun Yang, Rainer Stiefelhagen

LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection (29 Jan 2024) [MQ]
  Sifan Zhou, Liang Li, Xinyu Zhang, Bo Zhang, Shipeng Bai, Miao Sun, Ziyu Zhao, Xiaobo Lu, Xiangxiang Chu

OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning (22 Jan 2024)
  Chu Myaet Thwal, Minh N. H. Nguyen, Ye Lin Tun, Seongjin Kim, My T. Thai, Choong Seon Hong

TinySAM: Pushing the Envelope for Efficient Segment Anything Model (21 Dec 2023) [VLM]
  Han Shu, Wenshuo Li, Yehui Tang, Yiman Zhang, Yihao Chen, Houqiang Li, Yunhe Wang, Xinghao Chen

ARBiBench: Benchmarking Adversarial Robustness of Binarized Neural Networks (21 Dec 2023) [AAML]
  Peng Zhao, Jiehua Zhang, Bowen Peng, Longguang Wang, Yingmei Wei, Yu Liu, Li Liu

CBQ: Cross-Block Quantization for Large Language Models (13 Dec 2023) [MQ]
  Xin Ding, Xiaoyu Liu, Zhijun Tu, Yun-feng Zhang, Wei Li, ..., Hanting Chen, Yehui Tang, Zhiwei Xiong, Baoqun Yin, Yunhe Wang

Efficient Neural Networks for Tiny Machine Learning: A Comprehensive Review (20 Nov 2023)
  M. Lê, Pierre Wolinski, Julyan Arbel

Exploring Post-Training Quantization of Protein Language Models (30 Oct 2023) [MQ]
  Shuang Peng, Fei Yang, Ning Sun, Sheng Chen, Yanfeng Jiang, Aimin Pan

LLM-FP4: 4-Bit Floating-Point Quantized Transformers (25 Oct 2023) [MQ]
  Shih-yang Liu, Zechun Liu, Xijie Huang, Pingcheng Dong, Kwang-Ting Cheng

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models (12 Oct 2023) [MQ]
  Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang

Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM (07 Oct 2023) [MQ]
  Luoming Zhang, Wen Fei, Weijia Wu, Yefei He, Zhenyu Lou, Hong Zhou

MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search (29 Sep 2023) [MQ]
  Yichen Xie, Wei Le

Efficient Post-training Quantization with FP8 Formats (26 Sep 2023) [MQ]
  Haihao Shen, Naveen Mellempudi, Xin He, Q. Gao, Chang-Bao Wang, Mengni Wang

Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design (22 Sep 2023)
  Chao Fang, Wei Sun, Aojun Zhou, Zhongfeng Wang

SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization (20 Sep 2023)
  Jinjie Zhang, Rayan Saab

Efficient Benchmarking of Language Models (22 Aug 2023) [ALM]
  Yotam Perlitz, Elron Bandel, Ariel Gera, Ofir Arviv, L. Ein-Dor, Eyal Shnarch, Noam Slonim, Michal Shmueli-Scheuer, Leshem Choshen

Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers (21 Aug 2023) [MQ]
  N. Frumkin, Dibakar Gope, Diana Marculescu

FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs (16 Aug 2023) [MQ]
  Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Awadalla

Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning (14 Aug 2023) [MQ]
  Shipeng Bai, Jun Chen, Xintian Shen, Yixuan Qian, Yong Liu

Pruning vs Quantization: Which is Better? (06 Jul 2023) [MQ]
  Andrey Kuzmin, Markus Nagel, M. V. Baalen, Arash Behboodi, Tijmen Blankevoort

Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning (02 Jul 2023) [MQ]
  Jun Chen, Shipeng Bai, Tianxin Huang, Mengmeng Wang, Guanzhong Tian, Y. Liu

Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing (22 Jun 2023) [MQ]
  Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort

MobileNMT: Enabling Translation in 15MB and 30ms (07 Jun 2023) [MQ]
  Ye Lin, Xiaohui Wang, Zhexi Zhang, Mingxuan Wang, Tong Xiao, Jingbo Zhu

PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models (30 May 2023) [MQ]
  Zhuocheng Gong, Jiahao Liu, Qifan Wang, Yang Yang, Jingang Wang, Wei Wu, Yunsen Xian, Dongyan Zhao, Rui Yan

Towards Accurate Post-training Quantization for Diffusion Models (30 May 2023) [MQ]
  Changyuan Wang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie Zhou, Jiwen Lu

Post-training Model Quantization Using GANs for Synthetic Data Generation (10 May 2023) [MQ]
  Athanasios Masouris, Mansi Sharma, Adrian Boguszewski, Alexander Kozlov, Zhuo Wu, Raymond Lo

Adaptive Scheduling for Edge-Assisted DNN Serving (19 Apr 2023)
  Jian He, Chen-Shun Yang, Zhaoyuan He, Ghufran Baig, L. Qiu

Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric (19 Apr 2023) [MQ]
  Lin Niu, Jia-Wen Liu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu

Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling (18 Apr 2023) [MQ]
  Xiuying Wei, Yunchen Zhang, Yuhang Li, Xiangguo Zhang, Ruihao Gong, Jian Ren, Zhengang Li

EcoFed: Efficient Communication for DNN Partitioning-based Federated Learning (11 Apr 2023) [FedML]
  Di Wu, R. Ullah, Philip Rodgers, Peter Kilpatrick, I. Spence, Blesson Varghese

Towards Accurate Post-Training Quantization for Vision Transformer (25 Mar 2023) [MQ]
  Yifu Ding, Haotong Qin, Qing-Yu Yan, Z. Chai, Junjie Liu, Xiaolin K. Wei, Xianglong Liu

Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance (23 Mar 2023) [MQ]
  Zhihang Yuan, Jiawei Liu, Jiaxiang Wu, Dawei Yang, Qiang Wu, Guangyu Sun, Wenyu Liu, Xinggang Wang, Bingzhe Wu

Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers with Bridge Block Reconstruction for IoT Systems (22 Mar 2023) [ViT, MQ]
  Jemin Lee, Yongin Kwon, Sihyeong Park, Misun Yu, Jeman Park, Hwanjun Song

Rotation Invariant Quantization for Model Compression (03 Mar 2023) [MQ]
  Dor-Joseph Kampeas, Yury Nahshan, Hanoch Kremer, Gil Lederman, Shira Zaloshinski, Zheng Li, E. Haleva

BiBench: Benchmarking and Analyzing Network Binarization (26 Jan 2023) [MQ, AAML]
  Haotong Qin, Mingyuan Zhang, Yifu Ding, Aoyu Li, Zhongang Cai, Ziwei Liu, Feng Yu, Xianglong Liu

PowerQuant: Automorphism Search for Non-Uniform Quantization (24 Jan 2023) [MQ]
  Edouard Yvinec, Arnaud Dapogny, Matthieu Cord, Kévin Bailly

PD-Quant: Post-Training Quantization based on Prediction Difference Metric (14 Dec 2022) [MQ]
  Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu

QFT: Post-training quantization via fast joint finetuning of all degrees of freedom (05 Dec 2022) [MQ]
  Alexander Finkelstein, Ella Fuchs, Idan Tal, Mark Grobman, Niv Vosco, Eldad Meller

Post-training Quantization on Diffusion Models (28 Nov 2022) [DiffM, MQ]
  Yuzhang Shang, Zhihang Yuan, Bin Xie, Bingzhe Wu, Yan Yan

CPT-V: A Contrastive Approach to Post-Training Quantization of Vision Transformers (17 Nov 2022) [ViT, MQ]
  N. Frumkin, Dibakar Gope, Diana Marculescu

AskewSGD: An Annealed interval-constrained Optimisation method to train Quantized Neural Networks (07 Nov 2022)
  Louis Leconte, S. Schechtman, Eric Moulines

TPU-MLIR: A Compiler For TPU Using MLIR (23 Oct 2022)
  Pengchao Hu, Man Lu, Lei Wang, Guoyue Jiang

Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models (27 Sep 2022) [MQ]
  Xiuying Wei, Yunchen Zhang, Xiangguo Zhang, Ruihao Gong, Shanghang Zhang, Tao Gui, F. Yu, Xianglong Liu

PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers (13 Sep 2022) [ViT, MQ]
  Zhikai Li, Mengjuan Chen, Junrui Xiao, Qingyi Gu

A simple approach for quantizing neural networks (07 Sep 2022) [MQ]
  J. Maly, Rayan Saab

ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization (30 Aug 2022) [MQ]
  Cong Guo, Chen Zhang, Jingwen Leng, Zihan Liu, Fan Yang, Yun-Bo Liu, Minyi Guo, Yuhao Zhu

Efficient Adaptive Activation Rounding for Post-Training Quantization (25 Aug 2022) [MQ]
  Zhengyi Li, Cong Guo, Zhanda Zhu, Yangjie Zhou, Yuxian Qiu, Xiaotian Gao, Jingwen Leng, Minyi Guo

Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning (24 Aug 2022) [MQ]
  Elias Frantar, Sidak Pal Singh, Dan Alistarh