Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization

5 February 2019

Eldad Meller

Alexander Finkelstein

Uri Almog

Mark Grobman


Papers citing "Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization"

50 / 52 papers shown

Sensitivity-Aware Post-Training Quantization for Deep Neural Networks Zekang Zheng Haokun Li Yaofo Chen Zhuliang Yu Qing Du MQ 129 1 0 06 Sep 2025
Symmetry in Neural Network Parameter Spaces Bo Zhao Robin Walters Rose Yu 321 7 0 16 Jun 2025
FPTQuant: Function-Preserving Transforms for LLM Quantization Boris van Breugel Yelysei Bondarenko Paul N. Whatmough Markus Nagel MQ 242 3 0 05 Jun 2025
On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance Jaskirat Singh Bram Adams Ahmed E. Hassan VLM 358 1 0 01 Nov 2024
Accumulator-Aware Post-Training Quantization for Large Language Models Ian Colbert Giuseppe Franco Fabian Grob Jinjie Zhang Rayan Saab MQ 255 4 0 25 Sep 2024
Quantized Prompt for Efficient Generalization of Vision-Language Models Tianxiang Hao Xiaohan Ding Juexiao Feng Yuhong Yang Hui Chen Guiguang Ding VLM MQ 234 9 0 15 Jul 2024
Low-Rank Quantization-Aware Training for LLMs Yelysei Bondarenko Riccardo Del Chiaro Markus Nagel MQ 276 37 0 10 Jun 2024
Quantization of Large Language Models with an Overdetermined Basis D. Merkulov Daria Cherniuk Alexander Rudikov Ivan Oseledets Ekaterina Muravleva A. Mikhalev Boris Kashin MQ 173 1 0 15 Apr 2024
DNN Memory Footprint Reduction via Post-Training Intra-Layer Multi-Precision Quantization. IEEE International Symposium on Quality Electronic Design (ISQED), 2024 B. Ghavami Amin Kamjoo Lesley Shannon S. Wilton MQ 142 0 0 03 Apr 2024
On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance Jaskirat Singh Emad Fallahzadeh Bram Adams Ahmed E. Hassan MQ 359 5 0 25 Mar 2024
PIPE : Parallelized Inference Through Post-Training Quantization Ensembling of Residual Expansions Edouard Yvinec Arnaud Dapogny Kévin Bailly MQ 256 0 0 27 Nov 2023
Shedding the Bits: Pushing the Boundaries of Quantization with Minifloats on FPGAs. International Conference on Field-Programmable Logic and Applications (FPL), 2023 Shivam Aggarwal Hans Jakob Damsgaard Alessandro Pappalardo Giuseppe Franco Thomas B. Preußer Michaela Blott Tulika Mitra MQ 242 8 0 21 Nov 2023
ResQ: Residual Quantization for Video Perception. IEEE International Conference on Computer Vision (ICCV), 2023 Davide Abati H. Yahia Markus Nagel A. Habibian MQ 167 3 0 18 Aug 2023
Quantization Aware Factorization for Deep Neural Network Compression. Journal of Artificial Intelligence Research (JAIR), 2023 Daria Cherniuk Stanislav Abukhovich Anh-Huy Phan Ivan Oseledets A. Cichocki Julia Gusak MQ 164 5 0 08 Aug 2023
Designing strong baselines for ternary neural network quantization through support and mass equalization. International Conference on Information Photonics (ICIP), 2023 Edouard Yvinec Arnaud Dapogny Kévin Bailly MQ 245 0 0 30 Jun 2023
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing. Neural Information Processing Systems (NeurIPS), 2023 Yelysei Bondarenko Markus Nagel Tijmen Blankevoort MQ 294 120 0 22 Jun 2023
Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers with Bridge Block Reconstruction for IoT Systems. IEEE Internet of Things Journal (IEEE IoT J.), 2023 Jemin Lee Yongin Kwon Sihyeong Park Misun Yu Jeman Park Hwanjun Song ViT MQ 222 12 0 22 Mar 2023
QFT: Post-training quantization via fast joint finetuning of all degrees of freedom Alexander Finkelstein Ella Fuchs Idan Tal Mark Grobman Niv Vosco Eldad Meller MQ 147 8 0 05 Dec 2022
MinUn: Accurate ML Inference on Microcontrollers. ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), 2022 Shikhar Jaiswal R. Goli Aayan Kumar Vivek Seshadri Rahul Sharma 257 5 0 29 Oct 2022
To Fold or Not to Fold: a Necessary and Sufficient Condition on Batch-Normalization Layers Folding. International Joint Conference on Artificial Intelligence (IJCAI), 2022 Edouard Yvinec Arnaud Dapogny Kévin Bailly 140 9 0 28 Mar 2022
REx: Data-Free Residual Quantization Error Expansion. Neural Information Processing Systems (NeurIPS), 2022 Edouard Yvinec Arnaud Dapogny Matthieu Cord Kévin Bailly MQ 310 9 0 28 Mar 2022
SPIQ: Data-Free Per-Channel Static Input Quantization. IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022 Edouard Yvinec Arnaud Dapogny Matthieu Cord Kévin Bailly MQ 101 22 0 28 Mar 2022
FxP-QNet: A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs with Dynamic Fixed-Point Representation. IEEE Access (IEEE Access), 2022 Ahmad Shawahna S. M. Sait A. El-Maleh Irfan Ahmad MQ 132 8 0 22 Mar 2022
Quantune: Post-training Quantization of Convolutional Neural Networks using Extreme Gradient Boosting for Fast Deployment. Future generations computer systems (FGCS), 2022 Jemin Lee Misun Yu Yongin Kwon Teaho Kim MQ 154 20 0 10 Feb 2022
UWC: Unit-wise Calibration Towards Rapid Network Compression. British Machine Vision Conference (BMVC), 2022 Chen Lin Zheyang Li Bo Peng Haoji Hu Wenming Tan Ye Ren Shiliang Pu MQ 87 1 0 17 Jan 2022
A Generalized Zero-Shot Quantization of Deep Convolutional Neural Networks via Learned Weights Statistics. IEEE transactions on multimedia (IEEE Trans. Multimedia), 2021 Prasen Kumar Sharma Arun Abraham V. N. Rajendiran MQ 340 10 0 06 Dec 2021
Applications and Techniques for Fast Machine Learning in Science. Frontiers in Big Data (Front. Big Data), 2021 A. Deiana Nhan Tran Joshua C. Agar Michaela Blott G. D. Guglielmo ... Ashish Sharma S. Summers Pietro Vischia J. Vlimant Olivia Weng 192 80 0 25 Oct 2021
RED++ : Data-Free Pruning of Deep Neural Networks via Input Splitting and Output Merging Edouard Yvinec Arnaud Dapogny Matthieu Cord Kévin Bailly 211 24 0 30 Sep 2021
HPTQ: Hardware-Friendly Post Training Quantization H. Habi Reuven Peretz Elad Cohen Lior Dikstein Oranit Dror I. Diamant Roy H. Jennings Arnon Netzer MQ 197 11 0 19 Sep 2021
An Embedding of ReLU Networks and an Analysis of their Identifiability. Constructive approximation (Constr. Approx.), 2021 Pierre Stock Rémi Gribonval 269 23 0 20 Jul 2021
A White Paper on Neural Network Quantization Markus Nagel Marios Fournarakis Rana Ali Amjad Yelysei Bondarenko M. V. Baalen Tijmen Blankevoort MQ 292 713 0 15 Jun 2021
RED : Looking for Redundancies for Data-Free Structured Compression of Deep Neural Networks. Neural Information Processing Systems (NeurIPS), 2021 Edouard Yvinec Arnaud Dapogny Matthieu Cord Kévin Bailly CVBM 120 29 0 31 May 2021
Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale. IEEE Micro (IEEE Micro), 2021 Zhaoxia Deng Deng Jongsoo Park P. T. P. Tang Haixin Liu ... S. Nadathur Changkyu Kim Maxim Naumov S. Naghshineh M. Smelyanskiy 139 12 0 26 May 2021
Do All MobileNets Quantize Poorly? Gaining Insights into the Effect of Quantization on Depthwise Separable Convolutional Networks Through the Eyes of Multi-scale Distributional Dynamics S. Yun Alexander Wong MQ 136 30 0 24 Apr 2021
Zero-shot Adversarial Quantization. Computer Vision and Pattern Recognition (CVPR), 2021 Yuang Liu Wei Zhang Jun Wang MQ 216 87 0 29 Mar 2021
Automated Backend-Aware Post-Training Quantization Ziheng Jiang Animesh Jain An Liu Josh Fromm Chengqian Ma Tianqi Chen Luis Ceze MQ 143 2 0 27 Mar 2021
Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference. Frontiers in Artificial Intelligence (Front. Artif. Intell.), 2021 B. Hawks Javier Mauricio Duarte Nicholas J. Fraser Alessandro Pappalardo N. Tran Yaman Umuroglu MQ 219 65 0 22 Feb 2021
Exploring Neural Networks Quantization via Layer-Wise Quantization Analysis Shachar Gluska Mark Grobman MQ 93 7 0 15 Dec 2020
Layer-Wise Data-Free CNN Compression. International Conference on Pattern Recognition (ICPR), 2020 Maxwell Horton Yanzi Jin Ali Farhadi Mohammad Rastegari MQ 199 19 0 18 Nov 2020
One Weight Bitwidth to Rule Them All Ting-Wu Chin P. Chuang Vikas Chandra Diana Marculescu MQ 185 26 0 22 Aug 2020
Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors C. Coelho Aki Kuusela Shane Li Zhuang Hao T. Aarrestad Vladimir Loncar J. Ngadiuba M. Pierini Adrian Alan Pol S. Summers MQ 278 211 0 15 Jun 2020
Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming Itay Hubara Yury Nahshan Y. Hanani Ron Banner Daniel Soudry MQ 251 143 0 14 Jun 2020
A Data and Compute Efficient Design for Limited-Resources Deep Learning Mirgahney Mohamed Gabriele Cesa Taco S. Cohen Max Welling MedIm 150 20 0 21 Apr 2020
Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision. AAAI Conference on Artificial Intelligence (AAAI), 2020 Xingchao Liu Mao Ye Dengyong Zhou Qiang Liu MQ 251 51 0 20 Feb 2020
Gradient $\ell_1$ Regularization for Quantization Robustness. International Conference on Learning Representations (ICLR), 2020 Milad Alizadeh Arash Behboodi M. V. Baalen Christos Louizos Tijmen Blankevoort Max Welling MQ 148 8 0 18 Feb 2020
Post-Training Piecewise Linear Quantization for Deep Neural Networks. European Conference on Computer Vision (ECCV), 2020 Jun Fang Ali Shafiee Hamzah Abdel-Aziz D. Thorsley Georgios Georgiadis Joseph Hassoun MQ 369 170 0 31 Jan 2020
ZeroQ: A Novel Zero Shot Quantization Framework. Computer Vision and Pattern Recognition (CVPR), 2020 Yaohui Cai Z. Yao Zhen Dong A. Gholami Michael W. Mahoney Kurt Keutzer MQ 257 453 0 01 Jan 2020
The Knowledge Within: Methods for Data-Free Model Compression. Computer Vision and Pattern Recognition (CVPR), 2019 Matan Haroush Itay Hubara Elad Hoffer Daniel Soudry 226 114 0 03 Dec 2019
OverQ: Opportunistic Outlier Quantization for Neural Network Accelerators Ritchie Zhao Jordan Dotzel Zhanqiu Hu Preslav Ivanov Christopher De Sa Zhiru Zhang MQ 149 1 0 13 Oct 2019
Data-Free Quantization Through Weight Equalization and Bias Correction. IEEE International Conference on Computer Vision (ICCV), 2019 Markus Nagel M. V. Baalen Tijmen Blankevoort Max Welling MQ 311 583 0 11 Jun 2019