Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2004.09602
Cited By
Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation
20 April 2020
Hao Wu
Patrick Judd
Xiaojie Zhang
Mikhail Isaev
Paulius Micikevicius
MQ
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation"
45 / 45 papers shown
Title
Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model
Navin Ranjan
Andreas E. Savakis
MQ
VLM
63
0
0
08 May 2025
StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models
Yeona Hong
Hyewon Han
Woo-Jin Chung
Hong-Goo Kang
MQ
28
0
0
21 Apr 2025
DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs
Yingsong Luo
Ling Chen
MQ
21
0
0
16 Oct 2024
Temporal Feature Matters: A Framework for Diffusion Model Quantization
Yushi Huang
Ruihao Gong
Xianglong Liu
Jing Liu
Yuhang Li
Jiwen Lu
Dacheng Tao
DiffM
MQ
49
0
0
28 Jul 2024
Quantizing YOLOv7: A Comprehensive Study
Mohammadamin Baghbanbashi
Mohsen Raji
B. Ghavami
MQ
27
8
0
06 Jul 2024
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
Xing Hu
Yuan Cheng
Dawei Yang
Zhihang Yuan
Jiangyong Yu
Chen Xu
Sifan Zhou
MQ
36
7
0
28 May 2024
On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh
Emad Fallahzadeh
Bram Adams
Ahmed E. Hassan
MQ
32
3
0
25 Mar 2024
Achieving Pareto Optimality using Efficient Parameter Reduction for DNNs in Resource-Constrained Edge Environment
Atah Nuh Mih
Alireza Rahimi
Asfia Kawnine
Francis Palma
Monica Wachowicz
R. Dubay
Hung Cao
21
0
0
14 Mar 2024
A Plug-in Tiny AI Module for Intelligent and Selective Sensor Data Transmission
Wenjun Huang
Arghavan Rezvani
Hanning Chen
Yang Ni
Sanggeon Yun
Sungheon Jeong
Mohsen Imani
19
7
0
03 Feb 2024
Knowledge Translation: A New Pathway for Model Compression
Wujie Sun
Defang Chen
Jiawei Chen
Yan Feng
Chun-Yen Chen
Can Wang
25
0
0
11 Jan 2024
SmoothQuant+: Accurate and Efficient 4-bit Post-Training WeightQuantization for LLM
Jiayi Pan
Chengcan Wang
Kaifu Zheng
Yangguang Li
Zhenyu Wang
Bin Feng
MQ
35
7
0
06 Dec 2023
Λ
Λ
Λ
-Split: A Privacy-Preserving Split Computing Framework for Cloud-Powered Generative AI
Shoki Ohta
Takayuki Nishio
62
4
0
23 Oct 2023
INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers
Lakshmi Nair
Mikhail Bernadskiy
Arulselvan Madhavan
Craig Chan
Ayon Basumallik
D. Bunandar
MQ
28
2
0
07 Jul 2023
A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact
Álvaro Huertas-García
Carlos Martí-González
Rubén García Maezo
Alejandro Echeverría Rey
22
3
0
01 Jul 2023
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization
Chong Yu
Tao Chen
Zhongxue Gan
Jiayuan Fan
MQ
ViT
25
23
0
18 May 2023
Mathematical Challenges in Deep Learning
V. Nia
Guojun Zhang
I. Kobyzev
Michael R. Metel
Xinlin Li
...
S. Hemati
M. Asgharian
Linglong Kong
Wulong Liu
Boxing Chen
AI4CE
VLM
37
1
0
24 Mar 2023
Rotation Invariant Quantization for Model Compression
Dor-Joseph Kampeas
Yury Nahshan
Hanoch Kremer
Gil Lederman
Shira Zaloshinski
Zheng Li
E. Haleva
MQ
16
0
0
03 Mar 2023
QFT: Post-training quantization via fast joint finetuning of all degrees of freedom
Alexander Finkelstein
Ella Fuchs
Idan Tal
Mark Grobman
Niv Vosco
Eldad Meller
MQ
21
6
0
05 Dec 2022
Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models
Harshita Diddee
Sandipan Dandapat
Monojit Choudhury
T. Ganu
Kalika Bali
27
5
0
27 Oct 2022
TPU-MLIR: A Compiler For TPU Using MLIR
Pengchao Hu
Man Lu
Lei Wang
Guoyue Jiang
14
5
0
23 Oct 2022
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
Xiuying Wei
Yunchen Zhang
Xiangguo Zhang
Ruihao Gong
Shanghang Zhang
Qi Zhang
F. Yu
Xianglong Liu
MQ
22
145
0
27 Sep 2022
Efficient Quantized Sparse Matrix Operations on Tensor Cores
Shigang Li
Kazuki Osawa
Torsten Hoefler
74
31
0
14 Sep 2022
FP8 Formats for Deep Learning
Paulius Micikevicius
Dusan Stosic
N. Burgess
Marius Cornea
Pradeep Dubey
...
Naveen Mellempudi
S. Oberman
M. Shoeybi
Michael Siu
Hao Wu
BDL
VLM
MQ
69
121
0
12 Sep 2022
Mixed-Precision Neural Networks: A Survey
M. Rakka
M. Fouda
Pramod P. Khargonekar
Fadi J. Kurdahi
MQ
18
11
0
11 Aug 2022
Symmetry Regularization and Saturating Nonlinearity for Robust Quantization
Sein Park
Yeongsang Jang
Eunhyeok Park
MQ
14
1
0
31 Jul 2022
Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs
Benjamin Fuhrer
Yuval Shpigelman
Chen Tessler
Shie Mannor
Gal Chechik
E. Zahavi
Gal Dalal
25
4
0
05 Jul 2022
I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference
Zhikai Li
Qingyi Gu
MQ
48
95
0
04 Jul 2022
Answer Fast: Accelerating BERT on the Tensor Streaming Processor
I. Ahmed
Sahil Parmar
Matthew Boyd
Michael Beidler
Kris Kang
Bill Liu
Kyle Roach
John Kim
D. Abts
LLMAG
12
6
0
22 Jun 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Z. Yao
Reza Yazdani Aminabadi
Minjia Zhang
Xiaoxia Wu
Conglong Li
Yuxiong He
VLM
MQ
45
441
0
04 Jun 2022
What Do Compressed Multilingual Machine Translation Models Forget?
Alireza Mohammadshahi
Vassilina Nikoulina
Alexandre Berard
Caroline Brun
James Henderson
Laurent Besacier
AI4CE
40
9
0
22 May 2022
Adaptive Block Floating-Point for Analog Deep Learning Hardware
Ayon Basumallik
D. Bunandar
Nicholas Dronen
Nicholas Harris
Ludmila Levkova
Calvin McCarter
Lakshmi Nair
David Walter
David Widemann
9
6
0
12 May 2022
Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware
B. Sudharsan
Dineshkumar Sundaram
Pankesh Patel
J. Breslin
M. Ali
Schahram Dustdar
Albert Zomaya
R. Ranjan
13
2
0
20 Apr 2022
ICSML: Industrial Control Systems ML Framework for native inference using IEC 61131-3 code
Constantine Doumanidis
Prashant Hari Narayan Rajput
Michail Maniatakos
12
2
0
21 Feb 2022
Quantune: Post-training Quantization of Convolutional Neural Networks using Extreme Gradient Boosting for Fast Deployment
Jemin Lee
Misun Yu
Yongin Kwon
Teaho Kim
MQ
17
17
0
10 Feb 2022
Training Deep Neural Networks with Joint Quantization and Pruning of Weights and Activations
Xinyu Zhang
Ian Colbert
Ken Kreutz-Delgado
Srinjoy Das
MQ
29
11
0
15 Oct 2021
Shifting Capsule Networks from the Cloud to the Deep Edge
Miguel Costa
Diogo Costa
T. Gomes
Sandro Pinto
16
5
0
06 Oct 2021
4-bit Quantization of LSTM-based Speech Recognition Models
A. Fasoli
Chia-Yu Chen
Mauricio Serrano
Xiao Sun
Naigang Wang
...
Xiaodong Cui
Brian Kingsbury
Wei Zhang
Zoltán Tüske
K. Gopalakrishnan
MQ
23
21
0
27 Aug 2021
Improving the Efficiency of Transformers for Resource-Constrained Devices
Hamid Tabani
Ajay Balasubramaniam
Shabbir Marzban
Elahe Arani
Bahram Zonooz
33
20
0
30 Jun 2021
LNS-Madam: Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update
Jiawei Zhao
Steve Dai
Rangharajan Venkatesan
Brian Zimmer
Mustafa Ali
Ming-Yu Liu
Brucek Khailany
B. Dally
Anima Anandkumar
MQ
31
13
0
26 Jun 2021
Knowledge distillation: A good teacher is patient and consistent
Lucas Beyer
Xiaohua Zhai
Amelie Royer
L. Markeeva
Rohan Anil
Alexander Kolesnikov
VLM
35
287
0
09 Jun 2021
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
Steve Dai
Rangharajan Venkatesan
Haoxing Ren
B. Zimmer
W. Dally
Brucek Khailany
MQ
25
67
0
08 Feb 2021
Pruning and Quantization for Deep Neural Network Acceleration: A Survey
Tailin Liang
C. Glossner
Lei Wang
Shaobo Shi
Xiaotong Zhang
MQ
127
673
0
24 Jan 2021
Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
29
79
0
17 Sep 2020
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
297
10,216
0
16 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,743
0
26 Sep 2016
1