1806.08342
Cited By
Quantizing deep convolutional networks for efficient inference: A whitepaper
21 June 2018
Raghuraman Krishnamoorthi
MQ
Papers citing
"Quantizing deep convolutional networks for efficient inference: A whitepaper"
50 / 464 papers shown
Title
Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting
Dawei Yang
Ning He
Xing Hu
Zhihang Yuan
Jiangyong Yu
Chen Xu
Zhe Jiang
MQ
25
5
0
17 Dec 2023
A 1.6-mW Sparse Deep Learning Accelerator for Speech Separation
Chih-Chyau Yang
Tian-Sheuan Chang
26
0
0
15 Dec 2023
RdimKD: Generic Distillation Paradigm by Dimensionality Reduction
Yi Guo
Yiqian He
Xiaoyang Li
Haotong Qin
Van Tung Pham
Yang Zhang
Shouda Liu
43
1
0
14 Dec 2023
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Xiaoxia Wu
Haojun Xia
Stephen Youn
Zhen Zheng
Shiyang Chen
...
Reza Yazdani Aminabadi
Yuxiong He
Olatunji Ruwase
Leon Song
Zhewei Yao
66
8
0
14 Dec 2023
SqueezeSAM: User friendly mobile interactive segmentation
Bala Varadarajan
Bilge Soran
Forrest N. Iandola
Xiaoyu Xiang
Yunyang Xiong
Lemeng Wu
Chenchen Zhu
Raghuraman Krishnamoorthi
Vikas Chandra
VLM
24
2
0
11 Dec 2023
Efficient Quantization Strategies for Latent Diffusion Models
Yuewei Yang
Xiaoliang Dai
Jialiang Wang
Peizhao Zhang
Hongbo Zhang
DiffM
MQ
24
13
0
09 Dec 2023
RACE-IT: A Reconfigurable Analog CAM-Crossbar Engine for In-Memory Transformer Acceleration
Lei Zhao
Luca Buonanno
Ron M. Roth
Sergey Serebryakov
Archit Gajjar
John Moon
Jim Ignowski
Giacomo Pedretti
25
3
0
29 Nov 2023
Mirage: An RNS-Based Photonic Accelerator for DNN Training
Cansu Demirkıran
Guowei Yang
D. Bunandar
Ajay Joshi
26
1
0
29 Nov 2023
LayerCollapse: Adaptive compression of neural networks
Soheil Zibakhsh Shabgahi
Mohammad Soheil Shariff
F. Koushanfar
AI4CE
16
1
0
29 Nov 2023
PIPE : Parallelized Inference Through Post-Training Quantization Ensembling of Residual Expansions
Edouard Yvinec
Arnaud Dapogny
Kévin Bailly
MQ
15
0
0
27 Nov 2023
LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms
Young D. Kwon
Jagmohan Chauhan
Hong Jia
Stylianos I. Venieris
Cecilia Mascolo
38
11
0
19 Nov 2023
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
Yunshan Zhong
Jiawei Hu
Mingbao Lin
Mengzhao Chen
Rongrong Ji
MQ
28
3
0
16 Nov 2023
Reducing the Side-Effects of Oscillations in Training of Quantized YOLO Networks
Kartik Gupta
Akshay Asthana
MQ
24
8
0
09 Nov 2023
TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices
Jianlei Yang
Jiacheng Liao
Fanding Lei
Meichen Liu
Junyi Chen
Lingkun Long
Han Wan
Bei Yu
Weisheng Zhao
MoE
33
2
0
03 Nov 2023
Exploring Post-Training Quantization of Protein Language Models
Shuang Peng
Fei Yang
Ning Sun
Sheng Chen
Yanfeng Jiang
Aimin Pan
MQ
19
0
0
30 Oct 2023
Resource Constrained Semantic Segmentation for Waste Sorting
Elisa Cascina
Andrea Pellegrino
Lorenzo Tozzi
9
1
0
30 Oct 2023
Effortless Cross-Platform Video Codec: A Codebook-Based Method
Kuan Tian
Yonghang Guan
Jin-Peng Xiang
Jun Zhang
Xiao Han
Wei Yang
32
1
0
16 Oct 2023
Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
Cheng Zhang
Jianyi Cheng
Ilia Shumailov
G. Constantinides
Yiren Zhao
MQ
19
9
0
08 Oct 2023
EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models
Yefei He
Jing Liu
Weijia Wu
Hong Zhou
Bohan Zhuang
DiffM
MQ
16
46
0
05 Oct 2023
Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication
Zhe Zhao
Qingyun Liu
Huan Gui
Bang An
Lichan Hong
Ed H. Chi
20
1
0
04 Oct 2023
A Study of Quantisation-aware Training on Time Series Transformer Models for Resource-constrained FPGAs
Tianheng Ling
Chao Qian
Lukas Einhaus
Gregor Schiele
11
1
0
04 Oct 2023
The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers
Rickard Brannvall
19
0
0
03 Oct 2023
Network Memory Footprint Compression Through Jointly Learnable Codebooks and Mappings
Vittorio Giammarino
Arnaud Dapogny
Kévin Bailly
MQ
22
1
0
29 Sep 2023
Highly Efficient SNNs for High-speed Object Detection
Nemin Qiu
Zhiguo Li
Yuan Li
Chuang Zhu
22
0
0
27 Sep 2023
Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training
Selim F. Yilmaz
Benjamin Ghaemmaghami
A. Singh
Benjamin Cho
Leo Orshansky
Lei Deng
Michael Orshansky
AI4TS
23
0
0
27 Sep 2023
Efficient Post-training Quantization with FP8 Formats
Haihao Shen
Naveen Mellempudi
Xin He
Q. Gao
Chang‐Bao Wang
Mengni Wang
MQ
23
19
0
26 Sep 2023
A Machine Learning-oriented Survey on Tiny Machine Learning
Luigi Capogrosso
Federico Cunico
D. Cheng
Franco Fummi
Marco Cristani
SyDa
MU
29
33
0
21 Sep 2023
Towards Real-Time Neural Video Codec for Cross-Platform Application Using Calibration Information
Kuan Tian
Yonghang Guan
Jin-Peng Xiang
Jun Zhang
Xiao Han
Wei Yang
32
7
0
20 Sep 2023
SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization
Jinjie Zhang
Rayan Saab
19
0
0
20 Sep 2023
Real-Time Semantic Segmentation: A Brief Survey & Comparative Study in Remote Sensing
Clifford Broni-Bediako
Junshi Xia
Naoto Yokoya
38
9
0
12 Sep 2023
EDAC: Efficient Deployment of Audio Classification Models For COVID-19 Detection
Andrej Jovanović
Mario Mihaly
Lennon Donaldson
36
0
0
11 Sep 2023
On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks
Wei Huang
Haotong Qin
Yangdong Liu
Jingzhuo Liang
Yifu Ding
Ying Li
Xianglong Liu
MQ
26
0
0
05 Sep 2023
Softmax Bias Correction for Quantized Generative Models
N. Pandey
Marios Fournarakis
Chirag I. Patel
Markus Nagel
DiffM
17
11
0
04 Sep 2023
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
Qiong Wu
Wei Yu
Yiyi Zhou
Shubin Huang
Xiaoshuai Sun
R. Ji
VLM
24
7
0
04 Sep 2023
Federated Learning in IoT: a Survey from a Resource-Constrained Perspective
Ishmeet Kaur
30
2
0
25 Aug 2023
ResQ: Residual Quantization for Video Perception
Davide Abati
H. Yahia
Markus Nagel
A. Habibian
MQ
21
2
0
18 Aug 2023
NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search
Edouard Yvinec
Arnaud Dapogny
Kévin Bailly
MQ
24
6
0
10 Aug 2023
SAfER: Layer-Level Sensitivity Assessment for Efficient and Robust Neural Network Inference
Edouard Yvinec
Arnaud Dapogny
Kévin Bailly
Xavier Fischer
AAML
6
2
0
09 Aug 2023
EFaR 2023: Efficient Face Recognition Competition
J. Kolf
Fadi Boutros
Jurek Elliesen
Markus Theuerkauf
Naser Damer
...
D. Nunes
Ahmad Hassanpour
Pankaj Khatiwada
A. Toor
Bian Yang
CVBM
MQ
24
13
0
08 Aug 2023
Survey on Computer Vision Techniques for Internet-of-Things Devices
Ishmeet Kaur
Adwaita Janardhan Jadhav
AI4CE
14
1
0
02 Aug 2023
Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation
Stylianos I. Venieris
Javier Fernandez-Marques
Nicholas D. Lane
MQ
16
3
0
25 Jul 2023
EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
Peijie Dong
Lujun Li
Zimian Wei
Xin-Yi Niu
Zhiliang Tian
H. Pan
MQ
45
28
0
20 Jul 2023
Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications
Vasileios Leon
Muhammad Abdullah Hanif
Giorgos Armeniakos
Xun Jiao
Muhammad Shafique
K. Pekmestzi
Dimitrios Soudris
29
3
0
20 Jul 2023
TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge
Young D. Kwon
Rui Li
Stylianos I. Venieris
Jagmohan Chauhan
Nicholas D. Lane
Cecilia Mascolo
19
8
0
19 Jul 2023
A Survey of Techniques for Optimizing Transformer Inference
Krishna Teja Chitty-Venkata
Sparsh Mittal
M. Emani
V. Vishwanath
Arun Somani
40
62
0
16 Jul 2023
Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models
James O'Neill
Sourav Dutta
VLM
MQ
32
1
0
12 Jul 2023
QBitOpt: Fast and Accurate Bitwidth Reallocation during Training
Jorn W. T. Peters
Marios Fournarakis
Markus Nagel
M. V. Baalen
Tijmen Blankevoort
MQ
16
5
0
10 Jul 2023
ECG-Image-Kit: A Synthetic Image Generation Toolbox to Facilitate Deep Learning-Based Electrocardiogram Digitization
Kshama Kodthalu Shivashankara
Deepanshi
Afagh Mehri Shervedani
Gari D. Clifford
Matthew A. Reyna
Reza Sameni
MedIm
17
39
0
04 Jul 2023
DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN Inference
Bahareh Khabbazan
Marc Riera
Antonio González
MQ
13
3
0
28 Jun 2023
A Survey on Graph Neural Network Acceleration: Algorithms, Systems, and Customized Hardware
Shichang Zhang
Atefeh Sohrabizadeh
Cheng Wan
Zijie Huang
Ziniu Hu
Yewen Wang
Yingyan Lin
Jason Cong
Yizhou Sun
GNN
AI4CE
29
22
0
24 Jun 2023