Quantizing deep convolutional networks for efficient inference: A whitepaper
arXiv:1806.08342
21 June 2018
Raghuraman Krishnamoorthi
MQ
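For readers arriving from this listing: the cited whitepaper surveys post-training and quantization-aware training schemes built around a uniform affine mapping between floating-point tensors and 8-bit integers. The snippet below is a minimal NumPy sketch of that mapping and its inverse, meant only as an illustration; it is not the paper's reference implementation, and the helper names (quantize_asymmetric, dequantize) are ours.

```python
import numpy as np

def quantize_asymmetric(x, num_bits=8):
    """Affine (asymmetric) quantization to unsigned integers:
    q = clamp(round(x / scale) + zero_point, qmin, qmax)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    # Extend the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin) if x_max > x_min else 1.0
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map integers back to floats: x_hat = scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

# Example: quantize a random weight tensor and check the round-trip error.
w = np.random.randn(64, 64).astype(np.float32)
q, scale, zp = quantize_asymmetric(w)
w_hat = dequantize(q, scale, zp)
print("max abs error:", np.abs(w - w_hat).max())
```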

Papers citing "Quantizing deep convolutional networks for efficient inference: A whitepaper"

50 / 464 papers shown
Efficient Online Processing with Deep Neural Networks
Lukas Hedegaard
18
0
0
23 Jun 2023
Explainable Lifelong Stream Learning Based on "Glocal" Pairwise Fusion
C. K. Loo
W. S. Liew
S. Wermter
CLL
11
0
0
23 Jun 2023
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Yelysei Bondarenko
Markus Nagel
Tijmen Blankevoort
MQ
13
88
0
22 Jun 2023
Intriguing Properties of Quantization at Scale
Arash Ahmadian
Saurabh Dash
Hongyu Chen
Bharat Venkitesh
Stephen Gou
Phil Blunsom
A. Ustun
Sara Hooker
MQ
46
38
0
30 May 2023
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Zechun Liu
Barlas Oğuz
Changsheng Zhao
Ernie Chang
Pierre Stock
Yashar Mehdad
Yangyang Shi
Raghuraman Krishnamoorthi
Vikas Chandra
MQ
46
187
0
29 May 2023
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization
Chong Yu
Tao Chen
Zhongxue Gan
Jiayuan Fan
MQ
ViT
25
23
0
18 May 2023
MINT: Multiplier-less INTeger Quantization for Energy Efficient Spiking Neural Networks
Ruokai Yin
Yuhang Li
Abhishek Moitra
Priyadarshini Panda
MQ
16
10
0
16 May 2023
Analyzing Compression Techniques for Computer Vision
Maniratnam Mandal
Imran Khan
24
1
0
14 May 2023
TorchBench: Benchmarking PyTorch with High API Surface Coverage
Yueming Hao
Xu Zhao
Bin Bao
David Berard
William Constable
Adnan Aziz
Xu Liu
30
5
0
27 Apr 2023
Deep Convolutional Tables: Deep Learning without Convolutions
S. Dekel
Y. Keller
Aharon Bar-Hillel
3DV
22
0
0
23 Apr 2023
QuMoS: A Framework for Preserving Security of Quantum Machine Learning Model
Zhepeng Wang
Jinyang Li
Zhirui Hu
Blake Gage
Elizabeth Iwasawa
Weiwen Jiang
25
9
0
23 Apr 2023
Effective Neural Network $L_0$ Regularization With BinMask
Kai Jia
Martin Rinard
26
3
0
21 Apr 2023
Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric
Lin Niu
Jia-Wen Liu
Zhihang Yuan
Dawei Yang
Xinggang Wang
Wenyu Liu
MQ
33
2
0
19 Apr 2023
EnforceSNN: Enabling Resilient and Energy-Efficient Spiking Neural Network Inference considering Approximate DRAMs for Embedded Systems
Rachmad Vidya Wicaksana Putra
Muhammad Abdullah Hanif
Muhammad Shafique
24
11
0
08 Apr 2023
HNeRV: A Hybrid Neural Representation for Videos
Hao Chen
M. Gwilliam
Ser-Nam Lim
Abhinav Shrivastava
14
69
1
05 Apr 2023
A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation
Bo-Kyeong Kim
Jaemin Kang
Daeun Seo
Hancheol Park
Shinkook Choi
Hyoung-Kyu Song
Hyungshin Kim
Sungsu Lim
19
0
0
02 Apr 2023
FP8 versus INT8 for efficient deep learning inference
M. V. Baalen
Andrey Kuzmin
Suparna S. Nair
Yuwei Ren
E. Mahurin
...
Sundar Subramanian
Sanghyuk Lee
Markus Nagel
Joseph B. Soriaga
Tijmen Blankevoort
MQ
23
44
0
31 Mar 2023
Architecturing Binarized Neural Networks for Traffic Sign Recognition
Andreea Postovan
Madalina Erascu
25
4
0
27 Mar 2023
Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance
Zhihang Yuan
Jiawei Liu
Jiaxiang Wu
Dawei Yang
Qiang Wu
Guangyu Sun
Wenyu Liu
Xinggang Wang
Bingzhe Wu
MQ
14
6
0
23 Mar 2023
Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers with Bridge Block Reconstruction for IoT Systems
Jemin Lee
Yongin Kwon
Sihyeong Park
Misun Yu
Jeman Park
Hwanjun Song
ViT
MQ
14
5
0
22 Mar 2023
Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective
Yuexiao Ma
Huixia Li
Xiawu Zheng
Xuefeng Xiao
Rui Wang
Shilei Wen
Xin Pan
Fei Chao
Rongrong Ji
MQ
10
12
0
21 Mar 2023
Greedy Pruning with Group Lasso Provably Generalizes for Matrix Sensing
Nived Rajaraman
Devvrit
Aryan Mokhtari
Kannan Ramchandran
20
0
0
20 Mar 2023
Rotation Invariant Quantization for Model Compression
Dor-Joseph Kampeas
Yury Nahshan
Hanoch Kremer
Gil Lederman
Shira Zaloshinski
Zheng Li
E. Haleva
MQ
16
0
0
03 Mar 2023
Boosting Distributed Full-graph GNN Training with Asynchronous One-bit Communication
Mengdie Zhang
Qi Hu
Peng Sun
Yonggang Wen
Tianwei Zhang
GNN
32
4
0
02 Mar 2023
DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference
Jiajun Zhou
Jiajun Wu
Yizhao Gao
Yuhao Ding
Chaofan Tao
Bo-wen Li
Fengbin Tu
Kwang-Ting Cheng
Hayden Kwok-Hay So
Ngai Wong
MQ
16
7
0
24 Feb 2023
Fixflow: A Framework to Evaluate Fixed-point Arithmetic in Light-Weight CNN Inference
Farhad Taheri
Siavash Bayat Sarmadi
H. Mosanaei-Boorani
Reza Taheri
MQ
18
1
0
19 Feb 2023
Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks
Xueye Zheng
Yexin Liu
Yunfan Lu
Tongyan Hua
Tianbo Pan
Weiming Zhang
Dacheng Tao
Lin Wang
AI4TS
BDL
3DV
39
80
0
17 Feb 2023
Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators
M. Rasch
C. Mackin
M. Le Gallo
An Chen
A. Fasoli
...
P. Narayanan
H. Tsai
G. Burr
A. Sebastian
V. Narayanan
13
83
0
16 Feb 2023
SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs
Sairam Sri Vatsavai
Venkata Sai Praneeth Karempudi
Ishan G. Thakkar
S. A. Salehi
J. Hastings
21
8
0
14 Feb 2023
A Practical Mixed Precision Algorithm for Post-Training Quantization
N. Pandey
Markus Nagel
M. V. Baalen
Yin-Ruey Huang
Chirag I. Patel
Tijmen Blankevoort
MQ
9
18
0
10 Feb 2023
Data Quality-aware Mixed-precision Quantization via Hybrid Reinforcement Learning
Yingchun Wang
Jingcai Guo
Song Guo
Weizhan Zhang
MQ
29
20
0
09 Feb 2023
CRAFT: Criticality-Aware Fault-Tolerance Enhancement Techniques for Emerging Memories-Based Deep Neural Networks
Thai-Hoang Nguyen
Muhammad Imran
Jaehyuk Choi
Joongseob Yang
17
3
0
08 Feb 2023
DynaMIX: Resource Optimization for DNN-Based Real-Time Applications on a Multi-Tasking System
Minkyoung Cho
K. Shin
27
2
0
03 Feb 2023
PowerQuant: Automorphism Search for Non-Uniform Quantization
Edouard Yvinec
Arnaud Dapogny
Matthieu Cord
Kévin Bailly
MQ
17
15
0
24 Jan 2023
RedBit: An End-to-End Flexible Framework for Evaluating the Accuracy of Quantized CNNs
A. M. Ribeiro-dos-Santos
João Dinis Ferreira
O. Mutlu
G. Falcão
MQ
15
1
0
15 Jan 2023
Mantis: Enabling Energy-Efficient Autonomous Mobile Agents with Spiking Neural Networks
Rachmad Vidya Wicaksana Putra
Muhammad Shafique
37
6
0
24 Dec 2022
Training Integer-Only Deep Recurrent Neural Networks
V. Nia
Eyyub Sari
Vanessa Courville
M. Asgharian
MQ
45
2
0
22 Dec 2022
Redistribution of Weights and Activations for AdderNet Quantization
Ying Nie
Kai Han
Haikang Diao
Chuanjian Liu
Enhua Wu
Yunhe Wang
MQ
49
6
0
20 Dec 2022
The case for 4-bit precision: k-bit Inference Scaling Laws
Tim Dettmers
Luke Zettlemoyer
MQ
19
214
0
19 Dec 2022
Atrous Space Bender U-Net (ASBU-Net/LogiNet)
Anurag Bansal
O. Ostap
Miguel Maestre Trueba
Kristopher Perry
SSeg
11
0
0
16 Dec 2022
RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers
Zhikai Li
Junrui Xiao
Lianwei Yang
Qingyi Gu
MQ
26
81
0
16 Dec 2022
PD-Quant: Post-Training Quantization based on Prediction Difference Metric
Jiawei Liu
Lin Niu
Zhihang Yuan
Dawei Yang
Xinggang Wang
Wenyu Liu
MQ
96
68
0
14 Dec 2022
QFT: Post-training quantization via fast joint finetuning of all degrees of freedom
Alexander Finkelstein
Ella Fuchs
Idan Tal
Mark Grobman
Niv Vosco
Eldad Meller
MQ
21
6
0
05 Dec 2022
Make RepVGG Greater Again: A Quantization-aware Approach
Xiangxiang Chu
Liang Li
Bo-Wen Zhang
MQ
31
46
0
03 Dec 2022
Quadapter: Adapter for GPT-2 Quantization
Minseop Park
J. You
Markus Nagel
Simyung Chang
MQ
21
9
0
30 Nov 2022
Compressing Volumetric Radiance Fields to 1 MB
Lingzhi Li
Zhen Shen
Zhongshu Wang
Li Shen
Liefeng Bo
25
64
0
29 Nov 2022
Quantization-aware Interval Bound Propagation for Training Certifiably Robust Quantized Neural Networks
Mathias Lechner
Dorde Zikelic
K. Chatterjee
T. Henzinger
Daniela Rus
AAML
16
2
0
29 Nov 2022
Zero-Shot Dynamic Quantization for Transformer Inference
Yousef El-Kurdi
Jerry Quinn
Avirup Sil
MQ
14
1
0
17 Nov 2022
Partial Binarization of Neural Networks for Budget-Aware Efficient Learning
Udbhav Bamba
Neeraj Anand
Saksham Aggarwal
Dilip K Prasad
D. K. Gupta
MQ
17
0
0
12 Nov 2022
Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware Training
Yunshan Zhong
Gongrui Nan
Yu-xin Zhang
Fei Chao
Rongrong Ji
MQ
18
3
0
12 Nov 2022