ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Quantizing deep convolutional networks for efficient inference: A whitepaper

21 June 2018
Raghuraman Krishnamoorthi
MQ

Papers citing "Quantizing deep convolutional networks for efficient inference: A whitepaper"

50 / 464 papers shown
Optimization of DNN-based speaker verification model through efficient quantization technique
Yeona Hong, Woo-Jin Chung, Hong-Goo Kang
MQ · 12 Jul 2024

Learning Program Behavioral Models from Synthesized Input-Output Pairs
Tural Mammadov, Dietrich Klakow, Alexander Koller, Andreas Zeller
11 Jul 2024

Integer-only Quantized Transformers for Embedded FPGA-based Time-series Forecasting in AIoT
Tianheng Ling, Chao Qian, Gregor Schiele
AI4TS, MQ · 06 Jul 2024

ISQuant: apply squant to the real deployment
Dezan Zhao
MQ · 05 Jul 2024

Unified Anomaly Detection methods on Edge Device using Knowledge Distillation and Quantization
Sushovan Jena, Arya Pulkit, Kajal Singh, Anoushka Banerjee, Sharad Joshi, Ananth Ganesh, Dinesh Singh, Arnav Bhavsar
03 Jul 2024

ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers
Yanfeng Jiang, Ning Sun, Xueshuo Xie, Fei Yang, Tao Li
MQ · 03 Jul 2024

Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other
Yifei Gao, Jie Ou, Lei Wang, Yuting Xiao, Zhiyuan Xiang, Ruiting Dai, Jun Cheng
MQ · 24 Jun 2024

Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates
Cristian Meo, Ksenia Sycheva, Anirudh Goyal, Justin Dauwels
MQ · 18 Jun 2024

Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner, Ilja Baumann, K. Riedhammer, Tobias Bocklet
MQ · 16 Jun 2024

Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference
Christopher Wolters, Xiaoxuan Yang, Ulf Schlichtmann, Toyotaro Suzumura
12 Jun 2024

Low-Rank Quantization-Aware Training for LLMs
Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel
MQ · 10 Jun 2024

Efficient Neural Compression with Inference-time Decoding
Clément Metz, Olivier Bichler, Antoine Dupret
MQ · 10 Jun 2024

Subspace Node Pruning
Joshua Offergeld, Marcel van Gerven, Nasir Ahmad
26 May 2024

Properties that allow or prohibit transferability of adversarial attacks among quantized networks
Abhishek Shrestha, Jürgen Grossmann
AAML · 15 May 2024

Structure-Preserving Network Compression Via Low-Rank Induced Training Through Linear Layers Composition
Xitong Zhang, Ismail R. Alkhouri, Rongrong Wang
06 May 2024

Training-free Graph Neural Networks and the Power of Labels as Features
Ryoma Sato
30 Apr 2024

EvGNN: An Event-driven Graph Neural Network Accelerator for Edge Vision
Yufeng Yang, Adrian Kneip, Charlotte Frenkel
GNN · 30 Apr 2024

How to Parameterize Asymmetric Quantization Ranges for Quantization-Aware Training
Jaeseong You, Minseop Park, Kyunggeun Lee, Seokjun An, Chirag I. Patel, Markus Nagel
MQ · 25 Apr 2024

decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points
Yi Guo, Fanliu Kong, Xiaoyang Li, Hui Li, Wei-Neng Chen, Xiaogang Tian, Jinping Cai, Yang Zhang, Shouda Liu
MQ · 19 Apr 2024

QGen: On the Ability to Generalize in Quantization Aware Training
Mohammadhossein Askarihemmat, Ahmadreza Jeddi, Reyhane Askari Hemmat, Ivan Lazarevich, Alexander Hoffman, Sudhakar Sah, Ehsan Saboori, Yvon Savaria, Jean-Pierre David
MQ · 17 Apr 2024

Comprehensive Survey of Model Compression and Speed up for Vision Transformers
Feiyang Chen, Ziqian Luo, Lisang Zhou, Xueting Pan, Ying Jiang
16 Apr 2024

EQO: Exploring Ultra-Efficient Private Inference with Winograd-Based Protocol and Quantization Co-Optimization
Wenxuan Zeng, Tianshi Xu, Meng Li, Runsheng Wang
MQ · 15 Apr 2024

Differentiable Search for Finding Optimal Quantization Strategy
Lianqiang Li, Chenqian Yan, Yefei Chen
MQ · 10 Apr 2024

Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models
Wanyun Cui, Qianle Wang
MQ · 03 Apr 2024

DNN Memory Footprint Reduction via Post-Training Intra-Layer Multi-Precision Quantization
B. Ghavami, Amin Kamjoo, Lesley Shannon, S. Wilton
MQ · 03 Apr 2024

PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Marina Neseem, Conor McCullough, Randy Hsin, Chas Leichner, Shan Li, ..., Andrew G. Howard, Lukasz Lew, Sherief Reda, Ville Rautio, Daniele Moro
MQ · 29 Mar 2024

Oh! We Freeze: Improving Quantized Knowledge Distillation via Signal Propagation Analysis for Large Language Models
Kartikeya Bhardwaj, N. Pandey, Sweta Priyadarshi, Kyunggeun Lee, Jun Ma, Harris Teague
MQ · 26 Mar 2024

On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh, Emad Fallahzadeh, Bram Adams, Ahmed E. Hassan
MQ · 25 Mar 2024

Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
Qiong Wu, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji
MoE · 22 Mar 2024

Self-Supervised Quantization-Aware Knowledge Distillation
Kaiqi Zhao, Ming Zhao
MQ · 17 Mar 2024

QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning
Jiun-Man Chen, Yu-Hsuan Chao, Yu-Jie Wang, Ming-Der Shieh, Chih-Chung Hsu, Wei-Fen Lin
MQ · 11 Mar 2024

Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities
Kaiwen Cai, Zhekai Duan, Gaowen Liu, Charles Fleming, Chris Xiaoxuan Lu
VLM · 07 Mar 2024

FlowPrecision: Advancing FPGA-Based Real-Time Fluid Flow Estimation with Linear Quantization
Tianheng Ling, Julian Hoever, Chao Qian, Gregor Schiele
MQ · 04 Mar 2024

BasedAI: A decentralized P2P network for Zero Knowledge Large Language Models (ZK-LLMs)
Sean Wellington
01 Mar 2024

Ef-QuantFace: Streamlined Face Recognition with Small Data and Low-Bit Precision
William Gazali, Jocelyn Michelle Kho, Joshua Santoso, Williem
CVBM, MQ · 28 Feb 2024

Evaluating Quantized Large Language Models
Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu-Xiang Wang
MQ · 28 Feb 2024

On the Challenges and Opportunities in Generative AI
Laura Manduchi, Kushagra Pandey, Robert Bamler, Ryan Cotterell, Sina Daubener, ..., F. Wenzel, Frank Wood, Stephan Mandt, Vincent Fortuin
28 Feb 2024

Adaptive quantization with mixed-precision based on low-cost proxy
Jing Chen, Qiao Yang, Senmao Tian, Shunli Zhang
MQ · 27 Feb 2024

Is It a Free Lunch for Removing Outliers during Pretraining?
Baohao Liao, Christof Monz
MQ · 19 Feb 2024

Compression Repair for Feedforward Neural Networks Based on Model Equivalence Evaluation
Zihao Mo, Yejiang Yang, Shuaizheng Lu, Weiming Xiang
18 Feb 2024

End-to-End Training Induces Information Bottleneck through Layer-Role Differentiation: A Comparative Analysis with Layer-wise Training
Keitaro Sakamoto, Issei Sato
14 Feb 2024

TransAxx: Efficient Transformers with Approximate Computing
Dimitrios Danopoulos, Georgios Zervakis, Dimitrios Soudris, Jörg Henkel
ViT · 12 Feb 2024

RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
Zhikai Li, Xuewen Liu, Jing Zhang, Qingyi Gu
MQ · 08 Feb 2024

HEANA: A Hybrid Time-Amplitude Analog Optical Accelerator with Flexible Dataflows for Energy-Efficient CNN Inference
Sairam Sri Vatsavai, Venkata Sai Praneeth Karempudi, Ishan G. Thakkar
05 Feb 2024

HEQuant: Marrying Homomorphic Encryption and Quantization for Communication-Efficient Private Inference
Tianshi Xu, Meng Li, Runsheng Wang
29 Jan 2024

LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection
Sifan Zhou, Liang Li, Xinyu Zhang, Bo-Wen Zhang, Shipeng Bai, Miao Sun, Ziyu Zhao, Xiaobo Lu, Xiangxiang Chu
MQ · 29 Jan 2024

Scaling Up Quantization-Aware Neural Architecture Search for Efficient Deep Learning on the Edge
Yao Lu, Hiram Rayo Torres Rodriguez, Sebastian Vogel, Nick Van De Waterlaat, P. Jancura
MQ · 22 Jan 2024

AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference
Xuanlei Zhao, Shenggan Cheng, Guangyang Lu, Jiarui Fang, Hao Zhou, Bin Jia, Ziming Liu, Yang You
MQ · 19 Jan 2024

Model Compression Techniques in Biometrics Applications: A Survey
Eduarda Caldeira, Pedro C. Neto, Marco Huber, Naser Damer, Ana F. Sequeira
18 Jan 2024

Vietnamese Poem Generation & The Prospect Of Cross-Language Poem-To-Poem Translation
Triet Minh Huynh, Quan Le Bao
02 Jan 2024