2005.07093
Bayesian Bits: Unifying Quantization and Pruning
14 May 2020
M. V. Baalen
Christos Louizos
Markus Nagel
Rana Ali Amjad
Ying Wang
Tijmen Blankevoort
Max Welling
MQ
Papers citing "Bayesian Bits: Unifying Quantization and Pruning"
50 / 79 papers shown
MiCo: End-to-End Mixed Precision Neural Network Co-Exploration Framework for Edge AI
Zijun Jiang
Yangdi Lyu
MQ
238
0
0
13 Aug 2025
Text Embedding Knows How to Quantize Text-Guided Diffusion Models
H. Lee
Myungjun Son
Dongjea Kang
Seung-Won Jung
DiffM
MQ
352
2
0
14 Jul 2025
Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models
Chen Feng
Yicheng Lin
Shaojie Zhuo
Chenzheng Su
R. Ramakrishnan
Zhaocong Yuan
Xiaopeng Zhang
MQ
294
4
0
10 Jul 2025
Efficient Mixed Precision Quantization in Graph Neural Networks
IEEE International Conference on Data Engineering (ICDE), 2025
Samir Moustafa
Nils M. Kriege
Wilfried Gansterer
GNN
MQ
389
3
0
14 May 2025
Onboard Optimization and Learning: A Survey
Monirul Islam Pavel
Siyi Hu
Mahardhika Pratama
Ryszard Kowalczyk
437
2
0
07 May 2025
Improving Quantization with Post-Training Model Expansion
Giuseppe Franco
Pablo Monteagudo-Lago
Ian Colbert
Nicholas J. Fraser
Michaela Blott
MQ
499
4
0
21 Mar 2025
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression
Computer Vision and Pattern Recognition (CVPR), 2025
Xiaoyi Qu
David Aponte
Colby R. Banbury
Daniel P. Robinson
Tianyu Ding
K. Koishida
Ilya Zharkov
Tianyi Chen
MQ
389
11
0
23 Feb 2025
BF-IMNA: A Bit Fluid In-Memory Neural Architecture for Neural Network Acceleration
M. Rakka
Rachid Karami
A. Eltawil
M. Fouda
Fadi J. Kurdahi
MQ
307
3
0
03 Nov 2024
On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh
Bram Adams
Ahmed E. Hassan
VLM
482
1
0
01 Nov 2024
Constraint Guided Model Quantization of Neural Networks
Quinten Van Baelen
P. Karsmakers
MQ
347
0
0
30 Sep 2024
Accumulator-Aware Post-Training Quantization for Large Language Models
Ian Colbert
Giuseppe Franco
Fabian Grob
Jinjie Zhang
Rayan Saab
MQ
372
4
0
25 Sep 2024
Thinking in Granularity: Dynamic Quantization for Image Super-Resolution by Intriguing Multi-Granularity Clues
AAAI Conference on Artificial Intelligence (AAAI), 2024
Mingshen Wang
Zhao Zhang
Feng Li
Ke Xu
Kang Miao
Meng Wang
MQ
SupR
277
4
0
22 Sep 2024
HESSO: Towards Automatic Efficient and User Friendly Any Neural Network Training and Pruning
Tianyi Chen
Xiaoyi Qu
David Aponte
Colby R. Banbury
Jongwoo Ko
Tianyu Ding
Yong Ma
Vladimir Lyapunov
Ilya Zharkov
Luming Liang
564
3
0
11 Sep 2024
A Mean Field Ansatz for Zero-Shot Weight Transfer
Xingyuan Chen
Wenwei Kuang
Lei Deng
Wei Han
Bo Bai
Goncalo dos Reis
197
1
0
16 Aug 2024
Towards Accurate Post-Training Quantization of Vision Transformers via Error Reduction
Mingliang Xu
Jiawei Hu
You Huang
Yuxin Zhang
Rongrong Ji
MQ
123
1
0
09 Jul 2024
Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks
Beatrice Alessandra Motetti
Matteo Risso
Luca Bompani
Enrico Macii
Massimo Poncino
Daniele Jahier Pagliari
MQ
347
12
0
01 Jul 2024
Improving the performance of Stein variational inference through extreme sparsification of physically-constrained neural network models
G. A. Padmanabha
J. Fuhg
Cosmin Safta
Reese E. Jones
N. Bouklas
383
12
0
30 Jun 2024
Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates
Cristian Meo
Ksenia Sycheva
Anirudh Goyal
Justin Dauwels
MQ
360
12
0
18 Jun 2024
Quantization of Large Language Models with an Overdetermined Basis
D. Merkulov
Daria Cherniuk
Alexander Rudikov
Ivan Oseledets
Ekaterina Muravleva
A. Mikhalev
Boris Kashin
MQ
241
2
0
15 Apr 2024
Bayesian Federated Model Compression for Communication and Computation Efficiency
Cheng-Gang Xia
Danny H. K. Tsang
Vincent K. N. Lau
281
1
0
11 Apr 2024
RefQSR: Reference-based Quantization for Image Super-Resolution Networks
IEEE Transactions on Image Processing (TIP), 2024
H. Lee
Jun-Sang Yoo
Seung-Won Jung
SupR
276
11
0
02 Apr 2024
Mixed-precision Supernet Training from Vision Foundation Models using Low Rank Adapter
Yuiko Sakuma
Masakazu Yoshimura
Junji Otsuka
Atsushi Irie
Takeshi Ohashi
MQ
319
0
0
29 Mar 2024
Adaptive quantization with mixed-precision based on low-cost proxy
Jing Chen
Qiao Yang
Senmao Tian
Shunli Zhang
MQ
196
4
0
27 Feb 2024
Retraining-free Model Quantization via One-Shot Weight-Coupling Learning
Computer Vision and Pattern Recognition (CVPR), 2024
Chen Tang
Yuan Meng
Jiacheng Jiang
Shuzhao Xie
Rongwei Lu
Cheng Wang
Zhi Wang
Wenwu Zhu
MQ
281
17
0
03 Jan 2024
Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization
K. Balaskas
Andreas Karatzas
Christos Sad
K. Siozios
Iraklis Anagnostopoulos
Georgios Zervakis
Jörg Henkel
MQ
249
29
0
23 Dec 2023
Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting
Dawei Yang
Ning He
Yan Chen
Zhihang Yuan
Jiangyong Yu
Chen Xu
Zhe Jiang
MQ
283
15
0
17 Dec 2023
OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators
Tianyi Chen
Tianyu Ding
Zhihui Zhu
Zeyu Chen
HsiangTao Wu
Ilya Zharkov
Luming Liang
259
7
0
15 Dec 2023
Efficient Neural Networks for Tiny Machine Learning: A Comprehensive Review
M. Lê
Pierre Wolinski
Julyan Arbel
348
21
0
20 Nov 2023
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency
Sungho Jeon
Ching-Feng Yeh
Hakan Inan
Wei-Ning Hsu
Rashi Rungta
Yashar Mehdad
Daniel M. Bikel
213
0
0
05 Nov 2023
LAPP: Layer Adaptive Progressive Pruning for Compressing CNNs from Scratch
P. Zhai
K. Guo
Fan Liu
Xiaofen Xing
Xiangmin Xu
323
3
0
25 Sep 2023
QD-BEV : Quantization-aware View-guided Distillation for Multi-view 3D Object Detection
IEEE International Conference on Computer Vision (ICCV), 2023
Yifan Zhang
Zhen Dong
Huanrui Yang
Ming Lu
Cheng-Ching Tseng
Yuan Du
Kurt Keutzer
Li Du
Shanghang Zhang
MQ
227
15
0
21 Aug 2023
ResQ: Residual Quantization for Video Perception
IEEE International Conference on Computer Vision (ICCV), 2023
Davide Abati
H. Yahia
Markus Nagel
A. Habibian
MQ
302
3
0
18 Aug 2023
FPGA Resource-aware Structured Pruning for Real-Time Neural Networks
International Conference on Field-Programmable Technology (ICFPT), 2023
Benjamin Ramhorst
Vladimir Loncar
George A. Constantinides
255
13
0
09 Aug 2023
Quantization Aware Factorization for Deep Neural Network Compression
Journal of Artificial Intelligence Research (JAIR), 2023
Daria Cherniuk
Stanislav Abukhovich
Anh-Huy Phan
Ivan Oseledets
A. Cichocki
Julia Gusak
MQ
314
7
0
08 Aug 2023
FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search
Jordan Dotzel
Gang Wu
Andrew Li
M. Umar
Yun Ni
...
Liqun Cheng
Martin G. Dixon
N. Jouppi
Quoc V. Le
Sheng Li
MQ
343
6
0
07 Aug 2023
QBitOpt: Fast and Accurate Bitwidth Reallocation during Training
Jorn W. T. Peters
Marios Fournarakis
Markus Nagel
M. V. Baalen
Tijmen Blankevoort
MQ
203
7
0
10 Jul 2023
Pruning vs Quantization: Which is Better?
Neural Information Processing Systems (NeurIPS), 2023
Andrey Kuzmin
Markus Nagel
M. V. Baalen
Arash Behboodi
Tijmen Blankevoort
MQ
382
120
0
06 Jul 2023
Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge
International Conference on Artificial Intelligence Circuits and Systems (ICAICS), 2023
Georg Rutishauser
Francesco Conti
Luca Benini
MQ
286
5
0
06 Jul 2023
Neural Network Compression using Binarization and Few Full-Precision Weights
Information Sciences (Inf. Sci.), 2023
F. M. Nardini
Cosimo Rulli
Salvatore Trani
Rossano Venturini
MQ
365
1
0
15 Jun 2023
A Tensor-based Convolutional Neural Network for Small Dataset Classification
Zhenhua Chen
David J. Crandall
228
0
0
29 Mar 2023
OTOV2: Automatic, Generic, User-Friendly
International Conference on Learning Representations (ICLR), 2023
Tianyi Chen
Luming Liang
Tian Ding
Zhihui Zhu
Ilya Zharkov
VLM
MQ
304
51
0
13 Mar 2023
Structured Pruning for Deep Convolutional Neural Networks: A survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yang He
Lingao Xiao
3DPC
473
318
0
01 Mar 2023
Ultra-low Precision Multiplication-free Training for Deep Neural Networks
Yu Xie
Rui Zhang
Xishan Zhang
Yifan Hao
Zidong Du
Xingui Hu
Ling Li
Qi Guo
MQ
363
2
0
28 Feb 2023
Structured Bayesian Compression for Deep Neural Networks Based on The Turbo-VBI Approach
IEEE Transactions on Signal Processing (IEEE TSP), 2023
Cheng-Gang Xia
Danny H. K. Tsang
Vincent K. N. Lau
BDL
213
8
0
21 Feb 2023
Towards Optimal Compression: Joint Pruning and Quantization
Ben Zandonati
Glenn Bucagu
Adrian Alan Pol
M. Pierini
Olya Sirkin
Tal Kopetz
MQ
410
7
0
15 Feb 2023
A Practical Mixed Precision Algorithm for Post-Training Quantization
N. Pandey
Markus Nagel
M. V. Baalen
Yin-Ruey Huang
Chirag I. Patel
Tijmen Blankevoort
MQ
288
30
0
10 Feb 2023
Efficient Quantized Sparse Matrix Operations on Tensor Cores
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2022
Shigang Li
Kazuki Osawa
Torsten Hoefler
505
51
0
14 Sep 2022
PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Zhikai Li
Mengjuan Chen
Junrui Xiao
Qingyi Gu
ViT
MQ
314
64
0
13 Sep 2022
Mixed-Precision Neural Networks: A Survey
M. Rakka
M. Fouda
Pramod P. Khargonekar
Fadi J. Kurdahi
MQ
391
20
0
11 Aug 2022
Quantized Sparse Weight Decomposition for Neural Network Compression
Andrey Kuzmin
M. V. Baalen
Markus Nagel
Arash Behboodi
MQ
163
3
0
22 Jul 2022