Bayesian Bits: Unifying Quantization and Pruning
arXiv:2005.07093 (v3, latest)
Neural Information Processing Systems (NeurIPS), 2020
14 May 2020
M. V. Baalen
Christos Louizos
Markus Nagel
Rana Ali Amjad
Ying Wang
Tijmen Blankevoort
Max Welling

Papers citing "Bayesian Bits: Unifying Quantization and Pruning"

49 papers
MiCo: End-to-End Mixed Precision Neural Network Co-Exploration Framework for Edge AI
Zijun Jiang, Yangdi Lyu
13 Aug 2025

Text Embedding Knows How to Quantize Text-Guided Diffusion Models
H. Lee, Myungjun Son, Dongjea Kang, Seung-Won Jung
14 Jul 2025

Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models
Chen Feng, Yicheng Lin, Shaojie Zhuo, Chenzheng Su, R. Ramakrishnan, Zhaocong Yuan, Xiaopeng Zhang
10 Jul 2025

Improving Quantization with Post-Training Model Expansion
Giuseppe Franco, Pablo Monteagudo-Lago, Ian Colbert, Nicholas J. Fraser, Michaela Blott
21 Mar 2025

On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh, Bram Adams, Ahmed E. Hassan
01 Nov 2024

Constraint Guided Model Quantization of Neural Networks
Quinten Van Baelen, P. Karsmakers
30 Sep 2024

Accumulator-Aware Post-Training Quantization for Large Language Models
Ian Colbert, Giuseppe Franco, Fabian Grob, Jinjie Zhang, Rayan Saab
25 Sep 2024

HESSO: Towards Automatic Efficient and User Friendly Any Neural Network Training and Pruning
Tianyi Chen, Xiaoyi Qu, David Aponte, Colby R. Banbury, Jongwoo Ko, Tianyu Ding, Yong Ma, Vladimir Lyapunov, Ilya Zharkov, Luming Liang
11 Sep 2024

Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks
Beatrice Alessandra Motetti, Matteo Risso, Luca Bompani, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari
01 Jul 2024

Bayesian Federated Model Compression for Communication and Computation Efficiency
Cheng-Gang Xia, Danny H. K. Tsang, Vincent K. N. Lau
11 Apr 2024

Adaptive quantization with mixed-precision based on low-cost proxy
Jing Chen, Qiao Yang, Senmao Tian, Shunli Zhang
27 Feb 2024

Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization
K. Balaskas, Andreas Karatzas, Christos Sad, K. Siozios, Iraklis Anagnostopoulos, Georgios Zervakis, Jörg Henkel
23 Dec 2023

ResQ: Residual Quantization for Video Perception
Davide Abati, H. Yahia, Markus Nagel, A. Habibian
18 Aug 2023

FPGA Resource-aware Structured Pruning for Real-Time Neural Networks
Benjamin Ramhorst, Vladimir Loncar, George A. Constantinides
09 Aug 2023

FLIQS: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search
Jordan Dotzel, Gang Wu, Andrew Li, M. Umar, Yun Ni, ..., Liqun Cheng, Martin G. Dixon, N. Jouppi, Quoc V. Le, Sheng Li
07 Aug 2023

Pruning vs Quantization: Which is Better?
Andrey Kuzmin, Markus Nagel, M. V. Baalen, Arash Behboodi, Tijmen Blankevoort
06 Jul 2023

Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge
Georg Rutishauser, Francesco Conti, Luca Benini
06 Jul 2023

A Tensor-based Convolutional Neural Network for Small Dataset Classification
Zhenhua Chen, David J. Crandall
29 Mar 2023

OTOV2: Automatic, Generic, User-Friendly
Tianyi Chen, Luming Liang, Tian Ding, Zhihui Zhu, Ilya Zharkov
13 Mar 2023

Structured Pruning for Deep Convolutional Neural Networks: A survey
Yang He, Lingao Xiao
01 Mar 2023

Towards Optimal Compression: Joint Pruning and Quantization
Ben Zandonati, Glenn Bucagu, Adrian Alan Pol, M. Pierini, Olya Sirkin, Tal Kopetz
15 Feb 2023

A Practical Mixed Precision Algorithm for Post-Training Quantization
N. Pandey, Markus Nagel, M. V. Baalen, Yin-Ruey Huang, Chirag I. Patel, Tijmen Blankevoort
10 Feb 2023

Efficient Quantized Sparse Matrix Operations on Tensor Cores
Shigang Li, Kazuki Osawa, Torsten Hoefler
14 Sep 2022

PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers
Zhikai Li, Mengjuan Chen, Junrui Xiao, Qingyi Gu
13 Sep 2022

Mixed-Precision Neural Networks: A Survey
M. Rakka, M. Fouda, Pramod P. Khargonekar, Fadi J. Kurdahi
11 Aug 2022

Quantization Robust Federated Learning for Efficient Inference on Heterogeneous Devices
Kartik Gupta, Marios Fournarakis, M. Reisser, Christos Louizos, Markus Nagel
22 Jun 2022

Fast Lossless Neural Compression with Integer-Only Discrete Flows
Siyu Wang, Jianfei Chen, Chongxuan Li, Jun Zhu, Bo Zhang
17 Jun 2022

Training Quantised Neural Networks with STE Variants: the Additive Noise Annealing Algorithm
Matteo Spallanzani, G. P. Leonardi, Luca Benini
21 Mar 2022

Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks
Mingliang Xu, Mingbao Lin, Xunchao Li, Ke Li, Chunjiang Ge, Yong Li, Yongjian Wu, Rongrong Ji
08 Mar 2022

Neural Network Quantization with AI Model Efficiency Toolkit (AIMET)
S. Siddegowda, Marios Fournarakis, Markus Nagel, Tijmen Blankevoort, Chirag I. Patel, Abhijit Khobare
20 Jan 2022

Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks
Runpei Dong, Zhanhong Tan, Mengdi Wu, Linfeng Zhang, Kaisheng Ma
30 Dec 2021

Implicit Neural Video Compression
Yunfan Zhang, T. V. Rozendaal, Johann Brehmer, Markus Nagel, Taco S. Cohen
21 Dec 2021

MQBench: Towards Reproducible and Deployable Model Quantization Benchmark
Yuhang Li, Mingzhu Shen, Jian Ma, Yan Ren, Mingxin Zhao, Tao Gui, Yazhe Niu, F. Yu, Junjie Yan
05 Nov 2021

Reconstructing Pruned Filters using Cheap Spatial Transformations
Roy Miles, K. Mikolajczyk
25 Oct 2021

Training Deep Neural Networks with Joint Quantization and Pruning of Weights and Activations
Xinyu Zhang, Ian Colbert, Ken Kreutz-Delgado, Srinjoy Das
15 Oct 2021

Understanding and Overcoming the Challenges of Efficient Transformer Quantization
Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort
27 Sep 2021

Generalizable Mixed-Precision Quantization via Attribution Rank Preservation
Ziwei Wang, Han Xiao, Jiwen Lu, Jie Zhou
05 Aug 2021

Only Train Once: A One-Shot Neural Network Training And Pruning Framework
Tianyi Chen, Bo Ji, Tianyu Ding, Biyi Fang, Guanyi Wang, Zhihui Zhu, Luming Liang, Yixin Shi, Sheng Yi, Xiao Tu
15 Jul 2021

Learned Token Pruning for Transformers
Sehoon Kim, Sheng Shen, D. Thorsley, A. Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer
02 Jul 2021

A White Paper on Neural Network Quantization
Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, M. V. Baalen, Tijmen Blankevoort
15 Jun 2021

BatchQuant: Quantized-for-all Architecture Search with Robust Quantizer
Haoping Bai, Mengsi Cao, Ping Huang, Jiulong Shan
19 May 2021

Integer-only Zero-shot Quantization for Efficient Speech Recognition
Sehoon Kim, A. Gholami, Z. Yao, Nicholas Lee, Patrick Wang, Aniruddha Nrusimha, Bohan Zhai, Tianren Gao, Michael W. Mahoney, Kurt Keutzer
31 Mar 2021

Data-free mixed-precision quantization using novel sensitivity metric
Donghyun Lee, M. Cho, Seungwon Lee, Joonho Song, Changkyu Choi
18 Mar 2021

COIN: COmpression with Implicit Neural representations
Emilien Dupont, Adam Goliński, Milad Alizadeh, Yee Whye Teh, Arnaud Doucet
03 Mar 2021

Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference
B. Hawks, Javier Mauricio Duarte, Nicholas J. Fraser, Alessandro Pappalardo, N. Tran, Yaman Umuroglu
22 Feb 2021

On the Effects of Quantisation on Model Uncertainty in Bayesian Neural Networks
Martin Ferianc, Partha P. Maji, Matthew Mattina, Miguel R. D. Rodrigues
22 Feb 2021

Single-path Bit Sharing for Automatic Loss-aware Model Compression
Jing Liu, Bohan Zhuang, Peng Chen, Chunhua Shen, Jianfei Cai, Mingkui Tan
13 Jan 2021
HAWQ-V3: Dyadic Neural Network Quantization
International Conference on Machine Learning (ICML), 2021
Z. Yao, Zhen Dong, Zhangcheng Zheng, A. Gholami, Jiali Yu, ..., Leyuan Wang, Qijing Huang, Yida Wang, Michael W. Mahoney, Kurt Keutzer
20 Nov 2020
Resource-Efficient Neural Networks for Embedded Systems
Wolfgang Roth, Günther Schindler, Lukas Pfeifenberger, Robert Peharz, Sebastian Tschiatschek, Holger Fröning, Franz Pernkopf, Zoubin Ghahramani
07 Jan 2020