And the Bit Goes Down: Revisiting the Quantization of Neural Networks
Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou [MQ]
arXiv:1907.05686, 12 July 2019
Papers citing "And the Bit Goes Down: Revisiting the Quantization of Neural Networks" (26 papers shown)
An Empirical Investigation of Matrix Factorization Methods for Pre-trained Transformers
Ashim Gupta, Sina Mahdipour Saravani, P. Sadayappan, Vivek Srikumar (17 Jun 2024)
GPTVQ: The Blessing of Dimensionality for LLM Quantization
M. V. Baalen, Andrey Kuzmin, Markus Nagel, Peter Couperus, Cédric Bastoul, E. Mahurin, Tijmen Blankevoort, Paul N. Whatmough [MQ] (23 Feb 2024)
eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models
Minsik Cho, Keivan Alizadeh Vahid, Qichen Fu, Saurabh N. Adya, C. C. D. Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal [MQ] (02 Sep 2023)
Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models
James O'Neill, Sourav Dutta [VLM, MQ] (12 Jul 2023)
PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration
Ahmed F. AbouElhamayed, Angela Cui, Javier Fernandez-Marques, Nicholas D. Lane, Mohamed S. Abdelfattah [MQ] (25 May 2023)
NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling
Shishira R. Maiya, Sharath Girish, Max Ehrlich, Hanyu Wang, Kwot Sin Lee, Patrick Poirson, Pengxiang Wu, Chen Wang, Abhinav Shrivastava [VGen] (30 Dec 2022)
Hyperspherical Quantization: Toward Smaller and More Accurate Models
Dan Liu, X. Chen, Chen-li Ma, Xue Liu [MQ] (24 Dec 2022)
Deep learning model compression using network sensitivity and gradients
M. Sakthi, N. Yadla, Raj Pawate (11 Oct 2022)
Look-ups are not (yet) all you need for deep learning inference
Calvin McCarter, Nicholas Dronen (12 Jul 2022)
LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification
Sharath Girish, Kamal Gupta, Saurabh Singh, Abhinav Shrivastava (06 Apr 2022)
Neural Network Quantization with AI Model Efficiency Toolkit (AIMET)
S. Siddegowda, Marios Fournarakis, Markus Nagel, Tijmen Blankevoort, Chirag I. Patel, Abhijit Khobare [MQ] (20 Jan 2022)
Implicit Neural Video Compression
Yunfan Zhang, T. V. Rozendaal, Johann Brehmer, Markus Nagel, Taco S. Cohen (21 Dec 2021)
LegoDNN: Block-grained Scaling of Deep Neural Networks for Mobile Vision
Rui Han, Qinglong Zhang, C. Liu, Guoren Wang, Jian Tang, L. Chen (18 Dec 2021)
Toward Compact Parameter Representations for Architecture-Agnostic Neural Network Compression
Yuezhou Sun, Wenlong Zhao, Lijun Zhang, Xiao Liu, Hui Guan, Matei A. Zaharia (19 Nov 2021)
LVAC: Learned Volumetric Attribute Compression for Point Clouds using Coordinate Based Networks
Berivan Isik, P. Chou, S. Hwang, Nick Johnston, G. Toderici [3DPC] (17 Nov 2021)
A New Clustering-Based Technique for the Acceleration of Deep Convolutional Networks
Erion-Vasilis M. Pikoulis, C. Mavrokefalidis, Aris S. Lalos (19 Jul 2021)
A White Paper on Neural Network Quantization
Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, M. V. Baalen, Tijmen Blankevoort [MQ] (15 Jun 2021)
Pre-Trained Models: Past, Present and Future
Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, ..., Jie Tang, Ji-Rong Wen, Jinhui Yuan, Wayne Xin Zhao, Jun Zhu [AIFin, MQ, AI4MH] (14 Jun 2021)
Differentiable Model Compression via Pseudo Quantization Noise
Alexandre Défossez, Yossi Adi, Gabriel Synnaeve [DiffM, MQ] (20 Apr 2021)
Knowledge Distillation as Semiparametric Inference
Tri Dao, G. Kamath, Vasilis Syrgkanis, Lester W. Mackey (20 Apr 2021)
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
Steve Dai, Rangharajan Venkatesan, Haoxing Ren, B. Zimmer, W. Dally, Brucek Khailany [MQ] (08 Feb 2021)
Transform Quantization for CNN (Convolutional Neural Network) Compression
Sean I. Young, Wang Zhe, David S. Taubman, B. Girod [MQ] (02 Sep 2020)
SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud
Stefanos Laskaridis, Stylianos I. Venieris, Mario Almeida, Ilias Leontiadis, Nicholas D. Lane (14 Aug 2020)
An Overview of Neural Network Compression
James O'Neill [AI4CE] (05 Jun 2020)
BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs
Yongkweon Jeon, Baeseong Park, S. Kwon, Byeongwook Kim, Jeongin Yun, Dongsoo Lee [MQ] (20 May 2020)
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, Yurong Chen [MQ] (10 Feb 2017)