ResearchTrend.AI
And the Bit Goes Down: Revisiting the Quantization of Neural Networks
arXiv:1907.05686 · 12 July 2019
Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou
MQ

Papers citing "And the Bit Goes Down: Revisiting the Quantization of Neural Networks"

26 / 26 papers shown
An Empirical Investigation of Matrix Factorization Methods for Pre-trained Transformers
Ashim Gupta, Sina Mahdipour Saravani, P. Sadayappan, Vivek Srikumar
17 Jun 2024

GPTVQ: The Blessing of Dimensionality for LLM Quantization
M. V. Baalen, Andrey Kuzmin, Markus Nagel, Peter Couperus, Cédric Bastoul, E. Mahurin, Tijmen Blankevoort, Paul N. Whatmough
MQ · 23 Feb 2024

eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models
Minsik Cho, Keivan Alizadeh Vahid, Qichen Fu, Saurabh N. Adya, C. C. D. Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal
MQ · 02 Sep 2023

Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models
James O'Neill, Sourav Dutta
VLM, MQ · 12 Jul 2023

PQA: Exploring the Potential of Product Quantization in DNN Hardware Acceleration
Ahmed F. AbouElhamayed, Angela Cui, Javier Fernandez-Marques, Nicholas D. Lane, Mohamed S. Abdelfattah
MQ · 25 May 2023

NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling
Shishira R. Maiya, Sharath Girish, Max Ehrlich, Hanyu Wang, Kwot Sin Lee, Patrick Poirson, Pengxiang Wu, Chen Wang, Abhinav Shrivastava
VGen · 30 Dec 2022

Hyperspherical Quantization: Toward Smaller and More Accurate Models
Dan Liu, X. Chen, Chen-li Ma, Xue Liu
MQ · 24 Dec 2022

Deep learning model compression using network sensitivity and gradients
M. Sakthi, N. Yadla, Raj Pawate
11 Oct 2022

Look-ups are not (yet) all you need for deep learning inference
Calvin McCarter, Nicholas Dronen
12 Jul 2022

LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification
Sharath Girish, Kamal Gupta, Saurabh Singh, Abhinav Shrivastava
06 Apr 2022

Neural Network Quantization with AI Model Efficiency Toolkit (AIMET)
S. Siddegowda, Marios Fournarakis, Markus Nagel, Tijmen Blankevoort, Chirag I. Patel, Abhijit Khobare
MQ · 20 Jan 2022

Implicit Neural Video Compression
Yunfan Zhang, T. V. Rozendaal, Johann Brehmer, Markus Nagel, Taco S. Cohen
21 Dec 2021

LegoDNN: Block-grained Scaling of Deep Neural Networks for Mobile Vision
Rui Han, Qinglong Zhang, C. Liu, Guoren Wang, Jian Tang, L. Chen
18 Dec 2021

Toward Compact Parameter Representations for Architecture-Agnostic Neural Network Compression
Yuezhou Sun, Wenlong Zhao, Lijun Zhang, Xiao Liu, Hui Guan, Matei A. Zaharia
19 Nov 2021

LVAC: Learned Volumetric Attribute Compression for Point Clouds using Coordinate Based Networks
Berivan Isik, P. Chou, S. Hwang, Nick Johnston, G. Toderici
3DPC · 17 Nov 2021

A New Clustering-Based Technique for the Acceleration of Deep Convolutional Networks
Erion-Vasilis M. Pikoulis, C. Mavrokefalidis, Aris S. Lalos
19 Jul 2021

A White Paper on Neural Network Quantization
Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, M. V. Baalen, Tijmen Blankevoort
MQ · 15 Jun 2021

Pre-Trained Models: Past, Present and Future
Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, ..., Jie Tang, Ji-Rong Wen, Jinhui Yuan, Wayne Xin Zhao, Jun Zhu
AIFin, MQ, AI4MH · 14 Jun 2021

Differentiable Model Compression via Pseudo Quantization Noise
Alexandre Défossez, Yossi Adi, Gabriel Synnaeve
DiffM, MQ · 20 Apr 2021

Knowledge Distillation as Semiparametric Inference
Tri Dao, G. Kamath, Vasilis Syrgkanis, Lester W. Mackey
20 Apr 2021

VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
Steve Dai, Rangharajan Venkatesan, Haoxing Ren, B. Zimmer, W. Dally, Brucek Khailany
MQ · 08 Feb 2021

Transform Quantization for CNN (Convolutional Neural Network) Compression
Sean I. Young, Wang Zhe, David S. Taubman, B. Girod
MQ · 02 Sep 2020

SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud
Stefanos Laskaridis, Stylianos I. Venieris, Mario Almeida, Ilias Leontiadis, Nicholas D. Lane
14 Aug 2020

An Overview of Neural Network Compression
James O'Neill
AI4CE · 05 Jun 2020

BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs
Yongkweon Jeon, Baeseong Park, S. Kwon, Byeongwook Kim, Jeongin Yun, Dongsoo Lee
MQ · 20 May 2020

Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, Yurong Chen
MQ · 10 Feb 2017