Training with Quantization Noise for Extreme Model Compression
arXiv:2004.07320 · 15 April 2020 · MQ
Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Hervé Jégou, Armand Joulin
Papers citing "Training with Quantization Noise for Extreme Model Compression" (50 of 54 papers shown)
Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
Tollef Emil Jørgensen · MQ · 49 · 0 · 0 · 13 May 2025

HadamRNN: Binary and Sparse Ternary Orthogonal RNNs
Armand Foucault, Franck Mamalet, François Malgouyres · MQ · 74 · 0 · 0 · 28 Jan 2025

On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh, Bram Adams, Ahmed E. Hassan · VLM · 36 · 0 · 0 · 01 Nov 2024

Data Generation for Hardware-Friendly Post-Training Quantization
Lior Dikstein, Ariel Lapid, Arnon Netzer, H. Habi · MQ · 133 · 0 · 0 · 29 Oct 2024

Token Pruning using a Lightweight Background Aware Vision Transformer
Sudhakar Sah, Ravish Kumar, Honnesh Rohmetra, Ehsan Saboori · ViT · 21 · 0 · 0 · 12 Oct 2024

NVRC: Neural Video Representation Compression
Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull · 23 · 11 · 0 · 11 Sep 2024

On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh, Emad Fallahzadeh, Bram Adams, Ahmed E. Hassan · MQ · 32 · 3 · 0 · 25 Mar 2024
GPTVQ: The Blessing of Dimensionality for LLM Quantization
M. V. Baalen, Andrey Kuzmin, Markus Nagel, Peter Couperus, Cédric Bastoul, E. Mahurin, Tijmen Blankevoort, Paul N. Whatmough · MQ · 34 · 28 · 0 · 23 Feb 2024

Immersive Video Compression using Implicit Neural Representations
Ho Man Kwan, Fan Zhang, Andrew Gower, David Bull · 21 · 3 · 0 · 02 Feb 2024

eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models
Minsik Cho, Keivan Alizadeh Vahid, Qichen Fu, Saurabh N. Adya, C. C. D. Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal · MQ · 21 · 6 · 0 · 02 Sep 2023

Learning Kernel-Modulated Neural Representation for Efficient Light Field Compression
Jinglei Shi, Yihong Xu, C. Guillemot · 19 · 5 · 0 · 12 Jul 2023

Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models
James O'Neill, Sourav Dutta · VLM, MQ · 32 · 1 · 0 · 12 Jul 2023

Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort · MQ · 13 · 88 · 0 · 22 Jun 2023
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Zechun Liu, Barlas Oğuz, Changsheng Zhao, Ernie Chang, Pierre Stock, Yashar Mehdad, Yangyang Shi, Raghuraman Krishnamoorthi, Vikas Chandra · MQ · 46 · 187 · 0 · 29 May 2023

Patch-wise Mixed-Precision Quantization of Vision Transformer
Junrui Xiao, Zhikai Li, Lianwei Yang, Qingyi Gu · MQ · 24 · 12 · 0 · 11 May 2023

Learning-based Spatial and Angular Information Separation for Light Field Compression
Jinglei Shi, Yihong Xu, C. Guillemot · 21 · 0 · 0 · 13 Apr 2023

Are Visual Recognition Models Robust to Image Compression?
Joao Maria Janeiro, Stanislav Frolov, Alaaeldin El-Nouby, Jakob Verbeek · VLM · 18 · 4 · 0 · 10 Apr 2023

MiniRBT: A Two-stage Distilled Small Chinese Pre-trained Model
Xin Yao, Ziqing Yang, Yiming Cui, Shijin Wang · 18 · 3 · 0 · 03 Apr 2023

Rotation Invariant Quantization for Model Compression
Dor-Joseph Kampeas, Yury Nahshan, Hanoch Kremer, Gil Lederman, Shira Zaloshinski, Zheng Li, E. Haleva · MQ · 16 · 0 · 0 · 03 Mar 2023
NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling
Shishira R. Maiya, Sharath Girish, Max Ehrlich, Hanyu Wang, Kwot Sin Lee, Patrick Poirson, Pengxiang Wu, Chen Wang, Abhinav Shrivastava · VGen · 36 · 40 · 0 · 30 Dec 2022

Hyperspherical Quantization: Toward Smaller and More Accurate Models
Dan Liu, X. Chen, Chen-li Ma, Xue Liu · MQ · 22 · 3 · 0 · 24 Dec 2022

QFT: Post-training quantization via fast joint finetuning of all degrees of freedom
Alexander Finkelstein, Ella Fuchs, Idan Tal, Mark Grobman, Niv Vosco, Eldad Meller · MQ · 18 · 6 · 0 · 05 Dec 2022

Language models are good pathologists: using attention-based sequence reduction and text-pretrained transformers for efficient WSI classification
Juan Pisula, Katarzyna Bozek · VLM, MedIm · 25 · 3 · 0 · 14 Nov 2022

OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
Benoit Steiner, Mostafa Elhoushi, Jacob Kahn, James Hegarty · 29 · 8 · 0 · 24 Oct 2022

Deep learning model compression using network sensitivity and gradients
M. Sakthi, N. Yadla, Raj Pawate · 16 · 2 · 0 · 11 Oct 2022
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
Xiuying Wei, Yunchen Zhang, Xiangguo Zhang, Ruihao Gong, Shanghang Zhang, Qi Zhang, F. Yu, Xianglong Liu · MQ · 22 · 145 · 0 · 27 Sep 2022

Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz · 28 · 109 · 0 · 31 Aug 2022

Adaptive Block Floating-Point for Analog Deep Learning Hardware
Ayon Basumallik, D. Bunandar, Nicholas Dronen, Nicholas Harris, Ludmila Levkova, Calvin McCarter, Lakshmi Nair, David Walter, David Widemann · 9 · 6 · 0 · 12 May 2022

SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems
Xin Dong, B. D. Salvo, Meng Li, Chiao Liu, Zhongnan Qu, H. T. Kung, Ziyun Li · 3DGS · 21 · 20 · 0 · 10 Apr 2022

Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Xin Huang, A. Khetan, Rene Bidart, Zohar S. Karnin · 17 · 14 · 0 · 27 Mar 2022
QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization
Xiuying Wei, Ruihao Gong, Yuhang Li, Xianglong Liu, F. Yu · MQ, VLM · 19 · 166 · 0 · 11 Mar 2022

UDC: Unified DNAS for Compressible TinyML Models
Igor Fedorov, Ramon Matas, Hokchhay Tann, Chu Zhou, Matthew Mattina, P. Whatmough · AI4CE · 21 · 13 · 0 · 15 Jan 2022

GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design
Sung Une Lee, Boming Xia, Yongan Zhang, Ang Li, Yingyan Lin · GNN · 47 · 47 · 0 · 22 Dec 2021

Neural Network Quantization for Efficient Inference: A Survey
Olivia Weng · MQ · 17 · 22 · 0 · 08 Dec 2021

AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator
Chuteng Zhou, F. García-Redondo, Julian Büchel, I. Boybat, Xavier Timoneda Comas, S. Nandakumar, Shidhartha Das, A. Sebastian, M. Le Gallo, P. Whatmough · 25 · 16 · 0 · 10 Nov 2021

Full-Cycle Energy Consumption Benchmark for Low-Carbon Computer Vision
Bo-wen Li, Xinyang Jiang, Donglin Bai, Yuge Zhang, Ningxin Zheng, Xuanyi Dong, Lu Liu, Yuqing Yang, Dongsheng Li · 14 · 10 · 0 · 30 Aug 2021
Post-Training Quantization for Vision Transformer
Zhenhua Liu, Yunhe Wang, Kai Han, Siwei Ma, Wen Gao · ViT, MQ · 39 · 324 · 0 · 27 Jun 2021

Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better
Gaurav Menghani · VLM, MedIm · 23 · 365 · 0 · 16 Jun 2021

ResMLP: Feedforward networks for image classification with data-efficient training
Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, ..., Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou · VLM · 16 · 655 · 0 · 07 May 2021

Differentiable Model Compression via Pseudo Quantization Noise
Alexandre Défossez, Yossi Adi, Gabriel Synnaeve · DiffM, MQ · 10 · 46 · 0 · 20 Apr 2021

Random and Adversarial Bit Error Robustness: Energy-Efficient and Secure DNN Accelerators
David Stutz, Nandhini Chandramoorthy, Matthias Hein, Bernt Schiele · AAML, MQ · 20 · 18 · 0 · 16 Apr 2021

Going deeper with Image Transformers
Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou · ViT · 23 · 986 · 0 · 31 Mar 2021
Quantization-Guided Training for Compact TinyML Models
Sedigh Ghamari, Koray Ozcan, Thu Dinh, A. Melnikov, Juan Carvajal, Jan Ernst, S. Chai · MQ · 16 · 16 · 0 · 10 Mar 2021

An Information-Theoretic Justification for Model Pruning
Berivan Isik, Tsachy Weissman, Albert No · 84 · 35 · 0 · 16 Feb 2021

BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King · MQ · 140 · 221 · 0 · 31 Dec 2020

Mixed-Precision Embedding Using a Cache
J. Yang, Jianyu Huang, Jongsoo Park, P. T. P. Tang, Andrew Tulloch · 6 · 36 · 0 · 21 Oct 2020

TernaryBERT: Distillation-aware Ultra-low Bit BERT
Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu · MQ · 19 · 208 · 0 · 27 Sep 2020
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
Timo Schick, Hinrich Schütze · 22 · 953 · 0 · 15 Sep 2020

Efficient Transformers: A Survey
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler · VLM · 74 · 1,101 · 0 · 14 Sep 2020

Linformer: Self-Attention with Linear Complexity
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma · 58 · 1,645 · 0 · 08 Jun 2020