Model compression via distillation and quantization
arXiv:1802.05668 · 15 February 2018
A. Polino, Razvan Pascanu, Dan Alistarh
[MQ]
Papers citing "Model compression via distillation and quantization" (50 of 133 shown)
- Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques — Sanjay Surendranath Girija, Shashank Kapoor, Lakshit Arora, Dipen Pradhan, Aman Raj, Ankit Shetgaonkar. 05 May 2025. (54 · 0 · 0)
- Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding — Jinlong Li, Cristiano Saltori, Fabio Poiesi, N. Sebe. 20 Mar 2025. (165 · 0 · 0)
- Towards Understanding Distilled Reasoning Models: A Representational Approach [LRM] — David D. Baek, Max Tegmark. 05 Mar 2025. (80 · 3 · 0)
- Mixture of Attentions For Speculative Decoding — Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras, Haitham Bou Ammar, Jun Wang. 04 Oct 2024. (76 · 4 · 0)
- Online-Score-Aided Federated Learning: Taming the Resource Constraints in Wireless Networks — Md Ferdous Pervej, Minseok Choi, A. Molisch. 12 Aug 2024. (33 · 0 · 0)
- Training Foundation Models as Data Compression: On Information, Model Weights and Copyright Law — Giorgio Franceschelli, Claudia Cevenini, Mirco Musolesi. 18 Jul 2024. (44 · 0 · 0)
- Relational Representation Distillation — Nikolaos Giakoumoglou, Tania Stathaki. 16 Jul 2024. (34 · 0 · 0)
- HDKD: Hybrid Data-Efficient Knowledge Distillation Network for Medical Image Classification — Omar S. El-Assiouti, Ghada Hamed, Dina Khattab, H. M. Ebied. 10 Jul 2024. (37 · 1 · 0)
- LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing — Hongxiang Zhang, Yuyang Rong, Yifeng He, Hao Chen. 11 Jun 2024. (23 · 7 · 0)
- Fast Vocabulary Transfer for Language Model Compression — Leonidas Gee, Andrea Zugarini, Leonardo Rigutini, Paolo Torroni. 15 Feb 2024. (35 · 26 · 0)
- SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory Budget — Kun Wang, Jiani Cao, Zimu Zhou, Zhenjiang Li. 30 Jan 2024. (22 · 5 · 0)
- Pursing the Sparse Limitation of Spiking Deep Learning Structures — Hao-Ran Cheng, Jiahang Cao, Erjia Xiao, Mengshu Sun, Le Yang, Jize Zhang, Xue Lin, B. Kailkhura, Kaidi Xu, Renjing Xu. 18 Nov 2023. (16 · 1 · 0)
- The Road to On-board Change Detection: A Lightweight Patch-Level Change Detection Network via Exploring the Potential of Pruning and Pooling — Lihui Xue, Zhihao Wang, Xueqian Wang, Gang Li. 16 Oct 2023. (35 · 1 · 0)
- Soft Quantization using Entropic Regularization [MQ] — Rajmadan Lakshmanan, Alois Pichler. 08 Sep 2023. (13 · 5 · 0)
- eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models [MQ] — Minsik Cho, Keivan Alizadeh Vahid, Qichen Fu, Saurabh N. Adya, C. C. D. Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal. 02 Sep 2023. (21 · 6 · 0)
- Quantized Feature Distillation for Network Quantization [MQ] — Kevin Zhu, Yin He, Jianxin Wu. 20 Jul 2023. (29 · 9 · 0)
- Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models [VLM, MQ] — James O'Neill, Sourav Dutta. 12 Jul 2023. (34 · 1 · 0)
- InfLoR-SNN: Reducing Information Loss for Spiking Neural Networks [AAML] — Yu-Zhu Guo, Y. Chen, Liwen Zhang, Xiaode Liu, Xinyi Tong, Yuanyuan Ou, Xuhui Huang, Zhe Ma. 10 Jul 2023. (39 · 3 · 0)
- A Review on Explainable Artificial Intelligence for Healthcare: Why, How, and When? — M. Rubaiyat Hossain Mondal, Prajoy Podder. 10 Apr 2023. (20 · 56 · 0)
- Performance-aware Approximation of Global Channel Pruning for Multitask CNNs — Hancheng Ye, Bo-Wen Zhang, Tao Chen, Jiayuan Fan, Bin Wang. 21 Mar 2023. (29 · 18 · 0)
- LightTS: Lightweight Time Series Classification with Adaptive Ensemble Distillation -- Extended Version [AI4TS] — David Campos, Miao Zhang, B. Yang, Tung Kieu, Chenjuan Guo, Christian S. Jensen. 24 Feb 2023. (45 · 47 · 0)
- RedBit: An End-to-End Flexible Framework for Evaluating the Accuracy of Quantized CNNs [MQ] — A. M. Ribeiro-dos-Santos, João Dinis Ferreira, O. Mutlu, G. Falcão. 15 Jan 2023. (15 · 1 · 0)
- Pruning Compact ConvNets for Efficient Inference [VLM] — Sayan Ghosh, Karthik Prasad, Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Graham Cormode, Peter Vajda. 11 Jan 2023. (19 · 4 · 0)
- Systems for Parallel and Distributed Large-Model Deep Learning Training [GNN, VLM, MoE] — Kabir Nagrecha. 06 Jan 2023. (26 · 7 · 0)
- PD-Quant: Post-Training Quantization based on Prediction Difference Metric [MQ] — Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu. 14 Dec 2022. (96 · 68 · 0)
- CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification [MQ] — Lirui Xiao, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang Zhang. 06 Dec 2022. (27 · 10 · 0)
- QFT: Post-training quantization via fast joint finetuning of all degrees of freedom [MQ] — Alexander Finkelstein, Ella Fuchs, Idan Tal, Mark Grobman, Niv Vosco, Eldad Meller. 05 Dec 2022. (24 · 6 · 0)
- NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers [MQ] — Yijiang Liu, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang Zhang. 29 Nov 2022. (31 · 45 · 0)
- AskewSGD: An Annealed interval-constrained Optimisation method to train Quantized Neural Networks — Louis Leconte, S. Schechtman, Eric Moulines. 07 Nov 2022. (29 · 4 · 0)
- Fast and Low-Memory Deep Neural Networks Using Binary Matrix Factorization [MQ] — Alireza Bordbar, M. Kahaei. 24 Oct 2022. (25 · 0 · 0)
- Towards Global Neural Network Abstractions with Locally-Exact Reconstruction — Edoardo Manino, I. Bessa, Lucas C. Cordeiro. 21 Oct 2022. (21 · 1 · 0)
- Mixed-Precision Neural Networks: A Survey [MQ] — M. Rakka, M. Fouda, Pramod P. Khargonekar, Fadi J. Kurdahi. 11 Aug 2022. (21 · 11 · 0)
- Distributed Training for Deep Learning Models On An Edge Computing Network Using Shielded Reinforcement Learning [OffRL] — Tanmoy Sen, Haiying Shen. 01 Jun 2022. (11 · 5 · 0)
- Dataset Distillation using Neural Feature Regression [DD, FedML] — Yongchao Zhou, E. Nezhadarya, Jimmy Ba. 01 Jun 2022. (39 · 149 · 0)
- Target Aware Network Architecture Search and Compression for Efficient Knowledge Transfer — S. H. Shabbeer Basha, Debapriya Tula, Sravan Kumar Vinakota, S. Dubey. 12 May 2022. (26 · 2 · 0)
- Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures — Yongji Wu, Matthew Lentz, Danyang Zhuo, Yao Lu. 10 May 2022. (21 · 22 · 0)
- Compact Model Training by Low-Rank Projection with Energy Transfer — K. Guo, Zhenquan Lin, Xiaofen Xing, Fang Liu, Xiangmin Xu. 12 Apr 2022. (22 · 2 · 0)
- EfficientFi: Towards Large-Scale Lightweight WiFi Sensing via CSI Compression — Jianfei Yang, Xinyan Chen, Han Zou, Dazhuo Wang, Q. Xu, Lihua Xie. 08 Apr 2022. (14 · 78 · 0)
- Bimodal Distributed Binarized Neural Networks [MQ] — T. Rozen, Moshe Kimhi, Brian Chmiel, A. Mendelson, Chaim Baskin. 05 Apr 2022. (36 · 4 · 0)
- FedSynth: Gradient Compression via Synthetic Data in Federated Learning [DD, FedML] — Shengyuan Hu, Jack Goetz, Kshitiz Malik, Hongyuan Zhan, Zhe Liu, Yue Liu. 04 Apr 2022. (34 · 38 · 0)
- Update Compression for Deep Neural Networks on the Edge — Bo Chen, A. Bakhshi, Gustavo E. A. P. A. Batista, Brian Ng, Tat-Jun Chin. 09 Mar 2022. (24 · 17 · 0)
- Deadwooding: Robust Global Pruning for Deep Neural Networks — Sawinder Kaur, Ferdinando Fioretto, Asif Salekin. 10 Feb 2022. (19 · 4 · 0)
- Robust Binary Models by Pruning Randomly-initialized Networks [TPM, AAML, MQ] — Chen Liu, Ziqi Zhao, Sabine Süsstrunk, Mathieu Salzmann. 03 Feb 2022. (19 · 4 · 0)
- Iterative Activation-based Structured Pruning — Kaiqi Zhao, Animesh Jain, Ming Zhao. 22 Jan 2022. (39 · 0 · 0)
- Enabling Deep Learning on Edge Devices through Filter Pruning and Knowledge Transfer — Kaiqi Zhao, Yitao Chen, Ming Zhao. 22 Jan 2022. (25 · 3 · 0)
- Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems — Yoshitomo Matsubara, Luca Soldaini, Eric Lind, Alessandro Moschitti. 15 Jan 2022. (21 · 6 · 0)
- Problem-dependent attention and effort in neural networks with applications to image resolution and model selection — Chris Rohlfs. 05 Jan 2022. (16 · 4 · 0)
- Learning Robust and Lightweight Model through Separable Structured Transformations [OOD] — Xian Wei, Yanhui Huang, Yang Xu, Mingsong Chen, Hai Lan, Yuanxiang Li, Zhongfeng Wang, Xuan Tang. 27 Dec 2021. (24 · 0 · 0)
- Illumination and Temperature-Aware Multispectral Networks for Edge-Computing-Enabled Pedestrian Detection — Yifan Zhuang, Ziyuan Pu, Jia Hu, Yinhai Wang. 09 Dec 2021. (25 · 24 · 0)
- Automatic Neural Network Pruning that Efficiently Preserves the Model Accuracy [3DV] — Thibault Castells, Seul-Ki Yeom. 18 Nov 2021. (18 · 3 · 0)