1510.00149
Cited By
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
1 October 2015
Song Han
Huizi Mao
W. Dally
3DGS
Papers citing
"Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"
50 / 3,628 papers shown
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024
Cong Guo
Rui Zhang
Jiale Xu
Jingwen Leng
Zihan Liu
...
Minyi Guo
Hao Wu
Shouren Zhao
Junping Zhao
Ke Zhang
VLM
171
25
0
16 Jan 2024
Convolutional Neural Network Compression via Dynamic Parameter Rank Pruning
IEEE Access (IEEE Access), 2024
Manish Sharma
Jamison Heard
Eli Saber
Panos P. Markopoulos
216
3
0
15 Jan 2024
Activations and Gradients Compression for Model-Parallel Training
Doklady. Mathematics (Dokl. Math.), 2023
Mikhail Rudakov
Aleksandr Beznosikov
Yaroslav Kholodov
Alexander Gasnikov
323
5
0
15 Jan 2024
A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models
Annual Review of Statistics and Its Application (ARSIA), 2024
Namjoon Suh
Guang Cheng
MedIm
305
17
0
14 Jan 2024
UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer
AAAI Conference on Artificial Intelligence (AAAI), 2024
Ji Liu
Dehua Tang
Yuanxian Huang
Li Zhang
Xiaocheng Zeng
...
Jinzhang Peng
Yu Wang
Fan Jiang
Lu Tian
Ashish Sirasao
ViT
215
14
0
12 Jan 2024
Body-Area Capacitive or Electric Field Sensing for Human Activity Recognition and Human-Computer Interaction: A Comprehensive Survey
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2024
Sizhen Bian
Mengxi Liu
Bo Zhou
P. Lukowicz
Michele Magno
186
20
0
11 Jan 2024
Memory-Efficient Fine-Tuning for Quantized Diffusion Model
European Conference on Computer Vision (ECCV), 2024
Hyogon Ryu
Seohyun Lim
Hyunjung Shim
DiffM
MQ
190
7
0
09 Jan 2024
A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models
Rui-ya Ma
Qiang Zhou
Yizhu Jin
Daquan Zhou
Bangjun Xiao
...
Jingtong Hu
Xiaodong Xie
Zhen Dong
Shanghang Zhang
Shiji Zhou
263
6
0
04 Jan 2024
Retraining-free Model Quantization via One-Shot Weight-Coupling Learning
Computer Vision and Pattern Recognition (CVPR), 2024
Chen Tang
Yuan Meng
Jiacheng Jiang
Shuzhao Xie
Rongwei Lu
Cheng Wang
Zhi Wang
Wenwu Zhu
MQ
192
17
0
03 Jan 2024
Safety and Performance, Why Not Both? Bi-Objective Optimized Model Compression against Heterogeneous Attacks Toward AI Software Deployment
IEEE Transactions on Software Engineering (TSE), 2024
Jie Zhu
Leye Wang
Xiao Han
Anmin Liu
Tao Xie
AAML
179
6
0
02 Jan 2024
One-Shot Multi-Rate Pruning of Graph Convolutional Networks
H. Sahbi
149
0
0
29 Dec 2023
Adaptive Depth Networks with Skippable Sub-Paths
Woochul Kang
199
2
0
27 Dec 2023
Robust Neural Pruning with Gradient Sampling Optimization for Residual Neural Networks
Juyoung Yun
196
1
0
26 Dec 2023
Fairness-Aware Structured Pruning in Transformers
A. Zayed
Gonçalo Mordido
Samira Shabanian
Ioana Baldini
Sarath Chandar
211
30
0
24 Dec 2023
Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization
K. Balaskas
Andreas Karatzas
Christos Sad
K. Siozios
Iraklis Anagnostopoulos
Georgios Zervakis
Jörg Henkel
MQ
168
22
0
23 Dec 2023
Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching
Pengmiao Zhang
Neelesh Gupta
Rajgopal Kannan
Viktor K. Prasanna
182
4
0
23 Dec 2023
Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention
Zhen Tan
Tianlong Chen
Zhenyu Zhang
Huan Liu
164
25
0
22 Dec 2023
Resource-Limited Automated Ki67 Index Estimation in Breast Cancer
J. Gliozzo
Giosuè Cataldo Marinò
A. Bonometti
Marco Frasca
Dario Malchiodi
138
0
0
22 Dec 2023
Sparse Training for Federated Learning with Regularized Error Correction
Ran Greidi
Kobi Cohen
FedML
338
3
0
21 Dec 2023
How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark
Eldar Kurtic
Torsten Hoefler
Dan Alistarh
201
3
0
21 Dec 2023
Model-Based Control with Sparse Neural Dynamics
Ziang Liu
Genggeng Zhou
Jeff He
Tobia Marcucci
Fei-Fei Li
Jiajun Wu
Yunzhu Li
AI4CE
248
21
0
20 Dec 2023
Towards Efficient Verification of Quantized Neural Networks
Pei Huang
Haoze Wu
Yuting Yang
Ieva Daukantas
Min Wu
Yedi Zhang
Clark W. Barrett
MQ
187
19
0
20 Dec 2023
Fluctuation-based Adaptive Structured Pruning for Large Language Models
Yongqi An
Xu Zhao
Tao Yu
Ming Tang
Jinqiao Wang
224
92
0
19 Dec 2023
Optimizing Dense Feed-Forward Neural Networks
Neural Networks (Neural Netw.), 2023
Luis Balderas
Miguel Lastra
José M. Benítez
179
10
0
16 Dec 2023
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Symposium on Operating Systems Principles (SOSP), 2023
Yixin Song
Zeyu Mi
Haotong Xie
Haibo Chen
BDL
336
209
0
16 Dec 2023
Gradient-based Parameter Selection for Efficient Fine-Tuning
Computer Vision and Pattern Recognition (CVPR), 2023
Zhi Zhang
Qizhe Zhang
Zijun Gao
Renrui Zhang
Ekaterina Shutova
Shiji Zhou
Shanghang Zhang
325
41
0
15 Dec 2023
OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators
Tianyi Chen
Tianyu Ding
Zhihui Zhu
Zeyu Chen
HsiangTao Wu
Ilya Zharkov
Luming Liang
176
5
0
15 Dec 2023
Balanced and Deterministic Weight-sharing Helps Network Performance
Oscar Chang
Hod Lipson
105
0
0
13 Dec 2023
CBQ: Cross-Block Quantization for Large Language Models
International Conference on Learning Representations (ICLR), 2023
Xin Ding
Xiaoyu Liu
Zhijun Tu
Yun-feng Zhang
Wei Li
...
Hanting Chen
Yehui Tang
Zhiwei Xiong
Baoqun Yin
Yunhe Wang
MQ
729
28
0
13 Dec 2023
IDKM: Memory Efficient Neural Network Quantization via Implicit, Differentiable k-Means
Sean Jaffe
Ambuj K. Singh
Francesco Bullo
MQ
199
0
0
12 Dec 2023
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Keivan Alizadeh-Vahid
Iman Mirzadeh
Dmitry Belenko
Karen Khatamifard
Minsik Cho
C. C. D. Mundo
Mohammad Rastegari
Mehrdad Farajtabar
233
190
0
12 Dec 2023
MaxQ: Multi-Axis Query for N:M Sparsity Network
Computer Vision and Pattern Recognition (CVPR), 2023
Jingyang Xiang
Siqi Li
Junhao Chen
Zhuangzhi Chen
Tianxin Huang
Linpeng Peng
Yong-Jin Liu
221
0
0
12 Dec 2023
Measurement-driven neural-network training for integrated magnetic tunnel junction arrays
W. A. Borders
A. Madhavan
M. Daniels
Vasileia Georgiou
Martin Lueker-Boden
Tiffany S. Santos
Patrick M. Braganca
M. D. Stiles
Jabez J. McClelland
Brian D. Hoskins
208
6
0
11 Dec 2023
Sense, Predict, Adapt, Repeat: A Blueprint for Design of New Adaptive AI-Centric Sensing Systems
S. Hor
Amin Arbabian
174
2
0
11 Dec 2023
FP8-BERT: Post-Training Quantization for Transformer
Jianwei Li
Tianchi Zhang
Ian En-Hsu Yen
Dongkuan Xu
MQ
229
9
0
10 Dec 2023
ESPN: Memory-Efficient Multi-Vector Information Retrieval
International Symposium on Memory Management (ISMM), 2023
Susav Shrestha
Narasimha Reddy
Zongwang Li
219
12
0
09 Dec 2023
A Masked Pruning Approach for Dimensionality Reduction in Communication-Efficient Federated Learning Systems
Tamir L. S. Gez
Kobi Cohen
174
3
0
06 Dec 2023
Sample-based Dynamic Hierarchical Transformer with Layer and Head Flexibility via Contextual Bandit
Fanfei Meng
Lele Zhang
Yu Chen
Yuxin Wang
211
10
0
05 Dec 2023
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective
AAAI Conference on Artificial Intelligence (AAAI), 2023
Can Jin
Tianjin Huang
Yihua Zhang
Mykola Pechenizkiy
Sijia Liu
Shiwei Liu
Tianlong Chen
VLM
416
30
0
03 Dec 2023
The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Srinath Namburi
Makesh Narsimhan Sreedhar
Srinath Srinivasan
Frederic Sala
MQ
201
11
0
01 Dec 2023
A Compact Implicit Neural Representation for Efficient Storage of Massive 4D Functional Magnetic Resonance Imaging
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ruoran Li
Runzhao Yang
Wenxin Xiang
Yuxiao Cheng
Tingxiong Xiao
J. Suo
AI4CE
340
1
0
30 Nov 2023
Towards Higher Ranks via Adversarial Weight Pruning
Neural Information Processing Systems (NeurIPS), 2023
Yuchuan Tian
Hanting Chen
Tianyu Guo
Chao Xu
Yunhe Wang
203
4
0
29 Nov 2023
Relationship between Model Compression and Adversarial Robustness: A Review of Current Evidence
IEEE Symposium Series on Computational Intelligence (IEEE-SSCI), 2023
Svetlana Pavlitska
Hannes Grolig
J. Marius Zöllner
AAML
210
5
0
27 Nov 2023
BinaryHPE: 3D Human Pose and Shape Estimation via Binarization
Zhiteng Li
Yulun Zhang
Jing Lin
Haotong Qin
Jinjin Gu
Xin Yuan
Linghe Kong
Yunbo Wang
3DH
355
1
0
24 Nov 2023
When Side-Channel Attacks Break the Black-Box Property of Embedded Artificial Intelligence
Benoît Coqueret
Mathieu Carbone
Olivier Sentieys
Gabriel Zaid
191
2
0
23 Nov 2023
REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints
IEEE Transactions on Mobile Computing (IEEE TMC), 2023
Francesco Corti
Balz Maag
Joachim Schauer
U. Pferschy
O. Saukh
309
3
0
22 Nov 2023
Efficient Neural Networks for Tiny Machine Learning: A Comprehensive Review
M. Lê
Pierre Wolinski
Julyan Arbel
239
17
0
20 Nov 2023
Tensor-Aware Energy Accounting
Timur Babakol
Yu David Liu
126
5
0
19 Nov 2023
LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms
Young D. Kwon
Jagmohan Chauhan
Hong Jia
Stylianos I. Venieris
Cecilia Mascolo
198
20
0
19 Nov 2023
Low-Precision Floating-Point for Efficient On-Board Deep Neural Network Processing
Cédric Gernigon
Silviu-Ioan Filip
Olivier Sentieys
Clément Coggiola
Mickael Bruno
MQ
127
10
0
18 Nov 2023