1510.00149
Cited By
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
1 October 2015
Song Han
Huizi Mao
W. Dally
3DGS
Papers citing "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"
50 / 3,622 papers shown
Machine learning and high dimensional vector search
IEEE Data Engineering Bulletin (DEB), 2025
Matthijs Douze
268
0
0
24 Feb 2025
Optimizing Singular Spectrum for Large Language Model Compression
Dengjie Li
Tiancheng Shen
Yao Zhou
Baisong Yang
Zhongying Liu
Masheng Yang
Guohao Li
Jianlong Wu
Yujie Zhong
Ming-Hsuan Yang
170
3
0
24 Feb 2025
When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Weilan Wang
Yu Mao
Dongdong Tang
Hongchao Du
Nan Guan
Chun Jason Xue
MQ
257
4
0
24 Feb 2025
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression
Computer Vision and Pattern Recognition (CVPR), 2025
Xiaoyi Qu
David Aponte
Colby R. Banbury
Daniel P. Robinson
Tianyu Ding
K. Koishida
Ilya Zharkov
Tianyi Chen
MQ
277
4
0
23 Feb 2025
Verification of Bit-Flip Attacks against Quantized Neural Networks
Yedi Zhang
Lei Huang
Pengfei Gao
Fu Song
Jun Sun
Jin Song Dong
AAML
205
2
0
22 Feb 2025
FedSpaLLM: Federated Pruning of Large Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Guangji Bai
Yijiang Li
Zilinghan Li
Bo Pan
Kibaek Kim
FedML
279
6
0
20 Feb 2025
A General Error-Theoretical Analysis Framework for Constructing Compression Strategies
Yunquan Zhang
Daning Cheng
Yunquan Zhang
Meiqi Tu
Fangmin Liu
Jiake Tian
170
2
0
19 Feb 2025
DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs
Minxuan Lv
Zhenpeng Su
Leiyu Pan
Yizhe Xiong
Zijia Lin
...
Guiguang Ding
Cheng Luo
Di Zhang
Kun Gai
Songlin Hu
MoE
301
1
0
18 Feb 2025
GPU Memory Usage Optimization for Backward Propagation in Deep Network Training
Ding-Yong Hong
Tzu-Hsien Tsai
Ning Wang
Pangfeng Liu
Jan-Jan Wu
176
1
0
18 Feb 2025
Compression of Site-Specific Deep Neural Networks for Massive MIMO Precoding
Ghazal Kasalaee
Ali Hasanzadeh Karkan
J. Frigon
François Leduc-Primeau
70
0
0
12 Feb 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
IEEE Internet of Things Journal (IEEE IoT J.), 2025
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
313
3
0
11 Feb 2025
EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models
Xingrun Xing
Zheng Liu
Shitao Xiao
Boyan Gao
Yiming Liang
Wanpeng Zhang
Haokun Lin
Guoqi Li
Jiajun Zhang
LRM
562
8
0
10 Feb 2025
Kolmogorov-Arnold Fourier Networks
Jusheng Zhang
Yijia Fan
Kaitong Cai
Keze Wang
245
0
0
09 Feb 2025
BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference
Reena Elangovan
Charbel Sakr
A. Raghunathan
Brucek Khailany
MQ
249
3
0
07 Feb 2025
Advancing Weight and Channel Sparsification with Enhanced Saliency
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025
Xinglong Sun
Maying Shen
Hongxu Yin
Lei Mao
Pavlo Molchanov
Jose M. Alvarez
170
1
0
05 Feb 2025
Progressive Binarization with Semi-Structured Pruning for LLMs
Xinyu Yan
Tianao Zhang
Zhiteng Li
Yulun Zhang
MQ
439
4
0
03 Feb 2025
Position: AI Scaling: From Up to Down and Out
Yunke Wang
Yanxi Li
Chang Xu
HAI
457
1
0
02 Feb 2025
Hardware-Efficient Photonic Tensor Core: Accelerating Deep Neural Networks with Structured Compression
Optica (Optica), 2025
Shupeng Ning
Hanqing Zhu
Chenghao Feng
Jiaqi Gu
David Z. Pan
Ray T. Chen
196
2
0
01 Feb 2025
DCentNet: Decentralized Multistage Biomedical Signal Classification using Early Exits
Biomedical Signal Processing and Control (BSPC), 2025
Xiaolin Li
Binhua Huang
B. Cardiff
Deepu John
109
0
0
31 Jan 2025
Brain network science modelling of sparse neural networks enables Transformers and LLMs to perform as fully connected
Yingtao Zhang
Diego Cerretti
Jialin Zhao
Wenjing Wu
Ziheng Liao
Umberto Michieli
C. Cannistraci
480
1
0
31 Jan 2025
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
International Conference on Learning Representations (ICLR), 2024
Mohammad Mozaffari
Amir Yazdanbakhsh
Zhao Zhang
M. Dehnavi
327
12
0
28 Jan 2025
Ditto: Accelerating Diffusion Model via Temporal Value Similarity
International Symposium on High-Performance Computer Architecture (HPCA), 2025
Sungbin Kim
Hyunwuk Lee
Wonho Cho
Mincheol Park
Won Woo Ro
343
8
0
20 Jan 2025
Coded Deep Learning: Framework and Algorithm
IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2025
En-Hui Yang
Shayan Mohajer Hamidi
103
3
0
20 Jan 2025
MOGNET: A Mux-residual quantized Network leveraging Online-Generated weights
International Conference on Artificial Intelligence Circuits and Systems (ICAICS), 2022
Van Thien Nguyen
William Guicquero
Gilles Sicard
MQ
250
1
0
17 Jan 2025
Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search
Daniel de Souza Severo
Giuseppe Ottaviano
Matthew Muckley
Karen Ullrich
Matthijs Douze
MQ
225
1
0
16 Jan 2025
Histogram-Equalized Quantization for logic-gated Residual Neural Networks
International Symposium on Circuits and Systems (ISCAS), 2022
Van Thien Nguyen
William Guicquero
Gilles Sicard
MQ
258
3
0
10 Jan 2025
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic
Yifei He
Yuzheng Hu
Yong Lin
Tong Zhang
Han Zhao
FedML
MoMe
279
30
0
08 Jan 2025
PTEENet: Post-Trained Early-Exit Neural Networks Augmentation for Inference Cost Optimization
IEEE Access (IEEE Access), 2025
Assaf Lahiany
Yehudit Aperstein
209
8
0
07 Jan 2025
A Novel Structure-Agnostic Multi-Objective Approach for Weight-Sharing Compression in Deep Neural Networks
Rasa Khosrowshahli
Shahryar Rahnamayan
Beatrice Ombuki-Berman
MQ
227
1
0
06 Jan 2025
Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning
Zhen Li
Yupeng Su
Runming Yang
C. Xie
Xiping Hu
Zhongwei Xie
Ngai Wong
Hongxia Yang
MQ
LRM
530
14
0
06 Jan 2025
Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment
International Conference on Artificial Neural Networks (ICANN), 2025
Xubin Wang
Weijia Jia
413
21
0
04 Jan 2025
Pruning-based Data Selection and Network Fusion for Efficient Deep Learning
Humaira Kousar
Hasnain Irshad Bhatti
Jaekyun Moon
297
1
0
03 Jan 2025
SlimGPT: Layer-wise Structured Pruning for Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Gui Ling
Ziyang Wang
Yuliang Yan
Qingwen Liu
167
27
0
24 Dec 2024
AutoSculpt: A Pattern-based Model Auto-pruning Framework Using Reinforcement Learning and Graph Learning
Lixian Jing
Haobing Liu
Junyu Dong
Yanwei Yu
3DPC
AI4CE
263
1
0
24 Dec 2024
GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
Chao Zeng
Songwei Liu
Shu Yang
Fangmin Chen
Lean Fu
Xing Mei
MQ
387
3
0
23 Dec 2024
Lightweight Design and Optimization methods for DCNNs: Progress and Futures
Hanhua Long
Wenbin Bi
Jian Sun
217
1
0
22 Dec 2024
Rethinking Model Redundancy for Low-light Image Enhancement
Tong Li
Lizhi Wang
Hansen Feng
Lin Zhu
Wanxuan Lu
Hua Huang
274
0
0
21 Dec 2024
Holistic Adversarially Robust Pruning
International Conference on Learning Representations (ICLR), 2024
Qi Zhao
Christian Wressnegger
205
13
0
19 Dec 2024
RemoteTrimmer: Adaptive Structural Pruning for Remote Sensing Image Classification
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Guanwenjie Zou
Liang Yao
Fan Liu
Chuanyi Zhang
Xin Li
Ning Chen
Shengxiang Xu
Jun Zhou
236
2
0
17 Dec 2024
Priority-Aware Model-Distributed Inference at Edge Networks
Teng Li
Hulya Seferoglu
172
2
0
16 Dec 2024
Designing Semi-Structured Pruning of Graph Convolutional Networks for Skeleton-based Recognition
Hichem Sahbi
CVBM
173
0
0
16 Dec 2024
MOFHEI: Model Optimizing Framework for Fast and Efficient Homomorphically Encrypted Neural Network Inference
International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (ICPSISA), 2024
Parsa Ghazvinian
Robert Podschwadt
Prajwal Panzade
Mohammad H. Rafiei
Daniel Takabi
205
0
0
10 Dec 2024
TT-MPD: Test Time Model Pruning and Distillation
Haihang Wu
Wei Wang
T. Malepathirana
Sachith Seneviratne
D. Oetomo
Saman K. Halgamuge
258
0
0
10 Dec 2024
DEX: Data Channel Extension for Efficient CNN Inference on Tiny AI Accelerators
Neural Information Processing Systems (NeurIPS), 2024
Taesik Gong
F. Kawsar
Chulhong Min
236
4
0
09 Dec 2024
MultiTASC++: A Continuously Adaptive Scheduler for Edge-Based Multi-Device Cascade Inference
Sokratis Nikolaidis
Stylianos I. Venieris
I. Venieris
224
0
0
05 Dec 2024
Quantized and Interpretable Learning Scheme for Deep Neural Networks in Classification Task
Conference Information and Communication Technology (ICT), 2024
Alireza Maleki
Mahsa Lavaei
Mohsen Bagheritabar
Salar Beigzad
Zahra Abadi
MQ
208
2
0
05 Dec 2024
CPTQuant -- A Novel Mixed Precision Post-Training Quantization Techniques for Large Language Models
Amitash Nanda
Sree Bhargavi Balija
D. Sahoo
MQ
224
4
0
03 Dec 2024
AdaScale: Dynamic Context-aware DNN Scaling via Automated Adaptation Loop on Mobile Devices
IEEE Internet of Things Journal (IEEE IoT J.), 2024
Yuzhan Wang
Sicong Liu
Bin Guo
Boqi Zhang
Ke Ma
Yasan Ding
Hao Luo
Yao Li
Zhiwen Yu
279
7
0
01 Dec 2024
Is Oracle Pruning the True Oracle?
Sicheng Feng
Keda Tao
Haoyu Wang
VLM
304
2
0
28 Nov 2024
TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba
Xiaowen Ma
Zhenliang Ni
Xinghao Chen
Mamba
314
12
0
26 Nov 2024