Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

1 October 2015
Song Han
Huizi Mao
W. Dally
    3DGS

Papers citing "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"

50 / 3,628 papers shown
Quality Scalable Quantization Methodology for Deep Learning on Edge
S. Khaliq
Rehan Hafiz
MQ
140
2
0
15 Jul 2024
Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao
Xiaohan Ding
Juexiao Feng
Yuhong Yang
Hui Chen
Guiguang Ding
VLM, MQ
214
9
0
15 Jul 2024
Optimization of DNN-based speaker verification model through efficient quantization technique
Yeona Hong
Woo-Jin Chung
Hong-Goo Kang
MQ
131
1
0
12 Jul 2024
OPIMA: Optical Processing-In-Memory for Convolutional Neural Network Acceleration
Febin P. Sunny
Amin Shafiee
Abhishek Balasubramaniam
Mahdi Nikdast
S. Pasricha
150
2
0
11 Jul 2024
The Misclassification Likelihood Matrix: Some Classes Are More Likely To Be Misclassified Than Others
Daniel Sikar
Artur Garcez
Robin Bloomfield
Tillman Weyde
Kaleem Peeroo
Naman Singh
Maeve Hutchinson
Dany Laksono
Mirela Reljan-Delaney
304
2
0
10 Jul 2024
DεpS: Delayed ε-Shrinking for Faster Once-For-All Training
Aditya Annavajjala
Alind Khare
Animesh Agrawal
Igor Fedorov
Hugo Latapie
Myungjin Lee
Alexey Tumanov
CLL
164
1
0
08 Jul 2024
Exposing Privacy Gaps: Membership Inference Attack on Preference Data for LLM Alignment
Qizhang Feng
Siva Rajesh Kasa
Santhosh Kumar Kasa
Hyokun Yun
C. Teo
S. Bodapati
281
15
0
08 Jul 2024
Topological Persistence Guided Knowledge Distillation for Wearable Sensor Data
Eun Som Jeon
Hongjun Choi
A. Shukla
Yuan Wang
Hyunglae Lee
M. Buman
Pavan Turaga
163
4
0
07 Jul 2024
The Impact of Quantization and Pruning on Deep Reinforcement Learning Models
Heng Lu
Mehdi Alemi
Reza Rawassizadeh
188
4
0
05 Jul 2024
Isomorphic Pruning for Vision Models
Gongfan Fang
Xinyin Ma
Michael Bi Mi
Xinchao Wang
VLM, ViT
196
18
0
05 Jul 2024
ISQuant: apply squant to the real deployment
Dezan Zhao
MQ
144
0
0
05 Jul 2024
AMD: Automatic Multi-step Distillation of Large-scale Vision Models
Cheng Han
Qifan Wang
S. Dianat
Majid Rabbani
Raghuveer M. Rao
Yi Fang
Qiang Guan
Lifu Huang
Dongfang Liu
VLM
179
12
0
05 Jul 2024
Timestep-Aware Correction for Quantized Diffusion Models
Yuzhe Yao
Feng Tian
Jun Chen
Haonan Lin
Guang Dai
Yong Liu
Jingdong Wang
DiffM, MQ
216
11
0
04 Jul 2024
Protecting Deep Learning Model Copyrights with Adversarial Example-Free Reuse Detection
Xiaokun Luan
Xiyue Zhang
Jingyi Wang
Meng Sun
AAML
251
1
0
04 Jul 2024
Fisher-aware Quantization for DETR Detectors with Critical-category Objectives
Huanrui Yang
Yafeng Huang
Zhen Dong
Denis A. Gudovskiy
Tomoyuki Okuno
Yohei Nakata
Yuan Du
Kurt Keutzer
Shanghang Zhang
MQ
261
1
0
03 Jul 2024
ShiftAddAug: Augment Multiplication-Free Tiny Neural Network with Hybrid Computation
Yipin Guo
Zihao Li
Yilin Lang
Qinyuan Ren
209
0
0
03 Jul 2024
Efficient DNN-Powered Software with Fair Sparse Models
Xuanqi Gao
Weipeng Jiang
Juan Zhai
Shiqing Ma
Xiaoyu Zhang
Chao Shen
185
0
0
03 Jul 2024
LPViT: Low-Power Semi-structured Pruning for Vision Transformers
Kaixin Xu
Zhe Wang
Chunyun Chen
Xue Geng
Jie Lin
Xulei Yang
Ruibing Jin
Min Wu
Xiaoli Li
Weisi Lin
ViT, VLM
609
16
0
02 Jul 2024
A Comprehensive Survey on Diffusion Models and Their Applications
M. Ahsan
S. Raman
Yingtao Liu
Zahed Siddique
MedIm, DiffM
346
3
0
01 Jul 2024
Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks
Beatrice Alessandra Motetti
Matteo Risso
Luca Bompani
Enrico Macii
Massimo Poncino
Daniele Jahier Pagliari
MQ
223
10
0
01 Jul 2024
Knowledge-Aware Parsimony Learning: A Perspective from Relational Graphs
Quanming Yao
Yongqi Zhang
Yaqing Wang
Nan Yin
James Kwok
Qiang Yang
231
0
0
29 Jun 2024
VcLLM: Video Codecs are Secretly Tensor Codecs
Ceyu Xu
Yongji Wu
Xinyu Yang
Beidi Chen
Matthew Lentz
Danyang Zhuo
Lisa Wu Wills
190
0
0
29 Jun 2024
SCOPE: Stochastic Cartographic Occupancy Prediction Engine for Uncertainty-Aware Dynamic Navigation
Zhanteng Xie
P. Dames
464
3
0
28 Jun 2024
Energy-Efficient Channel Decoding for Wireless Federated Learning: Convergence Analysis and Adaptive Design
Linping Qu
Yuyi Mao
Shenghui Song
Chi-Ying Tsui
241
2
0
26 Jun 2024
FedAQ: Communication-Efficient Federated Edge Learning via Joint Uplink and Downlink Adaptive Quantization
Linping Qu
Shenghui Song
Chi-Ying Tsui
MQ, FedML
155
5
0
26 Jun 2024
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
Lei Chen
Yuan Meng
Chen Tang
Cheng Wang
Jingyan Jiang
Xin Wang
Zhi Wang
Wenwu Zhu
MQ
306
50
0
25 Jun 2024
BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks
A. Ramesh
Vignesh Ganapathiraman
I. Laradji
Mark Schmidt
191
8
0
25 Jun 2024
Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis
Hongkang Li
Meng Wang
Shuai Zhang
Sijia Liu
Pin-Yu Chen
240
8
0
24 Jun 2024
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
Ashwinee Panda
Berivan Isik
Xiangyu Qi
Sanmi Koyejo
Tsachy Weissman
Prateek Mittal
MoMe
397
26
0
24 Jun 2024
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
Deyuan Liu
Zhan Qin
Han Wang
Zhao Yang
Zecheng Wang
...
Zhao Lv
Zhiying Tu
Dianhui Chu
Bo Li
Dianbo Sui
293
9
0
24 Jun 2024
Towards Real-Time Neural Volumetric Rendering on Mobile Devices: A Measurement Study
Zhe Wang
Yifei Zhu
154
2
0
23 Jun 2024
RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization
Mingshu Zhao
Yi Luo
Yong Ouyang
327
6
0
23 Jun 2024
Failure-Resilient Distributed Inference with Model Compression over Heterogeneous Edge Devices
Li Wang
Liang Li
Lianming Xu
Xian Peng
Aiguo Fei
107
4
0
20 Jun 2024
Accelerating Depthwise Separable Convolutions on Ultra-Low-Power Devices
Francesco Daghero
Luca Bompani
Massimo Poncino
Enrico Macii
Daniele Jahier Pagliari
BDL
152
0
0
18 Jun 2024
Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint
Xinglong Sun
Barath Lakshmanan
Maying Shen
Shiyi Lan
Jingde Chen
Jose Alvarez
VLM
254
4
0
17 Jun 2024
ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking
Wenshuo Li
Xinghao Chen
Han Shu
Yehui Tang
Yunhe Wang
MQ
198
7
0
17 Jun 2024
Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent
Lin Wang
Zhichao Wang
Xiaoying Tang
201
2
0
17 Jun 2024
EncCluster: Scalable Functional Encryption in Federated Learning through Weight Clustering and Probabilistic Filters
Vasileios Tsouvalas
Samaneh Mohammadi
Ali Balador
T. Ozcelebi
Francesco Flammini
N. Meratnia
FedML
140
1
0
13 Jun 2024
DIET: Customized Slimming for Incompatible Networks in Sequential Recommendation
Kairui Fu
Shengyu Zhang
Zheqi Lv
Jingyuan Chen
Jiwei Li
188
7
0
13 Jun 2024
Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference
Christopher Wolters
Xiaoxuan Yang
Ulf Schlichtmann
Toyotaro Suzumura
196
30
0
12 Jun 2024
Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection
Junfei Yi
Jianxu Mao
Tengfei Liu
Mingjie Li
Hanyu Gu
Hui Zhang
Xiaojun Chang
Yaonan Wang
238
5
0
11 Jun 2024
Compute Better Spent: Replacing Dense Layers with Structured Matrices
Shikai Qiu
Andres Potapczynski
Marc Finzi
Micah Goldblum
Andrew Gordon Wilson
192
23
0
10 Jun 2024
Efficient Neural Compression with Inference-time Decoding
Clément Metz
Olivier Bichler
Antoine Dupret
MQ
137
0
0
10 Jun 2024
Evaluating Zero-Shot Long-Context LLM Compression
Chenyu Wang
Yihan Wang
Kai Li
241
0
0
10 Jun 2024
Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2024
Bei Liu
Haoyu Wang
Yanmin Qian
MQ
344
3
0
08 Jun 2024
Optimal Recurrent Network Topologies for Dynamical Systems Reconstruction
International Conference on Machine Learning (ICML), 2024
Christoph Jürgen Hemmer
Manuel Brenner
Florian Hess
Daniel Durstewitz
222
5
0
07 Jun 2024
Navigating Efficiency in MobileViT through Gaussian Process on Global Architecture Factors
Ke Meng
Kai Chen
185
1
0
07 Jun 2024
How Far Can We Compress Instant-NGP-Based NeRF?
Yihang Chen
Qianyi Wu
Mehrtash Harandi
Jianfei Cai
208
29
0
06 Jun 2024
ReDistill: Residual Encoded Distillation for Peak Memory Reduction of CNNs
Fang Chen
Gourav Datta
Mujahid Al Rafi
Hyeran Jeon
Meng Tang
573
1
0
06 Jun 2024
Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models
Peijie Dong
Lujun Li
Zhenheng Tang
Xiang Liu
Xinglin Pan
Qiang-qiang Wang
Xiaowen Chu
277
50
0
05 Jun 2024