Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

1 October 2015
Song Han
Huizi Mao
W. Dally
    3DGS
arXiv:1510.00149

Papers citing "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"

50 / 3,628 papers shown
Can pruning make Large Language Models more efficient?
Sia Gholami
Marwan Omar
269
19
0
06 Oct 2023
Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion
Neural Information Processing Systems (NeurIPS), 2023
Filip Szatkowski
Eric Elmoznino
Younesse Kaddar
Simone Scardapane
MoE
238
12
0
06 Oct 2023
Quantized Transformer Language Model Implementations on Edge Devices
International Conference on Machine Learning and Applications (ICMLA), 2023
Mohammad Wali Ur Rahman
Murad Mehrab Abrar
Hunter Gibbons Copening
Salim Hariri
Sicong Shao
Pratik Satam
Soheil Salehi
MQ
153
24
0
06 Oct 2023
Denoising Diffusion Step-aware Models
International Conference on Learning Representations (ICLR), 2023
Shuai Yang
Yukang Chen
Luozhou Wang
Shu Liu
Ying-Cong Chen
DiffM
353
22
0
05 Oct 2023
ResidualTransformer: Residual Low-Rank Learning with Weight-Sharing for Transformer Layers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yiming Wang
Jinyu Li
170
11
0
03 Oct 2023
Feather: An Elegant Solution to Effective DNN Sparsification
British Machine Vision Conference (BMVC), 2023
Athanasios Glentis Georgoulakis
George Retsinas
Petros Maragos
196
1
0
03 Oct 2023
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2023
Roberto L. Castro
Andrei Ivanov
Diego Andrade
Tal Ben-Nun
B. Fraguela
Torsten Hoefler
153
30
0
03 Oct 2023
DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training
International Conference on Learning Representations (ICLR), 2023
Chenyi Zi
Yimeng Zhang
Jinghan Jia
James Diffenderfer
Jiancheng Liu
Konstantinos Parasyris
Yihua Zhang
Zheng Zhang
B. Kailkhura
Sijia Liu
590
72
0
03 Oct 2023
The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers under Fully Homomorphic Encryption on the Torus
Rickard Brannvall
Andrei Stoian
132
0
0
03 Oct 2023
Compressing LLMs: The Truth is Rarely Pure and Never Simple
International Conference on Learning Representations (ICLR), 2023
Ajay Jaiswal
Zhe Gan
Xianzhi Du
Bowen Zhang
Zinan Lin
Yinfei Yang
MQ
254
60
0
02 Oct 2023
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
International Conference on Learning Representations (ICLR), 2023
Pingzhi Li
Zhenyu Zhang
Prateek Yadav
Yi-Lin Sung
Yu Cheng
Mohit Bansal
Tianlong Chen
MoMe
230
73
0
02 Oct 2023
A Novel IoT Trust Model Leveraging Fully Distributed Behavioral Fingerprinting and Secure Delegation
Pervasive and Mobile Computing (PMC), 2023
Marco Arazzi
S. Nicolazzo
Antonino Nocera
167
13
0
02 Oct 2023
ECNR: Efficient Compressive Neural Representation of Time-Varying Volumetric Datasets
IEEE Pacific Visualization Symposium (PacificVis), 2023
Kaiyuan Tang
Chaoli Wang
233
19
0
02 Oct 2023
Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications
Duc Hoang
Minsik Cho
Thomas Merth
Mohammad Rastegari
Zhangyang Wang
KELM, CLL
235
5
0
02 Oct 2023
SINF: Semantic Neural Network Inference with Semantic Subgraphs
Sazzad Sayyed
Jonathan D. Ashdown
215
0
0
02 Oct 2023
YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs
International Conference on Compiler Construction (CC), 2023
Cyrus Zhou
Zack Hassman
Ruize Xu
Dhirpal Shah
Vaughn Richard
Yanjing Li
433
5
0
01 Oct 2023
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors
Chengming Zhang
Baixi Sun
Xiaodong Yu
Zhen Xie
Weijian Zheng
K. Iskra
Pete Beckman
Dingwen Tao
125
7
0
29 Sep 2023
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
International Conference on Machine Learning (ICML), 2023
Lu Yin
Ajay Jaiswal
Shiwei Liu
Souvik Kundu
Zinan Lin
341
7
0
29 Sep 2023
AdaEvo: Edge-Assisted Continuous and Timely DNN Model Evolution for Mobile Devices
IEEE Transactions on Mobile Computing (IEEE TMC), 2023
Lehao Wang
Zhiwen Yu
Haoyi Yu
Sicong Liu
Yaxiong Xie
Bin Guo
Yunxin Liu
192
6
0
27 Sep 2023
Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey
IEEE Communications Surveys and Tutorials (COMST), 2023
Sicong Liu
Bin Guo
Cheng Fang
Ziqi Wang
Shiyan Luo
Zimu Zhou
Zhiwen Yu
AI4CE
251
35
0
27 Sep 2023
Efficient Post-training Quantization with FP8 Formats
Conference on Machine Learning and Systems (MLSys), 2023
Haihao Shen
Naveen Mellempudi
Xin He
Q. Gao
Chang‐Bao Wang
Mengni Wang
MQ
276
35
0
26 Sep 2023
Probabilistic Weight Fixing: Large-scale training of neural network weight uncertainties for quantization
Neural Information Processing Systems (NeurIPS), 2023
Christopher Subia-Waud
S. Dasmahapatra
UQCV, MQ
227
1
0
24 Sep 2023
ThinResNet: A New Baseline for Structured Convolutional Networks Pruning
Hugo Tessier
Ghouti Boukli Hacene
Vincent Gripon
158
1
0
22 Sep 2023
RAI4IoE: Responsible AI for Enabling the Internet of Energy
International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (ICPSISA), 2023
Minhui Xue
Surya Nepal
Ling Liu
Subbu Sethuvenkatraman
Xingliang Yuan
Carsten Rudolph
Ruoxi Sun
Greg Eisenhauer
243
6
0
20 Sep 2023
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Proceedings of the VLDB Endowment (PVLDB), 2023
Haojun Xia
Zhen Zheng
Yuchao Li
Donglin Zhuang
Zhongzhu Zhou
Xiafei Qiu
Yong Li
Wei Lin
Shuaiwen Leon Song
154
21
0
19 Sep 2023
Heterogeneous Generative Knowledge Distillation with Masked Image Modeling
Ziming Wang
Shumin Han
Xiaodi Wang
Jing Hao
Xianbin Cao
Baochang Zhang
VLM
216
1
0
18 Sep 2023
Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
George August Wright
Umberto Cappellazzo
Salah Zaiem
Desh Raj
Lucas Ondel Yang
Daniele Falavigna
Mohamed Nabih Ali
Alessio Brutti
176
4
0
18 Sep 2023
Enhancing Quantised End-to-End ASR Models via Personalisation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Qiuming Zhao
Guangzhi Sun
Chao Zhang
Mingxing Xu
Thomas Fang Zheng
MQ
132
3
0
17 Sep 2023
Scaling Laws for Sparsely-Connected Foundation Models
International Conference on Learning Representations (ICLR), 2023
Elias Frantar
C. Riquelme
N. Houlsby
Dan Alistarh
Utku Evci
231
46
0
15 Sep 2023
Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity
Matteo Grimaldi
Darshan C. Ganji
Ivan Lazarevich
Sudhakar Sah
168
12
0
12 Sep 2023
Real-Time Semantic Segmentation: A Brief Survey & Comparative Study in Remote Sensing
IEEE Geoscience and Remote Sensing Magazine (GRSM), 2023
Clifford Broni-Bediako
Junshi Xia
Xiangwei Zhu
245
15
0
12 Sep 2023
Approximating ReLU on a Reduced Ring for Efficient MPC-based Private Inference
Kiwan Maeng
G. E. Suh
153
4
0
09 Sep 2023
Sparse Federated Training of Object Detection in the Internet of Vehicles
Luping Rao
Chuan Ma
Ming Ding
Yuwen Qian
Lu Zhou
Yanfeng Guo
84
3
0
07 Sep 2023
Bandwidth-efficient Inference for Neural Image Compression
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shanzhi Yin
Tongda Xu
Yongsheng Liang
Yuanyuan Wang
Yanghao Li
Yan Wang
Jingjing Liu
143
1
0
06 Sep 2023
Geometry of Sensitivity: Twice Sampling and Hybrid Clipping in Differential Privacy with Optimal Gaussian Noise and Application to Deep Learning
Conference on Computer and Communications Security (CCS), 2023
Hanshen Xiao
Jun Wan
Srini Devadas
238
14
0
06 Sep 2023
In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms
International Conference on Internet-of-Things Design and Implementation (IoTDI), 2023
Philipp Schilk
Niccolò Polvani
Andrea Ronco
Milos Cernak
Michele Magno
185
13
0
05 Sep 2023
On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks
Wei Huang
Haotong Qin
Yangdong Liu
Jingzhuo Liang
Yifu Ding
Ying Li
Xianglong Liu
MQ
363
2
0
05 Sep 2023
Efficient Defense Against Model Stealing Attacks on Convolutional Neural Networks
International Conference on Machine Learning and Applications (ICMLA), 2023
Kacem Khaled
Mouna Dhaouadi
F. Magalhães
Gabriela Nicolescu
AAML
102
2
0
04 Sep 2023
On the fly Deep Neural Network Optimization Control for Low-Power Computer Vision
IEEE International Performance, Computing, and Communications Conference (IPCCC), 2023
Ishmeet Kaur
Adwaita Janardhan Jadhav
100
0
0
04 Sep 2023
ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency Transformation
Nastaran Darabi
Maeesha Binte Hashem
Hongyi Pan
Ahmet Cetin
Wilfred Gomes
A. R. Trivedi
110
6
0
04 Sep 2023
Saturn: An Optimized Data System for Large Model Deep Learning Workloads
Proceedings of the VLDB Endowment (PVLDB), 2023
Kabir Nagrecha
Arun Kumar
304
8
0
03 Sep 2023
eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models
IEEE Computer Architecture Letters (CAL), 2023
Minsik Cho
Keivan Alizadeh Vahid
Qichen Fu
Saurabh N. Adya
C. C. D. Mundo
Mohammad Rastegari
Devang Naik
Peter Zatloukal
MQ
204
9
0
02 Sep 2023
Proof of Deep Learning: Approaches, Challenges, and Future Directions
Mahmoud Salhab
Khaleel W. Mershad
139
3
0
31 Aug 2023
Latency-aware Unified Dynamic Networks for Efficient Image Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yizeng Han
Zeyu Liu
Zhihang Yuan
Yifan Pu
Chaofei Wang
Shiji Song
Gao Huang
425
33
0
30 Aug 2023
Generative Model for Models: Rapid DNN Customization for Diverse Tasks and Resource Constraints
Wenxing Xu
Yuanchun Li
Jiacheng Liu
Yiyou Sun
Zhengyang Cao
Shouqing Yang
Hao Wen
Yunxin Liu
222
2
0
29 Aug 2023
Uncovering the Hidden Cost of Model Compression
Diganta Misra
Muawiz Chaudhary
Agam Goyal
Bharat Runwal
Pin-Yu Chen
VLM
261
3
0
29 Aug 2023
Low-bit Quantization for Deep Graph Neural Networks with Smoothness-aware Message Propagation
International Conference on Information and Knowledge Management (CIKM), 2023
Shuang Wang
B. Eravcı
Rustam Guliyev
Hakan Ferhatosmanoglu
GNN, MQ
155
10
0
29 Aug 2023
Maestro: Uncovering Low-Rank Structures via Trainable Decomposition
International Conference on Machine Learning (ICML), 2023
Samuel Horváth
Stefanos Laskaridis
Shashank Rajput
Hongyi Wang
BDL
318
9
0
28 Aug 2023
Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang
Yizeng Han
Chaofei Wang
Shiji Song
Qi Tian
Gao Huang
VLM
282
32
0
27 Aug 2023
Homological Convolutional Neural Networks
Antonio Briola
Yuanrong Wang
Silvia Bartolucci
T. Aste
LMTD
219
7
0
26 Aug 2023