ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

1 October 2015
Song Han, Huizi Mao, W. Dally

Papers citing "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"

50 / 3,434 papers shown
QoS-Nets: Adaptive Approximate Neural Network Inference
E. Trommer, Bernd Waschneck, Akash Kumar
10 Oct 2024

More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing
Sagi Shaier, Francisco Pereira, K. Wense, Lawrence E Hunter, Matt Jones [MoE]
10 Oct 2024

Compressing Large Language Models with Automated Sub-Network Search
R. Sukthanker, B. Staffler, Frank Hutter, Aaron Klein [LRM]
09 Oct 2024

A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, ..., Qilin Zheng, Guanglei Zhou, Hai, Li-Wei Li, Yiran Chen
08 Oct 2024

Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See
Phu Pham, Kun Wan, Yu-Jhe Li, Zeliang Zhang, Daniel Miranda, Ajinkya Kale, Chenliang Xu
08 Oct 2024
Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-training
Junxiao Shen, Khadija Khaldi, Enmin Zhou, Hemant Bhaskar Surale, Amy Karlson
08 Oct 2024

Addition is All You Need for Energy-efficient Language Models
Hongyin Luo, Wei Sun
01 Oct 2024

Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging
Ismail Erbas, Vikas Pandey, Aporva Amarnath, Naigang Wang, Karthik Swaminathan, Stefan T. Radev, Xavier Intes [AI4CE]
01 Oct 2024

MicroFlow: An Efficient Rust-Based Inference Engine for TinyML
Matteo Carnelos, Francesco Pasti, Nicola Bellotto
28 Sep 2024

Value-Based Deep Multi-Agent Reinforcement Learning with Dynamic Sparse Training
Pihe Hu, Shaolong Li, Zhuoran Li, L. Pan, Longbo Huang
28 Sep 2024
Mitigating Selection Bias with Node Pruning and Auxiliary Options
Hyeong Kyu Choi, Weijie Xu, Chi Xue, Stephanie Eckman, Chandan K. Reddy
27 Sep 2024

Efficient Noise Mitigation for Enhancing Inference Accuracy in DNNs on Mixed-Signal Accelerators
Seyedarmin Azizi, Mohammad Erfan Sadeghi, M. Kamal, Massoud Pedram
27 Sep 2024

Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
Shaobo Ma, Chao Fang, Haikuo Shao, Zhongfeng Wang
26 Sep 2024

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
Gongfan Fang, Hongxu Yin, Saurav Muralidharan, Greg Heinrich, Jeff Pool, Jan Kautz, Pavlo Molchanov, Xinchao Wang
26 Sep 2024

SPAQ-DL-SLAM: Towards Optimizing Deep Learning-based SLAM for Resource-Constrained Embedded Platforms
Niraj Pudasaini, Muhammad Abdullah Hanif, Muhammad Shafique
22 Sep 2024
CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information
Yuxin Wang, Minghua Ma, Zekun Wang, Jingchang Chen, Huiming Fan, Liping Shan, Qing Yang, Dongliang Xu, Ming Liu, Bing Qin
20 Sep 2024

Green Federated Learning: A new era of Green Aware AI
Dipanwita Thakur, Antonella Guzzo, Giancarlo Fortino, Francesco Piccialli [AI4CE]
19 Sep 2024

Robust Training of Neural Networks at Arbitrary Precision and Sparsity
Chengxi Ye, Grace Chu, Yanfeng Liu, Yichi Zhang, Lukasz Lew, Andrew G. Howard [MQ]
14 Sep 2024

S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training
Yuezhou Hu, Jun-Jie Zhu, Jianfei Chen
13 Sep 2024

Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding
Tianqiao Liu, Zui Chen, Zitao Liu, Mi Tian, Weiqi Luo [LRM]
13 Sep 2024
NVRC: Neural Video Representation Compression
Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull
11 Sep 2024

HESSO: Towards Automatic Efficient and User Friendly Any Neural Network Training and Pruning
Tianyi Chen, Xiaoyi Qu, David Aponte, Colby R. Banbury, Jongwoo Ko, Tianyu Ding, Yong Ma, Vladimir Lyapunov, Ilya Zharkov, Luming Liang
11 Sep 2024

Towards Energy-Efficiency by Navigating the Trilemma of Energy, Latency, and Accuracy
Boyuan Tian, Yihan Pang, Muhammad Huzaifa, Shenlong Wang, Sarita Adve
06 Sep 2024

Panoptic Perception for Autonomous Driving: A Survey
Yunge Li, Lanyu Xu
27 Aug 2024

K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
Zhikai Li, Xuewen Liu, Dongrong Fu, Jianquan Li, Qingyi Gu, Kurt Keutzer, Zhen Dong [EGVM, VGen, DiffM]
26 Aug 2024
Condensed Sample-Guided Model Inversion for Knowledge Distillation
Kuluhan Binici, Shivam Aggarwal, Cihan Acar, N. Pham, K. Leman, Gim Hee Lee, Tulika Mitra
25 Aug 2024

MPruner: Optimizing Neural Network Size with CKA-Based Mutual Information Pruning
Seungbeom Hu, ChanJun Park, Andrew Ferraiuolo, Sang-Ki Ko, Jinwoo Kim, Haein Song, Jieung Kim
24 Aug 2024

A Greedy Hierarchical Approach to Whole-Network Filter-Pruning in CNNs
Kiran Purohit, Anurag Parvathgari, Sourangshu Bhattacharya [VLM]
22 Aug 2024

Real-Time Video Generation with Pyramid Attention Broadcast
Xuanlei Zhao, Xiaolong Jin, Kai Wang, Yang You [VGen, DiffM]
22 Aug 2024

Domain-invariant Progressive Knowledge Distillation for UAV-based Object Detection
Liang Yao, Fan Liu, Chuanyi Zhang, Zhiquan Ou, Ting Wu [VLM]
21 Aug 2024
Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism
Guanchen Li, Xiandong Zhao, Lian Liu, Zeping Li, Dong Li, Lu Tian, Jie He, Ashish Sirasao, E. Barsoum [VLM]
20 Aug 2024

Diffusion Model for Planning: A Systematic Literature Review
Toshihide Ubukata, Jialong Li, Kenji Tei [DiffM, MedIm]
16 Aug 2024

An Effective Information Theoretic Framework for Channel Pruning
Yihao Chen, Zefang Wang
14 Aug 2024

Infra-YOLO: Efficient Neural Network Structure with Model Compression for Real-Time Infrared Small Object Detection
Zhonglin Chen, Anyu Geng, Jianan Jiang, Jiwu Lu, Di Wu [ObjD]
14 Aug 2024

PEARL: Parallel Speculative Decoding with Adaptive Draft Length
Tianyu Liu, Yun Li, Qitan Lv, Kai Liu, Jianchen Zhu, Winston Hu, X. Sun
13 Aug 2024
Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers
Moritz Scherer, Luka Macan, Victor J. B. Jung, Philip Wiese, Luca Bompani, Alessio Burrello, Francesco Conti, Luca Benini [MoE]
08 Aug 2024

AdapMTL: Adaptive Pruning Framework for Multitask Learning Model
Mingcan Xiang, Steven Jiaxun Tang, Qizheng Yang, Hui Guan, Tongping Liu [VLM]
07 Aug 2024

Speaker Adaptation for Quantised End-to-End ASR Models
Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng
07 Aug 2024

Designing Extremely Memory-Efficient CNNs for On-device Vision Tasks
Jaewook Lee, Yoel Park, Seulki Lee [VLM]
07 Aug 2024

Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments
Angie Boggust, Venkatesh Sivaraman, Yannick Assogba, Donghao Ren, Dominik Moritz, Fred Hohman [VLM]
06 Aug 2024
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Peijie Dong, Lujun Li, Dayou Du, Yuhan Chen, Zhenheng Tang, ..., Wei Xue, Wenhan Luo, Qi-fei Liu, Yi-Ting Guo, Xiaowen Chu [MQ]
03 Aug 2024

Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization
Róisín Luo, Alexandru Drimbarean, Walsh Simon, Colm O'Riordan [MQ]
01 Aug 2024

Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen
30 Jul 2024

Toward Efficient Permutation for Hierarchical N:M Sparsity on GPUs
Seungmin Yu, Xiaodie Yi, Hayun Lee, Dongkun Shin
30 Jul 2024

MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi, Hyeyoon Lee, Dain Kwon, Sunjong Park, Kyuyeun Kim, Noseong Park, Jinho Lee [MQ]
29 Jul 2024
Parameter-Efficient Fine-Tuning via Circular Convolution
Aochuan Chen, Jiashun Cheng, Zijing Liu, Ziqi Gao, Fugee Tsung, Yu Li, Jia Li
27 Jul 2024

Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining
Jianwei Li, Yijun Dong, Qi Lei
26 Jul 2024

Efficient Inference of Vision Instruction-Following Models with Elastic Cache
Zuyan Liu, Benlin Liu, Jiahui Wang, Yuhao Dong, Guangyi Chen, Yongming Rao, Ranjay Krishna, Jiwen Lu [VLM]
25 Jul 2024

Accelerating the Low-Rank Decomposed Models
Habib Hajimolahoseini, Walid Ahmed, Austin Wen, Yang Liu
24 Jul 2024

Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance
Ao Shen, Qiang Wang, Zhiquan Lai, Xionglve Li, Dongsheng Li [ALM, MQ]
24 Jul 2024