1510.00149
Cited By
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
1 October 2015
Song Han
Huizi Mao
W. Dally
3DGS
Papers citing
"Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"
50 / 3,434 papers shown
Title
Sample-aware Adaptive Structured Pruning for Large Language Models
Jun Kong
Xinge Ma
Jin Wang
Xuejie Zhang
45
0
0
08 Mar 2025
Empowering Edge Intelligence: A Comprehensive Survey on On-Device AI Models
Xubin Wang
Zhiqing Tang
Jianxiong Guo
Tianhui Meng
Chenhao Wang
Tian-sheng Wang
Weijia Jia
50
0
0
08 Mar 2025
Personalized Federated Fine-tuning for Heterogeneous Data: An Automatic Rank Learning Approach via Two-Level LoRA
Jie Hao
Yuman Wu
Ali Payani
Myungjin Lee
Mingrui Liu
37
1
0
05 Mar 2025
FairSense-AI: Responsible AI Meets Sustainability
Shaina Raza
Mukund Sayeeganesh Chettiar
Matin Yousefabadi
Tahniat Khan
Marcelo Lotif
40
0
0
04 Mar 2025
RSQ: Learning from Important Tokens Leads to Better Quantized LLMs
Yi-Lin Sung
Prateek Yadav
Jialu Li
Jaehong Yoon
Mohit Bansal
MQ
52
1
0
03 Mar 2025
Eau De Q-Network: Adaptive Distillation of Neural Networks in Deep Reinforcement Learning
Théo Vincent
Tim Lukas Faust
Yogesh Tripathi
Jan Peters
Carlo D'Eramo
37
0
0
03 Mar 2025
Mamba base PKD for efficient knowledge compression
José Medina
Amnir Hadachi
Paul Honeine
Abdelaziz Bensrhair
Mamba
64
0
0
03 Mar 2025
Privacy-preserving Machine Learning in Internet of Vehicle Applications: Fundamentals, Recent Advances, and Future Direction
Nazmul Islam
Mohammad Zulkernine
40
0
0
03 Mar 2025
Split Adaptation for Pre-trained Vision Transformers
Lixu Wang
Bingqi Shang
Y. Li
Payal Mohapatra
Wei Dong
Xiao-Xu Wang
Qi Zhu
ViT
43
0
0
01 Mar 2025
AgroLLM: Connecting Farmers and Agricultural Practices through Large Language Models for Enhanced Knowledge Transfer and Practical Application
Dinesh Jackson Samuel
Inna Skarga-Bandurova
David Sikolia
Muhammad Awais
50
0
0
28 Feb 2025
MobiLLM: Enabling LLM Fine-Tuning on the Mobile Device via Server Assisted Side Tuning
Liang Li
Xingke Yang
Wen Wu
Hao Wang
Tomoaki Ohtsuki
Xin Fu
M. Pan
Xuemin Shen
31
1
0
27 Feb 2025
Binary Neural Networks for Large Language Model: A Survey
Liangdong Liu
Zhitong Zheng
Cong Wang
Tianhuang Su
Z. Yang
MQ
65
0
0
26 Feb 2025
Mixtraining: A Better Trade-Off Between Compute and Performance
Zexin Li
Jiancheng Zhang
Yufei Li
Yinglun Zhu
Cong Liu
46
0
0
26 Feb 2025
More for Keys, Less for Values: Adaptive KV Cache Quantization
Mohsen Hariri
Lam Nguyen
Sixu Chen
Shaochen Zhong
Qifan Wang
Xia Hu
Xiaotian Han
V. Chaudhary
MQ
38
0
0
24 Feb 2025
"Actionable Help" in Crises: A Novel Dataset and Resource-Efficient Models for Identifying Request and Offer Social Media Posts
Rabindra Lamsal
M. Read
S. Karunasekera
Muhammad Imran
28
0
0
24 Feb 2025
Machine learning and high dimensional vector search
Matthijs Douze
63
0
0
24 Feb 2025
When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models
Weilan Wang
Yu Mao
Dongdong Tang
Hongchao Du
Nan Guan
Chun Jason Xue
MQ
62
1
0
24 Feb 2025
Pleno-Generation: A Scalable Generative Face Video Compression Framework with Bandwidth Intelligence
Bolin Chen
Hanwei Zhu
Shanzhi Yin
Lingyu Zhu
Jie Chen
Ru-Ling Liao
Shiqi Wang
Yan Ye
52
1
0
24 Feb 2025
Optimizing Singular Spectrum for Large Language Model Compression
Dengjie Li
Tiancheng Shen
Yao Zhou
Baisong Yang
Zhongying Liu
Masheng Yang
Bernard Ghanem
Yibo Yang
Yujie Zhong
Ming-Hsuan Yang
63
0
0
24 Feb 2025
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression
Xiaoyi Qu
David Aponte
Colby R. Banbury
Daniel P. Robinson
Tianyu Ding
K. Koishida
Ilya Zharkov
Tianyi Chen
MQ
59
1
0
23 Feb 2025
Verification of Bit-Flip Attacks against Quantized Neural Networks
Yedi Zhang
Lei Huang
Pengfei Gao
Fu Song
Jun Sun
Jin Song Dong
AAML
47
0
0
22 Feb 2025
FedSpaLLM: Federated Pruning of Large Language Models
Guangji Bai
Yijiang Li
Zilinghan Li
Liang Zhao
Kibaek Kim
FedML
60
4
0
20 Feb 2025
A General Error-Theoretical Analysis Framework for Constructing Compression Strategies
Boyang Zhang
Daning Cheng
Yunquan Zhang
Meiqi Tu
Fangmin Liu
Jiake Tian
31
1
0
19 Feb 2025
GPU Memory Usage Optimization for Backward Propagation in Deep Network Training
Ding-Yong Hong
Tzu-Hsien Tsai
Ning Wang
Pangfeng Liu
Jan-Jan Wu
39
0
0
18 Feb 2025
DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs
Minxuan Lv
Zhenpeng Su
Leiyu Pan
Yizhe Xiong
Zijia Lin
...
Guiguang Ding
Cheng Luo
Di Zhang
Kun Gai
Songlin Hu
MoE
39
0
0
18 Feb 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
62
3
0
11 Feb 2025
EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models
Xingrun Xing
Zheng Liu
Shitao Xiao
Boyan Gao
Yiming Liang
Wanpeng Zhang
Haokun Lin
Guoqi Li
Jiajun Zhang
LRM
56
1
0
10 Feb 2025
Kolmogorov-Arnold Fourier Networks
Jusheng Zhang
Yijia Fan
Kaitong Cai
Keze Wang
63
0
0
09 Feb 2025
BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference
Reena Elangovan
Charbel Sakr
A. Raghunathan
Brucek Khailany
MQ
48
1
0
07 Feb 2025
Advancing Weight and Channel Sparsification with Enhanced Saliency
Xinglong Sun
Maying Shen
Hongxu Yin
Lei Mao
Pavlo Molchanov
Jose M. Alvarez
46
1
0
05 Feb 2025
Progressive Binarization with Semi-Structured Pruning for LLMs
X. Yan
Tianao Zhang
Zhiteng Li
Yulun Zhang
MQ
54
0
0
03 Feb 2025
Position: AI Scaling: From Up to Down and Out
Yunke Wang
Yanxi Li
Chang Xu
HAI
80
1
0
02 Feb 2025
Brain-inspired sparse training enables Transformers and LLMs to perform as fully connected
Yingtao Zhang
Jialin Zhao
Wenjing Wu
Ziheng Liao
Umberto Michieli
C. Cannistraci
48
0
0
31 Jan 2025
DCentNet: Decentralized Multistage Biomedical Signal Classification using Early Exits
Xiaolin Li
Binhua Huang
B. Cardiff
Deepu John
41
0
0
31 Jan 2025
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
Mohammad Mozaffari
Amir Yazdanbakhsh
Zhao Zhang
M. Dehnavi
76
5
0
28 Jan 2025
Ditto: Accelerating Diffusion Model via Temporal Value Similarity
Sungbin Kim
Hyunwuk Lee
Wonho Cho
Mincheol Park
Won Woo Ro
56
1
0
20 Jan 2025
MOGNET: A Mux-residual quantized Network leveraging Online-Generated weights
Van Thien Nguyen
William Guicquero
Gilles Sicard
MQ
72
1
0
17 Jan 2025
Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search
Daniel de Souza Severo
Giuseppe Ottaviano
Matthew Muckley
Karen Ullrich
Matthijs Douze
MQ
46
0
0
16 Jan 2025
Histogram-Equalized Quantization for logic-gated Residual Neural Networks
Van Thien Nguyen
William Guicquero
Gilles Sicard
MQ
38
1
0
10 Jan 2025
EEG Emotion Copilot: Optimizing Lightweight LLMs for Emotional EEG Interpretation with Assisted Medical Record Generation
Hongyu Chen
Weiming Zeng
C. L. P. Chen
Luhui Cai
Fei-Yue Wang
...
Wei Zhang
Y. Li
Hongjie Yan
W. Siok
Nizhuan Wang
34
1
0
08 Jan 2025
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic
Yifei He
Yuzheng Hu
Yong Lin
Tong Zhang
Han Zhao
FedML
MoMe
64
17
0
08 Jan 2025
Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies
Xubin Wang
Weijia Jia
34
0
0
08 Jan 2025
PTEENet: Post-Trained Early-Exit Neural Networks Augmentation for Inference Cost Optimization
Assaf Lahiany
Yehudit Aperstein
31
4
0
07 Jan 2025
A Novel Structure-Agnostic Multi-Objective Approach for Weight-Sharing Compression in Deep Neural Networks
Rasa Khosrowshahli
Shahryar Rahnamayan
Beatrice Ombuki-Berman
MQ
28
0
0
06 Jan 2025
Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning
Zhen Li
Yupeng Su
Runming Yang
C. Xie
Z. Wang
Zhongwei Xie
Ngai Wong
Hongxia Yang
MQ
LRM
44
3
0
06 Jan 2025
Pruning-based Data Selection and Network Fusion for Efficient Deep Learning
Humaira Kousar
Hasnain Irshad Bhatti
Jaekyun Moon
32
0
0
03 Jan 2025
SlimGPT: Layer-wise Structured Pruning for Large Language Models
Gui Ling
Ziyang Wang
Yuliang Yan
Qingwen Liu
28
2
0
24 Dec 2024
AutoSculpt: A Pattern-based Model Auto-pruning Framework Using Reinforcement Learning and Graph Learning
Lixian Jing
Jianpeng Qi
Junyu Dong
Yanwei Yu
3DPC
AI4CE
39
0
0
24 Dec 2024
GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference
Chao Zeng
Songwei Liu
Shu Yang
Fangmin Chen
Xing Mei
Lean Fu
MQ
42
0
0
23 Dec 2024
Lightweight Design and Optimization methods for DCNNs: Progress and Futures
Hanhua Long
Wenbin Bi
Jian Sun
75
0
0
22 Dec 2024