arXiv:1510.00149 (latest version: v5)
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
1 October 2015
Song Han
Huizi Mao
W. Dally
3DGS
Papers citing "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"
Showing 50 of 3,625 citing papers
AdaScale: Dynamic Context-aware DNN Scaling via Automated Adaptation Loop on Mobile Devices
IEEE Internet of Things Journal (IEEE IoT J.), 2024
Yuzhan Wang
Sicong Liu
Bin Guo
Boqi Zhang
Ke Ma
Yasan Ding
Hao Luo
Yao Li
Zhiwen Yu
279
7
0
01 Dec 2024
Is Oracle Pruning the True Oracle?
Sicheng Feng
Keda Tao
Haoyu Wang
VLM
314
2
0
28 Nov 2024
TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba
Xiaowen Ma
Zhenliang Ni
Xinghao Chen
Mamba
330
12
0
26 Nov 2024
DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Hexuan Deng
Wenxiang Jiao
Xuebo Liu
Min Zhang
Zhaopeng Tu
VLM
497
1
0
21 Nov 2024
Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning
Andy Li
A. Durrant
Milan Markovic
Lu Yin
Georgios Leontidis
Tianlong Chen
349
1
0
20 Nov 2024
SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism
Priyansh Bhatnagar
Linfeng Wen
Mingu Kang
138
0
0
15 Nov 2024
P² Law: Scaling Law for Post-Training After Model Pruning
Xiaodong Chen
Yuxuan Hu
Jing Zhang
Yanling Wang
Xuefei Liu
Zeyang Zhang
192
0
0
15 Nov 2024
Optimizing Traffic Signal Control using High-Dimensional State Representation and Efficient Deep Reinforcement Learning
Lawrence Francis
Blessed Guda
Ahmed Biyabani
105
0
0
12 Nov 2024
CULL-MT: Compression Using Language and Layer pruning for Machine Translation
Pedram Rostami
M. Dousti
228
2
0
10 Nov 2024
Client Contribution Normalization for Enhanced Federated Learning
IEEE India Conference (INDICON), 2024
Mayank Kumar Kundalwal
Anurag Saraswat
Ishan Mishra
Deepak Mishra
FedML
164
0
0
10 Nov 2024
Learning Morphisms with Gauss-Newton Approximation for Growing Networks
Neal Lawton
Aram Galstyan
Greg Ver Steeg
149
0
0
07 Nov 2024
Flashy Backdoor: Real-world Environment Backdoor Attack on SNNs with DVS Cameras
Roberto Riaño
Gorka Abad
S. Picek
A. Urbieta
AAML
334
2
0
05 Nov 2024
Magnitude Pruning of Large Pretrained Transformer Models with a Mixture Gaussian Prior
Journal of Data Science (JDS), 2024
Mingxuan Zhang
Y. Sun
F. Liang
253
0
0
01 Nov 2024
On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh
Bram Adams
Ahmed E. Hassan
VLM
346
1
0
01 Nov 2024
Mutual Information Preserving Neural Network Pruning
Charles Westphal
Stephen Hailes
Mirco Musolesi
436
3
0
31 Oct 2024
Offline Behavior Distillation
Neural Information Processing Systems (NeurIPS), 2024
Shiye Lei
Sen Zhang
Dacheng Tao
OffRL
192
2
0
30 Oct 2024
Efficient Reprogramming of Memristive Crossbars for DNNs: Weight Sorting and Bit Stucking
International Symposium on Circuits and Systems (ISCAS), 2024
Matheus Farias
H. T. Kung
MQ
117
2
0
29 Oct 2024
Data Generation for Hardware-Friendly Post-Training Quantization
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Lior Dikstein
Ariel Lapid
Arnon Netzer
H. Habi
MQ
895
1
0
29 Oct 2024
MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression
Noel Elias
H. Esfahanizadeh
Kaan Kale
S. Vishwanath
Muriel Médard
258
0
0
28 Oct 2024
BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference
Neural Information Processing Systems (NeurIPS), 2024
Changwoo Lee
Soo Min Kwon
Qing Qu
Hun-Seok Kim
221
1
0
28 Oct 2024
Deep Insights into Automated Optimization with Large Language Models and Evolutionary Algorithms
He Yu
Qingbin Liu
156
12
0
28 Oct 2024
Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments
Yuzhe Yang
Yipeng Du
Ahmad Farhan
Claudio Angione
Yue Zhao
Harry Yang
Fielding Johnston
James Buban
Patrick Colangelo
269
0
0
28 Oct 2024
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
Yongchang Hao
Yanshuai Cao
Lili Mou
MQ
185
4
0
28 Oct 2024
Neuralink: Fast LLM Inference on Smartphones with Neuron Co-Activation Linking
Tuowei Wang
Ruwen Fan
Minxing Huang
Zixu Hao
Kun Li
Ting Cao
Youyou Lu
Yaoxue Zhang
Ju Ren
283
3
0
25 Oct 2024
LoRA-C: Parameter-Efficient Fine-Tuning of Robust CNN for IoT Devices
Chuntao Ding
Xu Cao
Jianhang Xie
Linlin Fan
Shangguang Wang
Zhichao Lu
250
11
0
22 Oct 2024
Mitigating Vanishing Activations in Deep CapsNets Using Channel Pruning
Siddharth Sahu
Abdulrahman Altahhan
3DPC
MedIm
198
0
0
22 Oct 2024
How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Guhao Feng
Kai-Bo Yang
Yuntian Gu
Xinyue Ai
Shengjie Luo
Jiacheng Sun
Di He
Hao Sun
Liwei Wang
LRM
262
13
0
17 Oct 2024
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
Yanyue Xie
Zhi Zhang
Ding Zhou
Cong Xie
Ziang Song
Xin Liu
Yanzhi Wang
Xue Lin
An Xu
LLMAG
194
24
0
15 Oct 2024
Sorted Weight Sectioning for Energy-Efficient Unstructured Sparse DNNs on Compute-in-Memory Crossbars
International Symposium on Circuits and Systems (ISCAS), 2024
Matheus Farias
H. T. Kung
178
2
0
15 Oct 2024
SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments
Artificial Intelligence Applications and Innovations (AIAI), 2024
Syed Abdul Gaffar Shakhadri
Kruthika KR
Rakshit Aralimatti
VLM
176
2
0
15 Oct 2024
QIANets: Quantum-Integrated Adaptive Networks for Reduced Latency and Improved Inference Times in CNN Models
Zhumazhan Balapanov
Edward Magongo
Vanessa Matvei
Olivia Holmberg
Jonathan Pei
Kevin Zhu
191
2
0
14 Oct 2024
Arrhythmia Classification Using Graph Neural Networks Based on Correlation Matrix
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024
Seungwoo Han
293
10
0
14 Oct 2024
GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation
International Conference on Learning Representations (ICLR), 2024
Dingdong Yang
Yizhi Wang
Konrad Schindler
Ali Mahdavi Amiri
Hao Zhang
205
1
0
13 Oct 2024
t-READi: Transformer-Powered Robust and Efficient Multimodal Inference for Autonomous Driving
IEEE Transactions on Mobile Computing (IEEE TMC), 2024
Pengfei Hu
Yuhang Qian
Tianyue Zheng
Ang Li
Zhe Chen
Yue Gao
Xiuzhen Cheng
Jun Luo
258
3
0
13 Oct 2024
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Vithursan Thangarasa
Ganesh Venkatesh
Mike Lasby
Nish Sinnadurai
Sean Lie
SyDa
470
4
0
13 Oct 2024
Gradient-Free Training of Quantized Neural Networks
Dotan Di Castro
O. Joglekar
Vladimir Tchuiev
Shir Kozlovsky
Michal Moshkovitz
MQ
186
0
0
13 Oct 2024
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
International Conference on Learning Representations (ICLR), 2024
Wenlong Deng
Yize Zhao
V. Vakilian
Minghui Chen
Xiaoxiao Li
Christos Thrampoulidis
415
8
0
12 Oct 2024
Neural Metamorphosis
European Conference on Computer Vision (ECCV), 2024
Xingyi Yang
Xinchao Wang
247
5
0
10 Oct 2024
Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Adriana Fernandez-Lopez
Shiwei Liu
L. Yin
Stavros Petridis
Maja Pantic
160
2
0
10 Oct 2024
QoS-Nets: Adaptive Approximate Neural Network Inference
E. Trommer
Bernd Waschneck
Akash Kumar
137
0
0
10 Oct 2024
More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing
International Conference on Learning Representations (ICLR), 2024
Sagi Shaier
Francisco Pereira
Katharina von der Wense
Lawrence E Hunter
Matt Jones
MoE
574
0
0
10 Oct 2024
Compressing Large Language Models with Automated Sub-Network Search
R. Sukthanker
B. Staffler
Katharina Eggensperger
Aaron Klein
LRM
256
0
0
09 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
IEEE Circuits and Systems Magazine (IEEE CSM), 2024
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai
Li-Wei Li
Yiran Chen
169
17
0
08 Oct 2024
Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See
Phu Pham
Kun Wan
Yu-Jhe Li
Zeliang Zhang
Daniel Miranda
Ajinkya Kale
Chenliang Xu
217
1
0
08 Oct 2024
Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-training
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2024
Junxiao Shen
Khadija Khaldi
Enmin Zhou
Hemant Bhaskar Surale
Amy Karlson
158
0
0
08 Oct 2024
Addition is All You Need for Energy-efficient Language Models
Hongyin Luo
Wei Sun
106
11
0
01 Oct 2024
Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging
Ismail Erbas
Vikas Pandey
Aporva Amarnath
Naigang Wang
Karthik Swaminathan
Stefan T. Radev
Xavier Intes
AI4CE
135
1
0
01 Oct 2024
EEG Emotion Copilot: Optimizing Lightweight LLMs for Emotional EEG Interpretation with Assisted Medical Record Generation
Neural Networks (NN), 2024
Hongyu Chen
Weiming Zeng
Chong Chen
Luhui Cai
Haiwei Yang
...
Wei Zhang
Yuchen Ren
Hongjie Yan
W. Siok
Nizhuan Wang
276
0
0
30 Sep 2024
MicroFlow: An Efficient Rust-Based Inference Engine for TinyML
Internet of Things (IoT), 2024
Matteo Carnelos
Francesco Pasti
Nicola Bellotto
230
5
0
28 Sep 2024
Value-Based Deep Multi-Agent Reinforcement Learning with Dynamic Sparse Training
Neural Information Processing Systems (NeurIPS), 2024
Pihe Hu
Shaolong Li
Zhuoran Li
L. Pan
Longbo Huang
150
1
0
28 Sep 2024