Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
arXiv:1510.00149 (v5). 1 October 2015.
Song Han, Huizi Mao, W. Dally
Papers citing "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding" (showing 50 of 3,628)
Model Compression Method for S4 with Diagonal State Space Layers using Balanced Truncation
Haruka Ezoe, Kazuhiro Sato. 25 Feb 2024.

Shaving Weights with Occam's Razor: Bayesian Sparsification for Neural Networks Using the Marginal Likelihood
Rayen Dhahri, Alexander Immer, Bertrand Charpentier, Stephan Günnemann, Vincent Fortuin. 25 Feb 2024.

Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning
Yong Liu, Zirui Zhu, Chaoyu Gong, Minhao Cheng, Cho-Jui Hsieh, Yang You. 24 Feb 2024.

How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?
Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen. 23 Feb 2024.

NeuroFlux: Memory-Efficient CNN Training Using Adaptive Local Learning
Dhananjay Saikumar, Blesson Varghese. 21 Feb 2024.

ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models
Chenyang Song, Xu Han, Zhengyan Zhang, Shengding Hu, Xiyu Shi, ..., Chen Chen, Zhiyuan Liu, Guanglin Li, Tao Yang, Maosong Sun. 21 Feb 2024.

Tiny Reinforcement Learning for Quadruped Locomotion using Decision Transformers
Orhan Eren Akgün, Néstor Cuevas, Matheus Farias, Daniel Garces. 20 Feb 2024.

A Survey on Knowledge Distillation of Large Language Models
Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, Wanrong Zhu. 20 Feb 2024.
In value-based deep reinforcement learning, a pruned network is a good network
J. Obando-Ceron, Rameswar Panda, Pablo Samuel Castro. 19 Feb 2024.

Acquiring Clean Language Models from Backdoor Poisoned Datasets by Downscaling Frequency Space
Zongru Wu, Zhuosheng Zhang, Pengzhou Cheng, Gongshen Liu. 19 Feb 2024.

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding
Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen. 19 Feb 2024.

Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark
Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, ..., Wotao Yin, Mingyi Hong, Zinan Lin, Sijia Liu, Tianlong Chen. 18 Feb 2024.

Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers
Shuzhou Yuan, Ercong Nie, Bolei Ma, Michael Farber. 18 Feb 2024.

Generalizability of Mixture of Domain-Specific Adapters from the Lens of Signed Weight Directions and its Application to Effective Model Pruning
Tuc Nguyen, Thai Le. 16 Feb 2024.

BitDelta: Your Fine-Tune May Only Be Worth One Bit
James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai. 15 Feb 2024.

HiRE: High Recall Approximate Top-k Estimation for Efficient LLM Inference
Yashas Samaga, Varun Yerram, Chong You, Srinadh Bhojanapalli, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli. 14 Feb 2024.

FL-NAS: Towards Fairness of NAS for Resource Constrained Devices via Large Language Models
Ruiyang Qin, Yuting Hu, Zheyu Yan, Jinjun Xiong, Ahmed Abbasi, Yiyu Shi. 09 Feb 2024.
Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes
Lucio Dery, Steven Kolawole, Jean-Francois Kagey, Virginia Smith, Graham Neubig, Ameet Talwalkar. 08 Feb 2024.

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson. 07 Feb 2024.

EfficientViT-SAM: Accelerated Segment Anything Model Without Accuracy Loss
Zhuoyang Zhang, Han Cai, Song Han. 07 Feb 2024.

L4Q: Parameter Efficient Quantization-Aware Fine-Tuning on Large Language Models
Hyesung Jeon, Yulhwa Kim, Jae-Joon Kim. 07 Feb 2024.

Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers
Abhimanyu Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna. 07 Feb 2024.

Compressing Deep Reinforcement Learning Networks with a Dynamic Structured Pruning Method for Autonomous Driving
Wensheng Su, Zhenni Li, Minrui Xu, Jiawen Kang, Dusit Niyato, Shengli Xie. 07 Feb 2024.

Enhance DNN Adversarial Robustness and Efficiency via Injecting Noise to Non-Essential Neurons
Zhenyu Liu, Garrett Gagnon, Swagath Venkataramani, Liu Liu. 06 Feb 2024.

Single-GPU GNN Systems: Traps and Pitfalls
Yidong Gong, A. Tarafder, Saima Afrin, Pradeep Kumar. 05 Feb 2024.

A Survey on Transformer Compression
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao. 05 Feb 2024.

Dynamic Sparse Learning: A Novel Paradigm for Efficient Recommendation
Shuyao Wang, Yongduo Sui, Jiancan Wu, Zhi Zheng, Hui Xiong. 05 Feb 2024.
Ultrafast jet classification on FPGAs for the HL-LHC
Patrick Odagiu, Zhiqiang Que, Javier Mauricio Duarte, J. Haller, Gregor Kasieczka, ..., Arpita Seksaria, S. Summers, A. Sznajder, A. Tapper, Thea Klæboe Årrestad. 02 Feb 2024.

Lightweight Pixel Difference Networks for Efficient Visual Representation Learning
Z. Su, Jiehua Zhang, Longguang Wang, Hua Zhang, Zhen Liu, M. Pietikäinen, Tianpeng Liu. 01 Feb 2024.

EPSD: Early Pruning with Self-Distillation for Efficient Model Compression
Dong Chen, Ning Liu, Yichen Zhu, Zhengping Che, Rui Ma, Fachao Zhang, Xiaofeng Mou, Yi Chang, Jian Tang. 31 Jan 2024.

Effect of Weight Quantization on Learning Models by Typical Case Analysis
Shuhei Kashiwamura, Ayaka Sakata, Masaaki Imaizumi. 30 Jan 2024.

One-Step Forward and Backtrack: Overcoming Zig-Zagging in Loss-Aware Quantization Training
Lianbo Ma, Yuee Zhou, Jianlun Ma, Guo-Ding Yu, Qing Li. 30 Jan 2024.

SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory Budget
Kun Wang, Jiani Cao, Zimu Zhou, Zhenjiang Li. 30 Jan 2024.

Security and Privacy Challenges of Large Language Models: A Survey
B. Das, M. H. Amini, Yanzhao Wu. 30 Jan 2024.

Do deep neural networks utilize the weight space efficiently?
Onur Can Koyun, B. U. Toreyin. 26 Jan 2024.

SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, James Hensman. ICLR 2024. 26 Jan 2024.

MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer
Y. Tai, An-Yeu Wu. 26 Jan 2024.
Marabou 2.0: A Versatile Formal Analyzer of Neural Networks
Haoze Wu, Omri Isac, Aleksandar Zeljić, Teruhiro Tagomori, M. Daggitt, ..., Min Wu, Min Zhang, Ekaterina Komendantskaya, Guy Katz, Clark W. Barrett. CAV 2024. 25 Jan 2024.

Communication-Efficient Federated Learning through Adaptive Weight Clustering and Server-Side Distillation
Vasileios Tsouvalas, Aaqib Saeed, T. Ozcelebi, N. Meratnia. ICASSP 2024. 25 Jan 2024.

Dynamic Layer Tying for Parameter-Efficient Transformers
Tamir David Hay, Lior Wolf. ICLR 2024. 23 Jan 2024.

APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
Bowen Zhao, Hannaneh Hajishirzi, Qingqing Cao. ICML 2024. 22 Jan 2024.

Robustness to distribution shifts of compressed networks for edge devices
Lulan Shen, Ali Edalati, Brett H. Meyer, Warren Gross, James J. Clark. 22 Jan 2024.

Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM
Bingbing Li, Geng Yuan, Zigeng Wang, Shaoyi Huang, Hongwu Peng, Rohit Das, Wujie Wen, Hang Liu, Caiwen Ding. 22 Jan 2024.

OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning
Chu Myaet Thwal, Minh N. H. Nguyen, Ye Lin Tun, Seongjin Kim, My T. Thai, Choong Seon Hong. Neural Networks, 2023. 22 Jan 2024.

PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation
Nadav Benedek, Lior Wolf. Findings, 2024. 20 Jan 2024.

Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning
Adib Hasan, Ileana Rugina, Alex Wang. 19 Jan 2024.

AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference
Xuanlei Zhao, Shenggan Cheng, Guangyang Lu, Jiarui Fang, Hao Zhou, Bin Jia, Ziming Liu, Yang You. 19 Jan 2024.

SymbolNet: Neural Symbolic Regression with Adaptive Dynamic Pruning for Compression
Ho Fung Tsoi, Vladimir Loncar, S. Dasu, Philip C. Harris. 18 Jan 2024.

DTMM: Deploying TinyML Models on Extremely Weak IoT Devices with Pruning
Lixiang Han, Zhen Xiao, Zhenjiang Li. INFOCOM 2024. 17 Jan 2024.

GD doesn't make the cut: Three ways that non-differentiability affects neural network training
Siddharth Krishna Kumar. 16 Jan 2024.