Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

1 October 2015

Song Han

Papers citing "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"

50 / 3,625 papers shown

Title
AdaScale: Dynamic Context-aware DNN Scaling via Automated Adaptation Loop on Mobile Devices IEEE Internet of Things Journal (IEEE IoT J.), 2024 Yuzhan Wang Sicong Liu Bin Guo Boqi Zhang Ke Ma Yasan Ding Hao Luo Yao Li Zhiwen Yu 279 7 0 01 Dec 2024
Is Oracle Pruning the True Oracle? Sicheng Feng Keda Tao Haoyu Wang VLM 314 2 0 28 Nov 2024
TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba Xiaowen Ma Zhenliang Ni Xinghao Chen Mamba 330 12 0 26 Nov 2024
DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization Annual Meeting of the Association for Computational Linguistics (ACL), 2024 Hexuan Deng Wenxiang Jiao Xuebo Liu Min Zhang Zhaopeng Tu VLM 497 1 0 21 Nov 2024
Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning Andy Li A. Durrant Milan Markovic Lu Yin Georgios Leontidis Tianlong Chen 349 1 0 20 Nov 2024
SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism Priyansh Bhatnagar Linfeng Wen Mingu Kang 138 0 0 15 Nov 2024
P$^2$ Law: Scaling Law for Post-Training After Model Pruning Xiaodong Chen Yuxuan Hu Jing Zhang Yanling Wang Xuefei Liu Zeyang Zhang Jing Zhang 192 0 0 15 Nov 2024
Optimizing Traffic Signal Control using High-Dimensional State Representation and Efficient Deep Reinforcement Learning Lawrence Francis Blessed Guda Ahmed Biyabani 105 0 0 12 Nov 2024
CULL-MT: Compression Using Language and Layer pruning for Machine Translation Pedram Rostami M. Dousti 228 2 0 10 Nov 2024
Client Contribution Normalization for Enhanced Federated Learning IEEE India Conference (INDICON), 2024 Mayank Kumar Kundalwal Anurag Saraswat Ishan Mishra Deepak Mishra FedML 164 0 0 10 Nov 2024
Learning Morphisms with Gauss-Newton Approximation for Growing Networks Neal Lawton Aram Galstyan Greg Ver Steeg 149 0 0 07 Nov 2024
Flashy Backdoor: Real-world Environment Backdoor Attack on SNNs with DVS Cameras Roberto Riaño Gorka Abad S. Picek A. Urbieta AAML 334 2 0 05 Nov 2024
Magnitude Pruning of Large Pretrained Transformer Models with a Mixture Gaussian Prior Journal of Data Science (JDS), 2024 Mingxuan Zhang Y. Sun F. Liang 253 0 0 01 Nov 2024
On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance Jaskirat Singh Bram Adams Ahmed E. Hassan VLM 346 1 0 01 Nov 2024
Mutual Information Preserving Neural Network Pruning Charles Westphal Stephen Hailes Mirco Musolesi 436 3 0 31 Oct 2024
Offline Behavior Distillation Neural Information Processing Systems (NeurIPS), 2024 Shiye Lei Sen Zhang Dacheng Tao OffRL 192 2 0 30 Oct 2024
Efficient Reprogramming of Memristive Crossbars for DNNs: Weight Sorting and Bit Stucking International Symposium on Circuits and Systems (ISCAS), 2024 Matheus Farias H. T. Kung MQ 117 2 0 29 Oct 2024
Data Generation for Hardware-Friendly Post-Training Quantization IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024 Lior Dikstein Ariel Lapid Arnon Netzer H. Habi MQ 895 1 0 29 Oct 2024
MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression Noel Elias H. Esfahanizadeh Kaan Kale S. Vishwanath Muriel Médard 258 0 0 28 Oct 2024
BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference Neural Information Processing Systems (NeurIPS), 2024 Changwoo Lee Soo Min Kwon Qing Qu Hun-Seok Kim 221 1 0 28 Oct 2024
Deep Insights into Automated Optimization with Large Language Models and Evolutionary Algorithms He Yu Qingbin Liu 156 12 0 28 Oct 2024
Meta-Learning for Speeding Up Large Model Inference in Decentralized Environments Yuzhe Yang Yipeng Du Ahmad Farhan Claudio Angione Yue Zhao Harry Yang Fielding Johnston James Buban Patrick Colangelo 269 0 0 28 Oct 2024
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks Yongchang Hao Yanshuai Cao Lili Mou MQ 185 4 0 28 Oct 2024
Neuralink: Fast LLM Inference on Smartphones with Neuron Co-Activation Linking Tuowei Wang Ruwen Fan Minxing Huang Zixu Hao Kun Li Ting Cao Youyou Lu Yaoxue Zhang Ju Ren 283 3 0 25 Oct 2024
LoRA-C: Parameter-Efficient Fine-Tuning of Robust CNN for IoT Devices Chuntao Ding Xu Cao Jianhang Xie Linlin Fan Shangguang Wang Zhichao Lu 250 11 0 22 Oct 2024
Mitigating Vanishing Activations in Deep CapsNets Using Channel Pruning Siddharth Sahu Abdulrahman Altahhan 3DPC MedIm 198 0 0 22 Oct 2024
How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs Annual Meeting of the Association for Computational Linguistics (ACL), 2024 Guhao Feng Kai-Bo Yang Yuntian Gu Xinyue Ai Shengjie Luo Jiacheng Sun Di He Hao Sun Liwei Wang LRM 262 13 0 17 Oct 2024
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router Yanyue Xie Zhi Zhang Ding Zhou Cong Xie Ziang Song Xin Liu Yanzhi Wang Xue Lin An Xu LLMAG 194 24 0 15 Oct 2024
Sorted Weight Sectioning for Energy-Efficient Unstructured Sparse DNNs on Compute-in-Memory Crossbars International Symposium on Circuits and Systems (ISCAS), 2024 Matheus Farias H. T. Kung 178 2 0 15 Oct 2024
SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments Artificial Intelligence Applications and Innovations (AIAI), 2024 Syed Abdul Gaffar Shakhadri Kruthika KR Rakshit Aralimatti VLM 176 2 0 15 Oct 2024
QIANets: Quantum-Integrated Adaptive Networks for Reduced Latency and Improved Inference Times in CNN Models Zhumazhan Balapanov Edward Magongo Vanessa Matvei Olivia Holmberg Jonathan Pei Kevin Zhu 191 2 0 14 Oct 2024
Arrhythmia Classification Using Graph Neural Networks Based on Correlation Matrix IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024 Seungwoo Han 293 10 0 14 Oct 2024
GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation International Conference on Learning Representations (ICLR), 2024 Dingdong Yang Yizhi Wang Konrad Schindler Ali Mahdavi Amiri Hao Zhang 205 1 0 13 Oct 2024
t-READi: Transformer-Powered Robust and Efficient Multimodal Inference for Autonomous Driving IEEE Transactions on Mobile Computing (IEEE TMC), 2024 Pengfei Hu Yuhang Qian Tianyue Zheng Ang Li Zhe Chen Yue Gao Xiuzhen Cheng Jun Luo 258 3 0 13 Oct 2024
Self-Data Distillation for Recovering Quality in Pruned Large Language Models Vithursan Thangarasa Ganesh Venkatesh Mike Lasby Nish Sinnadurai Sean Lie SyDa 470 4 0 13 Oct 2024
Gradient-Free Training of Quantized Neural Networks Dotan Di Castro O. Joglekar Vladimir Tchuiev Shir Kozlovsky Michal Moshkovitz MQ 186 0 0 13 Oct 2024
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models International Conference on Learning Representations (ICLR), 2024 Wenlong Deng Yize Zhao V. Vakilian Minghui Chen Xiaoxiao Li Christos Thrampoulidis 415 8 0 12 Oct 2024
Neural Metamorphosis European Conference on Computer Vision (ECCV), 2024 Xingyi Yang Xinchao Wang 247 5 0 10 Oct 2024
Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 Adriana Fernandez-Lopez Shiwei Liu L. Yin Stavros Petridis Maja Pantic 160 2 0 10 Oct 2024
QoS-Nets: Adaptive Approximate Neural Network Inference E. Trommer Bernd Waschneck Akash Kumar 137 0 0 10 Oct 2024
More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing International Conference on Learning Representations (ICLR), 2024 Sagi Shaier Francisco Pereira Katharina von der Wense Lawrence E Hunter Matt Jones MoE 574 0 0 10 Oct 2024
Compressing Large Language Models with Automated Sub-Network Search R. Sukthanker B. Staffler Katharina Eggensperger Aaron Klein LRM 256 0 0 09 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models IEEE Circuits and Systems Magazine (IEEE CSM), 2024 Cong Guo Feng Cheng Zhixu Du James Kiessling Jonathan Ku ... Qilin Zheng Guanglei Zhou Hai Li-Wei Li Yiran Chen 169 17 0 08 Oct 2024
Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See Phu Pham Kun Wan Yu-Jhe Li Zeliang Zhang Daniel Miranda Ajinkya Kale Chenliang Xu 217 1 0 08 Oct 2024
Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-training IEEE Transactions on Visualization and Computer Graphics (TVCG), 2024 Junxiao Shen Khadija Khaldi Enmin Zhou Hemant Bhaskar Surale Amy Karlson 158 0 0 08 Oct 2024
Addition is All You Need for Energy-efficient Language Models Hongyin Luo Wei Sun 106 11 0 01 Oct 2024
Compressing Recurrent Neural Networks for FPGA-accelerated Implementation in Fluorescence Lifetime Imaging Ismail Erbas Vikas Pandey Aporva Amarnath Naigang Wang Karthik Swaminathan Stefan T. Radev Xavier Intes AI4CE 135 1 0 01 Oct 2024
EEG Emotion Copilot: Optimizing Lightweight LLMs for Emotional EEG Interpretation with Assisted Medical Record Generation Neural Networks (NN), 2024 Hongyu Chen Weiming Zeng Chong Chen Luhui Cai Haiwei Yang ... Wei Zhang Yuchen Ren Hongjie Yan W. Siok Nizhuan Wang 276 0 0 30 Sep 2024
MicroFlow: An Efficient Rust-Based Inference Engine for TinyML Internet of Things (IoT), 2024 Matteo Carnelos Francesco Pasti Nicola Bellotto 230 5 0 28 Sep 2024
Value-Based Deep Multi-Agent Reinforcement Learning with Dynamic Sparse Training Neural Information Processing Systems (NeurIPS), 2024 Pihe Hu Shaolong Li Zhuoran Li L. Pan Longbo Huang 150 1 0 28 Sep 2024