Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

1 October 2015

Song Han

Papers citing "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"

50 / 3,622 papers shown

Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models Wentao Hu Mingkuan Zhao Shuangyong Song Xiaoyan Zhu Xin Lai Jiayin Wang 71 1 0 25 Nov 2025
Deterministic Continuous Replacement: Fast and Stable Module Replacement in Pretrained Transformers Rowan Bradbury Aniket Srinivasan Ashok Sai Ram Kasanagottu Gunmay Jhingran Shuai Meng 101 0 0 24 Nov 2025
Multimodal Real-Time Anomaly Detection and Industrial Applications Aman Verma Keshav Samdani Mohd. Samiuddin Shafi 90 0 0 24 Nov 2025
Exploiting the Experts: Unauthorized Compression in MoE-LLMs Pinaki Prasad Guha Neogi Ahmad Mohammadshirazi Dheeraj Kulshrestha R. Ramnath MoE 76 0 0 22 Nov 2025
Layer-wise Weight Selection for Power-Efficient Neural Network Acceleration Jiaxun Fang Grace Li Zhang Shaoyi Huang MQ 223 0 0 21 Nov 2025
Equivariant-Aware Structured Pruning for Efficient Edge Deployment: A Comprehensive Framework with Adaptive Fine-Tuning Mohammed Alnemari 40 0 0 21 Nov 2025
Sex and age determination in European lobsters using AI-Enhanced bioacoustics Feliciano Domingos I. Ihianle Omprakash Kaiwartya Ahmad Lotfi Nicola Khan Nicholas Beaudreau Amaya Albalat Pedro Machado 122 0 0 20 Nov 2025
Weight-sparse transformers have interpretable circuits Leo Gao Achyuta Rajaram Jacob Coxon Soham V. Govande Bowen Baker Dan Mossing MILM 164 2 0 17 Nov 2025
MFI-ResNet: Efficient ResNet Architecture Optimization via MeanFlow Compression and Selective Incubation Nuolin Sun Linyuan Wang Haonan Wei Lei Li Bin Yan 101 0 0 16 Nov 2025
LILogic Net: Compact Logic Gate Networks with Learnable Connectivity for Efficient Hardware Deployment Katarzyna Fojcik Renaldas Zioma Jogundas Armaitis NAI GNN 124 0 0 15 Nov 2025
Steering Pretrained Drafters during Speculative Decoding Frédéric Berdoz Peer Rheinboldt Roger Wattenhofer LLMSV 316 0 0 13 Nov 2025
Toward Dignity-Aware AI: Next-Generation Elderly Monitoring from Fall Detection to ADL Xun Shao Aoba Otani Yuto Hirasuka Runji Cai Seng W. Loke 61 0 0 12 Nov 2025
CompressNAS : A Fast and Efficient Technique for Model Compression using Decomposition Sudhakar Sah Nikhil Chabbra Matthieu Durnerin 60 0 0 12 Nov 2025
A Generalized Spectral Framework to Expain Neural Scaling and Compression Dynamics Yizhou Zhang 72 2 0 11 Nov 2025
MI-to-Mid Distilled Compression (M2M-DC): An Hybrid-Information-Guided-Block Pruning with Progressive Inner Slicing Approach to Model Compression Lionel Levine Sajjad Ghiasvand Haniyeh Ehsani Oskouie Majid Sarrafzadeh MQ 295 0 0 10 Nov 2025
EcoSpa: Efficient Transformer Training with Coupled Sparsity Jinqi Xiao Cheng Luo Lingyi Huang Cheng Yang Yang Sui ... Xiao Zang Yibiao Ying Zhexiang Tang A. Anandkumar Bo Yuan 36 0 0 09 Nov 2025
SymLight: Exploring Interpretable and Deployable Symbolic Policies for Traffic Signal Control IEEE Internet of Things Journal (IEEE IoT J.), 2025 Xiao-Cheng Liao Yi Mei Mengjie Zhang 69 0 0 08 Nov 2025
From Hubs to Deserts: Urban Cultural Accessibility Patterns with Explainable AI Protik Bose Pranto Minhazul Islam Ripon Kumar Saha Abimelec Mercado Rivera Namig Abbasov 52 0 0 08 Nov 2025
Quantifying the Climate Risk of Generative AI: Region-Aware Carbon Accounting with G-TRACE and the AI Sustainability Pyramid Zahida Kausar Seemab Latif Raja Khurrum Shahzad Mehwish Fatima 48 0 0 06 Nov 2025
DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization Yuantian Shao Yuanteng Chen Peisong Wang Jianlin Yu Jing Lin Yiwu Yao Zhihui Wei Jian Cheng MQ 216 0 0 06 Nov 2025
MDM: Manhattan Distance Mapping of DNN Weights for Parasitic-Resistance-Resilient Memristive Crossbars International Conference on Learning Representations (ICLR), 2025 Matheus Farias Wanghley Martins H. T. Kung 64 0 0 06 Nov 2025
TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training Michael Menezes Barbara Su Xinze Feng Yehya Farhat Hamza Shili Anastasios Kyrillidis 136 1 0 06 Nov 2025
Efficiently Training A Flat Neural Network Before It has been Quantizated Peng Xia Junbiao Pang Tianyang Cai MQ 96 0 0 03 Nov 2025
Memory-Efficient Training with In-Place FFT Implementation Xinyu Ding Bangtian Liu Siyu Liao Zhongfeng Wang 185 0 0 03 Nov 2025
AI Progress Should Be Measured by Capability-Per-Resource, Not Scale Alone: A Framework for Gradient-Guided Resource Allocation in LLMs David McCoy Yulun Wu Zachary Butzin-Dozier 64 0 0 02 Nov 2025
LeMiCa: Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation Huanlin Gao Ping Chen Fuyuan Shi C. Tan Zhaoxiang Liu Fang Zhao Kai Wang Shiguo Lian DiffM VGen 159 0 0 30 Oct 2025
Efficient Cost-and-Quality Controllable Arbitrary-scale Super-resolution with Fourier Constraints Kazutoshi Akita Norimichi Ukita SupR 230 0 0 28 Oct 2025
Fast and accurate neural reflectance transformation imaging through knowledge distillation Tinsae G. Dulecha Leonardo Righetto Ruggero Pintus Enrico Gobbetti Andrea Giachetti 3DH 224 1 0 28 Oct 2025
PAHQ: Accelerating Automated Circuit Discovery through Mixed-Precision Inference Optimization Xinhai Wang Shu Yang Liangyu Wang L. Zhang Huanyi Xie Lijie Hu Di Wang 105 2 0 27 Oct 2025
Hankel Singular Value Regularization for Highly Compressible State Space Models Paul Schwerdtner Jules Berman Benjamin Peherstorfer 154 1 0 27 Oct 2025
Frustratingly Easy Task-aware Pruning for Large Language Models Yuanhe Tian Junjie Liu Xican Yang Haishan Ye Yan Song 93 0 0 26 Oct 2025
Dynamic Graph Neural Network for Data-Driven Physiologically Based Pharmacokinetic Modeling Su Liu Xin Hu Shurong Wen Jiaqi Liu Jiexi Xu Lanruo Wang 102 0 0 25 Oct 2025
Pruning and Quantization Impact on Graph Neural Networks Khatoon Khedri Reza Rawassizadeh Qifu Wen M. Hosseinzadeh GNN 150 0 0 24 Oct 2025
Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples Shiva Sreeram Alaa Maalouf Pratyusha Sharma Daniela Rus 92 0 0 23 Oct 2025
TernaryCLIP: Efficiently Compressing Vision-Language Models with Ternary Weights and Distilled Knowledge Shu-Hao Zhang Wei Tang Chen Wu Peng Hu Nan Li L. Zhang Qi Zhang Shao-Qun Zhang MQ VLM 203 0 0 23 Oct 2025
HAMLOCK: HArdware-Model LOgically Combined attacK Sanskar Amgain Daniel Lobo Atri Chatterjee Swarup Bhunia Fnu Suya 89 0 0 22 Oct 2025
A Survey on Cache Methods in Diffusion Models: Toward Efficient Multi-Modal Generation Jiacheng Liu Xinyu Wang Yuqi Lin Zhikai Wang P. Wang ... Zexuan Yan Zhengyi Shi Chang Zou Yue Ma Linfeng Zhang 239 1 0 22 Oct 2025
MetaCluster: Enabling Deep Compression of Kolmogorov-Arnold Network Matthew Raffel Adwaith Renjith Lizhong Chen 81 0 0 21 Oct 2025
C-SWAP: Explainability-Aware Structured Pruning for Efficient Neural Networks Compression Baptiste Bauvin Loïc Baret Ola Ahmad 76 0 0 21 Oct 2025
From Local to Global: Revisiting Structured Pruning Paradigms for Large Language Models Ziyan Wang Enmao Diao Qi Le Pu Wang Minwoo Lee Shu-ping Yeh Evgeny Stupachenko Hao Feng Li Yang 100 1 0 20 Oct 2025
EdgeNavMamba: Mamba Optimized Object Detection for Energy Efficient Edge Devices Romina Aalishah Mozhgan Navardi T. Mohsenin Mamba 132 0 0 16 Oct 2025
Convergence, design and training of continuous-time dropout as a random batch method Antonio Álvarez-López Martín Hernández 52 0 0 15 Oct 2025
Generalizing WiFi Gesture Recognition via Large-Model-Aware Semantic Distillation and Alignment Feng-Qi Cui Yu-Tong Guo Tianyue Zheng Jinyang Huang 60 0 0 15 Oct 2025
Compressibility Measures Complexity: Minimum Description Length Meets Singular Learning Theory Einar Urdshals Edmund Lau Jesse Hoogland Stan van Wingerden Daniel Murfet 87 1 0 14 Oct 2025
A Comprehensive Forecasting-Based Framework for Time Series Anomaly Detection: Benchmarking on the Numenta Anomaly Benchmark (NAB) Mohammad Karami Mostafa Jalali Fatemeh Ghassemi AI4TS 79 0 0 13 Oct 2025
Long Exposure: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2024 Tuowei Wang Kun Li Zixu Hao Donglin Bai Ju Ren Yaoxue Zhang Ting Cao M. Yang 116 4 0 12 Oct 2025
PermLLM: Learnable Channel Permutation for N:M Sparse Large Language Models Lancheng Zou Shuo Yin Zehua Pei Tsung-Yi Ho Farzan Farnia Bei Yu 60 0 0 11 Oct 2025
Automatic Speech Recognition in the Modern Era: Architectures, Training, and Evaluation Md. Nayeem Md Shamse Tabrej Kabbojit Jit Deb Shaonti Goswami Md. Azizul Hakim AI4TS VLM 64 2 0 11 Oct 2025
SQS: Bayesian DNN Compression through Sparse Quantized Sub-distributions Ziyi Wang Nan Jiang Guang Lin Qifan Song MQ 165 0 0 10 Oct 2025
Automated Evolutionary Optimization for Resource-Efficient Neural Network Training Ilia Revin Leon Strelkov Vadim A. Potemkin Ivan A Kireev Andrey Savchenko 88 0 0 10 Oct 2025