
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

1 October 2015

Song Han

Papers citing "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding"

50 / 3,628 papers shown

Title
Can pruning make Large Language Models more efficient? Sia Gholami Marwan Omar 269 19 0 06 Oct 2023
Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion Neural Information Processing Systems (NeurIPS), 2023 Filip Szatkowski Eric Elmoznino Younesse Kaddar Simone Scardapane MoE 238 12 0 06 Oct 2023
Quantized Transformer Language Model Implementations on Edge Devices International Conference on Machine Learning and Applications (ICMLA), 2023 Mohammad Wali Ur Rahman Murad Mehrab Abrar Hunter Gibbons Copening Salim Hariri Sicong Shao Pratik Satam Soheil Salehi MQ 153 24 0 06 Oct 2023
Denoising Diffusion Step-aware Models International Conference on Learning Representations (ICLR), 2023 Shuai Yang Yukang Chen Luozhou Wang Shu Liu Ying-Cong Chen DiffM 353 22 0 05 Oct 2023
ResidualTransformer: Residual Low-Rank Learning with Weight-Sharing for Transformer Layers IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 Yiming Wang Jinyu Li 170 11 0 03 Oct 2023
Feather: An Elegant Solution to Effective DNN Sparsification British Machine Vision Conference (BMVC), 2023 Athanasios Glentis Georgoulakis George Retsinas Petros Maragos 196 1 0 03 Oct 2023
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2023 Roberto L. Castro Andrei Ivanov Diego Andrade Tal Ben-Nun B. Fraguela Torsten Hoefler 153 30 0 03 Oct 2023
DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training International Conference on Learning Representations (ICLR), 2023 Chenyi Zi Yimeng Zhang Jinghan Jia James Diffenderfer Jiancheng Liu Konstantinos Parasyris Yihua Zhang Zheng Zhang B. Kailkhura Sijia Liu 590 72 0 03 Oct 2023
The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers under Fully Homomorphic Encryption on the Torus Rickard Brannvall Andrei Stoian 132 0 0 03 Oct 2023
Compressing LLMs: The Truth is Rarely Pure and Never Simple International Conference on Learning Representations (ICLR), 2023 Ajay Jaiswal Zhe Gan Xianzhi Du Bowen Zhang Zinan Lin Yinfei Yang MQ 254 60 0 02 Oct 2023
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy International Conference on Learning Representations (ICLR), 2023 Pingzhi Li Zhenyu Zhang Prateek Yadav Yi-Lin Sung Yu Cheng Mohit Bansal Tianlong Chen MoMe 230 73 0 02 Oct 2023
A Novel IoT Trust Model Leveraging Fully Distributed Behavioral Fingerprinting and Secure Delegation Pervasive and Mobile Computing (PMC), 2023 Marco Arazzi S. Nicolazzo Antonino Nocera 167 13 0 02 Oct 2023
ECNR: Efficient Compressive Neural Representation of Time-Varying Volumetric Datasets IEEE Pacific Visualization Symposium (PacificVis), 2023 Kaiyuan Tang Chaoli Wang 233 19 0 02 Oct 2023
Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications Duc Hoang Minsik Cho Thomas Merth Mohammad Rastegari Zhangyang Wang KELM CLL 235 5 0 02 Oct 2023
SINF: Semantic Neural Network Inference with Semantic Subgraphs Sazzad Sayyed Jonathan D. Ashdown 215 0 0 02 Oct 2023
YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs International Conference on Compiler Construction (CC), 2023 Cyrus Zhou Zack Hassman Ruize Xu Dhirpal Shah Vaughn Richard Yanjing Li 433 5 0 01 Oct 2023
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors Chengming Zhang Baixi Sun Xiaodong Yu Zhen Xie Weijian Zheng K. Iskra Pete Beckman Dingwen Tao 125 7 0 29 Sep 2023
Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs International Conference on Machine Learning (ICML), 2023 Lu Yin Ajay Jaiswal Shiwei Liu Souvik Kundu Zinan Lin 341 7 0 29 Sep 2023
AdaEvo: Edge-Assisted Continuous and Timely DNN Model Evolution for Mobile Devices IEEE Transactions on Mobile Computing (IEEE TMC), 2023 Lehao Wang Zhiwen Yu Haoyi Yu Sicong Liu Yaxiong Xie Bin Guo Yunxin Liu 192 6 0 27 Sep 2023
Enabling Resource-efficient AIoT System with Cross-level Optimization: A Survey IEEE Communications Surveys and Tutorials (COMST), 2023 Sicong Liu Bin Guo Cheng Fang Ziqi Wang Shiyan Luo Zimu Zhou Zhiwen Yu AI4CE 251 35 0 27 Sep 2023
Efficient Post-training Quantization with FP8 Formats Conference on Machine Learning and Systems (MLSys), 2023 Haihao Shen Naveen Mellempudi Xin He Q. Gao Chang‐Bao Wang Mengni Wang MQ 276 35 0 26 Sep 2023
Probabilistic Weight Fixing: Large-scale training of neural network weight uncertainties for quantization Neural Information Processing Systems (NeurIPS), 2023 Christopher Subia-Waud S. Dasmahapatra UQCV MQ 227 1 0 24 Sep 2023
ThinResNet: A New Baseline for Structured Convolutional Networks Pruning Hugo Tessier Ghouti Boukli Hacene Vincent Gripon 158 1 0 22 Sep 2023
RAI4IoE: Responsible AI for Enabling the Internet of Energy International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (ICPSISA), 2023 Minhui Xue Surya Nepal Ling Liu Subbu Sethuvenkatraman Xingliang Yuan Carsten Rudolph Ruoxi Sun Greg Eisenhauer 243 6 0 20 Sep 2023
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity Proceedings of the VLDB Endowment (PVLDB), 2023 Haojun Xia Zhen Zheng Yuchao Li Donglin Zhuang Zhongzhu Zhou Xiafei Qiu Yong Li Wei Lin Shuaiwen Leon Song 154 21 0 19 Sep 2023
Heterogeneous Generative Knowledge Distillation with Masked Image Modeling Ziming Wang Shumin Han Xiaodi Wang Jing Hao Xianbin Cao Baochang Zhang VLM 216 1 0 18 Sep 2023
Training dynamic models using early exits for automatic speech recognition on resource-constrained devices George August Wright Umberto Cappellazzo Salah Zaiem Desh Raj Lucas Ondel Yang Daniele Falavigna Mohamed Nabih Ali Alessio Brutti 176 4 0 18 Sep 2023
Enhancing Quantised End-to-End ASR Models via Personalisation IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 Qiuming Zhao Guangzhi Sun Chao Zhang Mingxing Xu Thomas Fang Zheng MQ 132 3 0 17 Sep 2023
Scaling Laws for Sparsely-Connected Foundation Models International Conference on Learning Representations (ICLR), 2023 Elias Frantar C. Riquelme N. Houlsby Dan Alistarh Utku Evci 231 46 0 15 Sep 2023
Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity Matteo Grimaldi Darshan C. Ganji Ivan Lazarevich Sudhakar Sah 168 12 0 12 Sep 2023
Real-Time Semantic Segmentation: A Brief Survey & Comparative Study in Remote Sensing IEEE Geoscience and Remote Sensing Magazine (GRSM), 2023 Clifford Broni-Bediako Junshi Xia Xiangwei Zhu 245 15 0 12 Sep 2023
Approximating ReLU on a Reduced Ring for Efficient MPC-based Private Inference Kiwan Maeng G. E. Suh 153 4 0 09 Sep 2023
Sparse Federated Training of Object Detection in the Internet of Vehicles Luping Rao Chuan Ma Ming Ding Yuwen Qian Lu Zhou Yanfeng Guo 84 3 0 07 Sep 2023
Bandwidth-efficient Inference for Neural Image Compression IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 Shanzhi Yin Tongda Xu Yongsheng Liang Yuanyuan Wang Yanghao Li Yan Wang Jingjing Liu 143 1 0 06 Sep 2023
Geometry of Sensitivity: Twice Sampling and Hybrid Clipping in Differential Privacy with Optimal Gaussian Noise and Application to Deep Learning Conference on Computer and Communications Security (CCS), 2023 Hanshen Xiao Jun Wan Srini Devadas 238 14 0 06 Sep 2023
In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms International Conference on Internet-of-Things Design and Implementation (IoTDI), 2023 Philipp Schilk Niccolò Polvani Andrea Ronco Milos Cernak Michele Magno 185 13 0 05 Sep 2023
On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks Wei Huang Haotong Qin Yangdong Liu Jingzhuo Liang Yifu Ding Ying Li Xianglong Liu MQ 363 2 0 05 Sep 2023
Efficient Defense Against Model Stealing Attacks on Convolutional Neural Networks International Conference on Machine Learning and Applications (ICMLA), 2023 Kacem Khaled Mouna Dhaouadi F. Magalhães Gabriela Nicolescu AAML 102 2 0 04 Sep 2023
On the fly Deep Neural Network Optimization Control for Low-Power Computer Vision IEEE International Performance, Computing, and Communications Conference (IPCCC), 2023 Ishmeet Kaur Adwaita Janardhan Jadhav 100 0 0 04 Sep 2023
ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency Transformation Nastaran Darabi Maeesha Binte Hashem Hongyi Pan Ahmet Cetin Wilfred Gomes A. R. Trivedi 110 6 0 04 Sep 2023
Saturn: An Optimized Data System for Large Model Deep Learning Workloads Proceedings of the VLDB Endowment (PVLDB), 2023 Kabir Nagrecha Arun Kumar 304 8 0 03 Sep 2023
eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models IEEE Computer Architecture Letters (CAL), 2023 Minsik Cho Keivan Alizadeh Vahid Qichen Fu Saurabh N. Adya C. C. D. Mundo Mohammad Rastegari Devang Naik Peter Zatloukal MQ 204 9 0 02 Sep 2023
Proof of Deep Learning: Approaches, Challenges, and Future Directions Mahmoud Salhab Khaleel W. Mershad 139 3 0 31 Aug 2023
Latency-aware Unified Dynamic Networks for Efficient Image Recognition IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023 Yizeng Han Zeyu Liu Zhihang Yuan Yifan Pu Chaofei Wang Shiji Song Gao Huang 425 33 0 30 Aug 2023
Generative Model for Models: Rapid DNN Customization for Diverse Tasks and Resource Constraints Wenxing Xu Yuanchun Li Jiacheng Liu Yiyou Sun Zhengyang Cao Shouqing Yang Hao Wen Yunxin Liu 222 2 0 29 Aug 2023
Uncovering the Hidden Cost of Model Compression Diganta Misra Muawiz Chaudhary Agam Goyal Bharat Runwal Pin-Yu Chen VLM 261 3 0 29 Aug 2023
Low-bit Quantization for Deep Graph Neural Networks with Smoothness-aware Message Propagation International Conference on Information and Knowledge Management (CIKM), 2023 Shuang Wang B. Eravcı Rustam Guliyev Hakan Ferhatosmanoglu GNN MQ 155 10 0 29 Aug 2023
Maestro: Uncovering Low-Rank Structures via Trainable Decomposition International Conference on Machine Learning (ICML), 2023 Samuel Horváth Stefanos Laskaridis Shashank Rajput Hongyi Wang BDL 318 9 0 28 Aug 2023
Computation-efficient Deep Learning for Computer Vision: A Survey Yulin Wang Yizeng Han Chaofei Wang Shiji Song Qi Tian Gao Huang VLM 282 32 0 27 Aug 2023
Homological Convolutional Neural Networks Antonio Briola Yuanrong Wang Silvia Bartolucci T. Aste LMTD 219 7 0 26 Aug 2023