ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

arXiv:2407.14679 (Cited By)
Compact Language Models via Pruning and Knowledge Distillation

19 July 2024
Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, M. Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov [SyDa, MQ]
arXiv (abs) · PDF · HTML · HuggingFace (40 upvotes)

Papers citing "Compact Language Models via Pruning and Knowledge Distillation"

Showing 50 of 63 papers.
E$^3$-Pruner: Towards Efficient, Economical, and Effective Layer Pruning for Large Language Models
Tao Yuan, Haoli Bai, Yinfei Pan, Xuyang Cao, Tianyu Zhang, Lu Hou, Ting Hu, Xianzhi Yu (21 Nov 2025) [VLM]
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs
Ali Taghibakhshi, Sharath Turuvekere Sreenivas, Saurav Muralidharan, Ruisi Cai, Marcin Chochowski, ..., Jan Kautz, Bryan Catanzaro, Ashwath Aithal, Nima Tajbakhsh, Pavlo Molchanov (20 Nov 2025)
Pluggable Pruning with Contiguous Layer Distillation for Diffusion Transformers
Jian Ma, Qirong Peng, Xujie Zhu, Peixing Xie, Chen Chen, H. Lu (20 Nov 2025)
Stratified Knowledge-Density Super-Network for Scalable Vision Transformers
Longhua Li, Lei Qi, Xin Geng (12 Nov 2025) [ViT]
Hankel Singular Value Regularization for Highly Compressible State Space Models
Hankel Singular Value Regularization for Highly Compressible State Space Models
Paul Schwerdtner
Jules Berman
Benjamin Peherstorfer
138
1
0
27 Oct 2025
When Fewer Layers Break More Chains: Layer Pruning Harms Test-Time Scaling in LLMs
Keyu Wang, Tian Lyu, Guinan Su, Jonas Geiping, L. Yin, Marco Canini, Shiwei Liu (25 Oct 2025) [LRM]
Scaling Up Efficient Small Language Models Serving and Deployment for Semantic Job Search
Kayhan Behdin, Qingquan Song, Sriram Vasudevan, Jian Sheng, Xiaojing Ma, ..., V. Sodha, Qi Guo, Caleb Johnson, Zhipeng Wang, Fedor Borisyuk (25 Oct 2025)
Normalization in Attention Dynamics
Normalization in Attention Dynamics
Nikita Karagodin
Shu Ge
Yury Polyanskiy
Philippe Rigollet
144
1
0
24 Oct 2025
Learning Task-Agnostic Representations through Multi-Teacher Distillation
Philippe Formont, Maxime Darrin, Banafsheh Karimian, Jackie Chi Kit Cheung, Eric Granger, Ismail Ben Ayed, Mohammadhadi Shateri, Pablo Piantanida (21 Oct 2025)
EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval
Zebin Yang, Sunjian Zheng, Tong Xie, Tianshi Xu, Bo Yu, Fan Wang, Jie Tang, Shaoshan Liu, Meng Li (21 Oct 2025)
MergeMoE: Efficient Compression of MoE Models via Expert Output Merging
Ruijie Miao, Yilun Yao, Zihan Wang, Z. Wang, Bairen Yi, LingJun Liu, Yikai Zhao, Tong Yang (16 Oct 2025) [MoMe]
Rethinking Knowledge Distillation: A Data Dependent Regulariser With a Negative Asymmetric Payoff
Israel Mason-Williams, Gabryel Mason-Williams, Helen Yannakoudakis (14 Oct 2025)
A PCA-based Data Prediction Method
Peteris Daugulis, Vija Vagale, Emiliano Mancini, Filippo Castiglione (10 Oct 2025; Baltic Journal of Modern Computing (BJMC), 2025)
Entropy Meets Importance: A Unified Head Importance-Entropy Score for Stable and Efficient Transformer Pruning
Minsik Choi, Hyegang Son, Changhoon Kim, Young Geun Kim (10 Oct 2025) [AAML]
Where to Begin: Efficient Pretraining via Subnetwork Selection and Distillation
Arjun Krishnakumar, R. Sukthanker, Hannan Javed Mahadik, Gabriela Kadlecová, Vladyslav Moroshan, Timur Carstensen, Frank Hutter, Aaron Klein (08 Oct 2025)
Boomerang Distillation Enables Zero-Shot Model Size Interpolation
Sara Kangaslahti, Nihal V. Nayak, Jonathan Geuter, Marco Fumero, Francesco Locatello, David Alvarez-Melis (06 Oct 2025)
Interpret, prune and distill Donut: towards lightweight VLMs for VQA on document
Adnan Ben Mansour, Ayoub Karine, D. Naccache (30 Sep 2025)
SPADE: Structured Pruning and Adaptive Distillation for Efficient LLM-TTS
T. Nguyen, Jaehun Kim, Ji-Hoon Kim, Shukjae Choi, Youshin Lim, Joon Son Chung (25 Sep 2025)
COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens
Eugene Kwek, Wenpeng Yin (08 Sep 2025) [VLM]
PDTrim: Targeted Pruning for Prefill-Decode Disaggregation in Inference
Hao Zhang, Mengsi Lyu, Zhuo Chen, Xingrun Xing, Yulong Ao, Yonghua Lin (29 Aug 2025)
Adaptive Knowledge Distillation for Device-Directed Speech Detection
Hyung Gun Chi, Florian Pesce, Wonil Chang, Oggi Rudovic, Arturo Argueta, Stefan Braun, Vineet Garg, Ahmed Hussen Abdelaziz (04 Aug 2025)
Collaborative Distillation Strategies for Parameter-Efficient Language Model Deployment
Xiandong Meng, Yan Wu, Yexin Tian, Xin Hu, Tianze Kang, Junliang Du (21 Jul 2025)
Flexible Feature Distillation for Large Language Models
Khouloud Saadi, Di Wang (14 Jul 2025)
BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers
Patrik Okanovic, Sameer Deshmukh, Grzegorz Kwaśniewski, Yi Zhu, Haruto Fujii, ..., Maciej Besta, Kentaro Katayama, Takumi Honda, Yusuke Nagasaka, Torsten Hoefler (03 Jul 2025)
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yong Man Ro, Yu-Chun Wang, Yueh-Hua Wu (18 Jun 2025) [VLM]
Attribution-guided Pruning for Compression, Circuit Discovery, and Targeted Correction in LLMs
Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Reduan Achtibat, Patrick Kahardipraja, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin (16 Jun 2025)
Pruning Everything, Everywhere, All at Once
Gustavo Henrique do Nascimento, Ian Pons, A. H. R. Costa, Artur Jordao (04 Jun 2025)
Minifinetuning: Low-Data Generation Domain Adaptation through Corrective Self-Distillation
Peter Belcak, Greg Heinrich, Jan Kautz, Pavlo Molchanov (30 May 2025) [ALM]
RAD: Redundancy-Aware Distillation for Hybrid Models via Self-Speculative Decoding
Yuichiro Hoshino, Hideyuki Tachibana, Muneyoshi Inahara, Hiroto Takegawa (28 May 2025)
SlimLLM: Accurate Structured Pruning for Large Language Models
Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang (28 May 2025)
DLP: Dynamic Layerwise Pruning in Large Language Models
Yuli Chen, B. Cheng, Jiale Han, Yingying Zhang, Yingting Li, Shuhao Zhang (27 May 2025)
Efficient Large Language Model Inference with Neural Block Linearization
Mete Erdogan, F. Tonin, Volkan Cevher (27 May 2025)
Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs
Hanting Chen, Jiarui Qin, Jialong Guo, Tao Yuan, Yichun Yin, ..., Can Chen, Xinghao Chen, Fisher Yu, Ruiming Tang, Yunhe Wang (26 May 2025)
Zebra-Llama: Towards Extremely Efficient Hybrid Models
Mingyu Yang, Mehdi Rezagholizadeh, Guihong Li, Vikram Appia, Emad Barsoum (22 May 2025)
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Bo Zhang
Shuo Li
Runhe Tian
Yang Yang
Jixin Tang
Jinhao Zhou
Lin Ma
VLM
221
5
0
14 May 2025
ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization
Dmitriy Shopkhoev, Ammar Ali, Magauiya Zhussip, Valentin Malykh, Stamatios Lefkimmiatis, N. Komodakis, Sergey Zagoruyko (05 May 2025) [VLM]
ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs
Fahmida Liza Piya, Rahmatollah Beheshti (23 Apr 2025)
Cat, Rat, Meow: On the Alignment of Language Model and Human Term-Similarity Judgments
Lorenz Linhardt, Tom Neuhäuser, Lenka Tětková, Oliver Eberle (10 Apr 2025) [ALM, AI4TS]
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Ruikang Liu, Yuxuan Sun, Manyi Zhang, Haoli Bai, Xianzhi Yu, Tiezheng Yu, C. Yuan, Lu Hou (07 Apr 2025) [MQ, LRM]
Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling
Haebin Shin, Lei Ji, Xiao Liu, Yeyun Gong (24 Mar 2025)
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Cheng Deng, Luoyang Sun, Jiwen Jiang, Yongcheng Zeng, Xinjian Wu, ..., Haoyang Li, Lei Chen, Lionel M. Ni, Ning Yang, Jun Wang (15 Mar 2025)
IDEA Prune: An Integrated Enlarge-and-Prune Pipeline in Generative Language Model Pretraining
Yixiao Li, Xianzhi Du, Ajay Jaiswal, Tao Lei, T. Zhao, Chong-Jun Wang, Jianyu Wang (07 Mar 2025)
Kanana: Compute-efficient Bilingual Language Models
Kanana LLM Team, Yunju Bak, Hojin Lee, Minho Ryu, Jiyeon Ham, ..., Daniel Lee, Minchul Lee, MinHyung Lee, Shinbok Lee, Gaeun Seo (26 Feb 2025)
PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery
Bowei He, Lihao Yin, Hui-Ling Zhen, Xiaokun Zhang, Mingxuan Yuan, Chen Ma (18 Feb 2025)
The Curse of Depth in Large Language Models
Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefeng Zheng, Shiwei Liu (09 Feb 2025)
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Makoto Shing, Yuichi Inoue, Han Bao, Sho Yokoi, Takuya Akiba (28 Jan 2025; International Conference on Learning Representations (ICLR), 2025) [VLM]
CURing Large Models: Compression via CUR Decomposition
Sanghyeon Park, Soo-Mook Moon (08 Jan 2025)
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang, Y. Ro, Yueh-Hua Wu (02 Dec 2024; Computer Vision and Pattern Recognition (CVPR), 2024) [VLM]
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Akhiad Bercovich, Tomer Ronen, Talor Abramovich, Nir Ailon, Nave Assaf, ..., Ido Shahaf, Oren Tropp, Omer Ullman Argov, Ran Zilberstein, Ran El-Yaniv (28 Nov 2024)
Reassessing Layer Pruning in LLMs: New Insights and Methods
Yao Lu, Hao Cheng, Yujie Fang, Zeyu Wang, Jiaheng Wei, Dongwei Xu, Qi Xuan, Xiaoniu Yang, Zhaowei Zhu (23 Nov 2024)