A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs

26 February 2025
Xuan Ding
Rui Sun
Yunjian Zhang
Xiu Yan
Yueqi Zhou
Kaihao Huang
Suzhong Fu
Chuanlong Xie
Yao Zhu
Abstract

Compared to width-wise pruning, depth-wise pruning can significantly accelerate inference in resource-constrained scenarios. However, treating an entire Transformer layer as the minimum pruning unit may degrade model performance by indiscriminately discarding all of the layer's information. This paper reveals a "Patch-like" feature relationship between layers in large language models by analyzing the correlation of the outputs of different layers in the reproducing kernel Hilbert space. Building on this observation, we propose a sliding layer merging method that dynamically selects and fuses consecutive layers from top to bottom according to a pre-defined similarity threshold, thereby simplifying the model structure while maintaining its performance. Extensive experiments on LLMs with various architectures and parameter scales show that our method outperforms existing pruning techniques in both zero-shot inference performance and retraining recovery quality after pruning. In particular, when pruning 35% of the Vicuna-7B model, our method achieves a 1.654% improvement in average zero-shot task performance over the existing method. We further reveal the potential of combining depth pruning with width pruning to enhance the pruning effect. Our codes are available at this https URL.

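The abstract outlines the core procedure: measure the similarity of layer outputs with a kernel-based (RKHS) metric and, scanning from top to bottom, fuse runs of consecutive layers whose similarity exceeds a pre-defined threshold. The Python sketch below is a minimal illustration of that idea, assuming linear CKA as the similarity measure and a simple consecutive-window grouping; the paper's exact similarity function, merging rule, and threshold may differ, and the function names here are hypothetical.

import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> float:
    # Linear CKA between two [tokens, hidden] activation matrices; one common
    # RKHS-based similarity, used here as a stand-in for the paper's measure.
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    hsic = (x.T @ y).norm() ** 2
    return (hsic / ((x.T @ x).norm() * (y.T @ y).norm())).item()

def sliding_merge_groups(layer_outputs, threshold=0.9):
    # Scan layers from top (last) to bottom, growing a window of consecutive
    # layers whose outputs stay above `threshold` similarity with the window's
    # top layer; each group would then be fused into a single layer.
    groups, i = [], len(layer_outputs) - 1
    while i >= 0:
        j = i - 1
        while j >= 0 and linear_cka(layer_outputs[i], layer_outputs[j]) >= threshold:
            j -= 1
        groups.append(list(range(j + 1, i + 1)))
        i = j
    return groups[::-1]  # groups reported bottom-to-top

# Toy usage: random activations standing in for per-layer outputs on a calibration batch.
acts = [torch.randn(128, 512) for _ in range(8)]
print(sliding_merge_groups(acts, threshold=0.9))

How a grouped run of layers is actually fused (e.g., averaging their parameters or keeping a representative layer) is not specified in the abstract, so that step is omitted from the sketch.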
@article{ding2025_2502.19159,
  title={A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs},
  author={Xuan Ding and Rui Sun and Yunjian Zhang and Xiu Yan and Yueqi Zhou and Kaihao Huang and Suzhong Fu and Chuanlong Xie and Yao Zhu},
  journal={arXiv preprint arXiv:2502.19159},
  year={2025}
}