ResearchTrend.AI
Prompt-based Depth Pruning of Large Language Models

17 February 2025
Juyun Wee
Minjae Park
Jaeho Lee
Abstract

Depth pruning aims to reduce the inference cost of a large language model without any hardware-specific complications, by simply removing several less important transformer blocks. However, our empirical findings suggest that the importance of a transformer block may be highly task-dependent: a block that is crucial for one task can be removed without degrading accuracy on another. Based on this observation, we develop a dynamic depth pruning algorithm, coined PuDDing (Prompt-routed Dynamic Depth Pruning), which determines which blocks to omit from the model based on the input prompt. PuDDing operates by training a lightweight router to predict the best omission set among a set of options, where this option set has also been constructed in a data-driven manner. Empirical results on commonsense reasoning benchmarks demonstrate that PuDDing effectively accelerates the inference of language models and achieves better on-task performance than static depth pruning baselines.
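The mechanism described above can be sketched in a few lines: a lightweight router scores a small set of candidate omission sets given a pooled prompt embedding, and the forward pass then skips the chosen blocks. This is a minimal toy illustration, not the paper's implementation; the hidden size, block count, candidate omission sets, and the random linear "blocks" are all hypothetical stand-ins (the paper constructs the option set data-driven and trains the router).

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16            # hidden size (illustrative)
NUM_BLOCKS = 8    # transformer blocks in the toy model

# Candidate omission sets (hypothetical; the paper builds these data-driven).
OMISSION_SETS = [
    {2, 5},
    {3, 6},
    {1, 4, 7},
]

# Toy "transformer blocks": random linear maps standing in for
# full attention/MLP blocks.
blocks = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(NUM_BLOCKS)]

# Lightweight router: a linear classifier over a pooled prompt embedding
# that scores each candidate omission set (in the paper it is trained).
router_W = rng.standard_normal((len(OMISSION_SETS), D)) / np.sqrt(D)

def route(prompt_embedding):
    """Pick the omission set with the highest router score."""
    scores = router_W @ prompt_embedding
    return OMISSION_SETS[int(np.argmax(scores))]

def forward(prompt_embedding):
    """Run the toy model, skipping the blocks the router chose to omit."""
    omit = route(prompt_embedding)
    h = prompt_embedding
    for i, W in enumerate(blocks):
        if i in omit:
            continue              # depth pruning: skip this block entirely
        h = h + np.tanh(W @ h)    # residual block stand-in
    return h, omit

h, omit = forward(rng.standard_normal(D))
print(f"omitted blocks: {sorted(omit)}")
```

Because routing happens once per prompt, the skipped blocks cost nothing during the whole generation, which is where the speedup over a static pruning choice comes from when different prompts favor different omission sets.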

@article{wee2025_2502.04348,
  title={Prompt-based Depth Pruning of Large Language Models},
  author={Juyun Wee and Minjae Park and Jaeho Lee},
  journal={arXiv preprint arXiv:2502.04348},
  year={2025}
}