IteRABRe: Iterative Recovery-Aided Block Reduction

8 March 2025
Haryo Akbarianto Wibowo
Haiyue Song
Hideki Tanaka
Masao Utiyama
Alham Fikri Aji
Raj Dabre
Abstract

Large Language Models (LLMs) have grown increasingly expensive to deploy, driving the need for effective model compression techniques. While block pruning offers a straightforward approach to reducing model size, existing methods often struggle to maintain performance or require substantial computational resources for recovery. We present IteRABRe, a simple yet effective iterative pruning method that achieves superior compression results while requiring minimal computational resources. Using only 2.5M tokens for recovery, our method outperforms baseline approaches by ~3% on average when compressing the Llama3.1-8B and Qwen2.5-7B models. IteRABRe demonstrates particular strength in preserving linguistic capabilities, showing a 5% improvement over the baselines on language-related tasks. Our analysis reveals distinct pruning characteristics between these models, while also demonstrating preservation of multilingual capabilities.
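The abstract describes an iterative loop of block removal followed by a light recovery phase. Below is a minimal, hypothetical PyTorch sketch of that general loop on a toy stack of encoder layers; the cosine-similarity scoring rule, the distillation-style recovery objective, and all sizes and hyperparameters are illustrative assumptions, not details taken from the paper.

# Illustrative sketch: iteratively drop the least important block, then run a
# brief recovery phase. Scoring and recovery objectives are assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, n_layers, target_layers = 64, 8, 5
blocks = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
    for _ in range(n_layers)
)

def run(stack, x):
    for blk in stack:
        x = blk(x)
    return x

calib = torch.randn(4, 16, d_model)                         # tiny calibration batch
recovery = [torch.randn(4, 16, d_model) for _ in range(8)]  # stand-in recovery data

teacher = copy.deepcopy(blocks)                             # frozen full model as target
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

while len(blocks) > target_layers:
    # 1) Score each block: how similar is the output when that block is skipped?
    blocks.eval()
    with torch.no_grad():
        full = run(blocks, calib)
        scores = []
        for i in range(len(blocks)):
            reduced = nn.ModuleList(b for j, b in enumerate(blocks) if j != i)
            out = run(reduced, calib)
            scores.append(F.cosine_similarity(full.flatten(1), out.flatten(1)).mean())
    # 2) Remove the block whose absence changes the output the least.
    drop = int(torch.stack(scores).argmax())
    blocks = nn.ModuleList(b for j, b in enumerate(blocks) if j != drop)
    # 3) Brief recovery: a few steps pulling the pruned stack toward the frozen teacher.
    blocks.train()
    opt = torch.optim.AdamW(blocks.parameters(), lr=1e-4)
    for batch in recovery:
        loss = F.mse_loss(run(blocks, batch), run(teacher, batch))
        opt.zero_grad()
        loss.backward()
        opt.step()

print(f"kept {len(blocks)} of {n_layers} blocks")

In the paper's actual setting, the blocks would be decoder layers of Llama3.1-8B or Qwen2.5-7B and the recovery phase would use roughly 2.5M tokens of real text; the toy tensors above only stand in for that data.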

@article{wibowo2025_2503.06291,
  title={IteRABRe: Iterative Recovery-Aided Block Reduction},
  author={Haryo Akbarianto Wibowo and Haiyue Song and Hideki Tanaka and Masao Utiyama and Alham Fikri Aji and Raj Dabre},
  journal={arXiv preprint arXiv:2503.06291},
  year={2025}
}