Compressing Language Models for Specialized Domains

25 February 2025

Abstract

Compression techniques such as pruning and quantization enable more efficient deployment of language models (LMs), albeit with small drops in benchmark performance. However, general-purpose LM compression methods can negatively affect performance in specialized domains (e.g., biomedical or legal). Recent work has sought to address this, yet requires computationally expensive full-parameter fine-tuning. To avoid this cost, we propose cross-calibration, a novel training-free approach for improving the domain performance of compressed LMs. Our approach leverages Hessian-based sensitivity to identify weights that are influential for both in-domain and general performance. Through extensive experimentation, we demonstrate that cross-calibration substantially outperforms existing approaches on domain-specific tasks without compromising general performance. Notably, these gains come without additional computational overhead, showing the potential of extracting domain-specialized compressed models from general-purpose LMs.
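The abstract does not spell out how cross-calibration combines in-domain and general sensitivity, so the following is only an illustrative sketch of Hessian-based weight sensitivity in general, not the authors' algorithm. It assumes a single linear layer, the layer-wise Hessian proxy H = (2/n) XᵀX used in Optimal Brain Surgeon-style pruning, and, as a purely hypothetical choice, pools general and in-domain calibration activations before computing H. All function names and the damping value are made up for illustration.

```python
import numpy as np


def layer_hessian(inputs: np.ndarray) -> np.ndarray:
    """Hessian proxy of the squared reconstruction error of a linear layer.

    inputs has shape (n_samples, d_in); the result has shape (d_in, d_in).
    """
    n_samples = inputs.shape[0]
    return 2.0 * inputs.T @ inputs / n_samples


def obs_saliency(weights: np.ndarray, hessian: np.ndarray, damp: float = 0.01) -> np.ndarray:
    """OBS-style saliency w_ij^2 / [H^-1]_jj for weights of shape (d_out, d_in).

    A small multiple of the mean diagonal is added to H so that it is
    invertible (the damping value here is an arbitrary assumption).
    """
    d_in = hessian.shape[0]
    damped = hessian + damp * np.mean(np.diag(hessian)) * np.eye(d_in)
    inv_diag = np.diag(np.linalg.inv(damped))
    return weights ** 2 / inv_diag[np.newaxis, :]


def prune_mask(saliency: np.ndarray, sparsity: float) -> np.ndarray:
    """Boolean keep-mask that drops the lowest-saliency fraction of weights."""
    n_drop = int(sparsity * saliency.size)
    if n_drop == 0:
        return np.ones_like(saliency, dtype=bool)
    threshold = np.partition(saliency.ravel(), n_drop - 1)[n_drop - 1]
    return saliency > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(4, 8))
    general_inputs = rng.normal(size=(32, 8))  # stand-in for general calibration data
    domain_inputs = rng.normal(size=(32, 8))   # stand-in for in-domain calibration data

    # Hypothetical pooling of both calibration sources into one Hessian.
    pooled = np.concatenate([general_inputs, domain_inputs], axis=0)
    mask = prune_mask(obs_saliency(weights, layer_hessian(pooled)), sparsity=0.5)
    print(mask.astype(int))
```

Pooling is only one way to make the sensitivity estimate reflect both data sources; the paper itself should be consulted for how cross-calibration actually does this.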


@article{williams2025_2502.18424,
  title={Compressing Language Models for Specialized Domains},
  author={Miles Williams and George Chrysostomou and Vitor Jeronymo and Nikolaos Aletras},
  journal={arXiv preprint arXiv:2502.18424},
  year={2025}
}
