Fine-Tuning LLMs for Report Summarization: Analysis on Supervised and Unsupervised Data

Abstract

We study the efficacy of fine-tuning Large Language Models (LLMs) for the specific task of report summarization (government archives, news, intelligence reports). While this topic is actively researched, our application setting faces two challenges: (i) ground-truth summaries may be unavailable (e.g., for government archives), and (ii) compute power is limited, since the sensitive nature of the application requires that computation be performed on-premise; for most of our experiments we use one or two A100 GPUs. Under this setting we conduct experiments to answer the following questions. First, given that fine-tuning LLMs can be resource intensive, is it feasible to fine-tune them on-premise for improved report summarization? Second, which metrics can we leverage to assess the quality of the resulting summaries? We conduct experiments on two different fine-tuning approaches in parallel, and our findings reveal interesting trends regarding the utility of fine-tuning LLMs. Specifically, we find that in many cases fine-tuning improves summary quality, while in other cases it helps by reducing the number of invalid or garbage summaries.
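On the question of metrics for assessing summary quality, a standard starting point when reference summaries exist is n-gram overlap such as ROUGE (the abstract does not specify which metrics the paper uses; this is an illustrative sketch, not the authors' method). A minimal pure-Python ROUGE-1 F1:

```python
from collections import Counter

def rouge1_f(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: unigram overlap between a reference summary
    and a candidate (model-generated) summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Overlap counts each shared unigram up to its frequency in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# Example: identical summaries score 1.0; disjoint ones score 0.0.
print(rouge1_f("the report covers trade policy", "the report covers trade policy"))
print(rouge1_f("alpha beta", "gamma delta"))
```

In the unsupervised case flagged in the abstract (no ground-truth summaries), reference-based scores like this are unavailable, which is presumably why the paper treats metric selection as an open question.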

@article{rallapalli2025_2503.10676,
  title={Fine-Tuning LLMs for Report Summarization: Analysis on Supervised and Unsupervised Data},
  author={Swati Rallapalli and Shannon Gallagher and Andrew O. Mellinger and Jasmine Ratchford and Anusha Sinha and Tyler Brooks and William R. Nichols and Nick Winski and Bryan Brown},
  journal={arXiv preprint arXiv:2503.10676},
  year={2025}
}