Argument Summarization and its Evaluation in the Era of Large Language Models

2 March 2025

Abstract

Large Language Models (LLMs) have revolutionized various Natural Language Generation (NLG) tasks, including Argument Summarization (ArgSum), a key subfield of Argument Mining (AM). This paper investigates the integration of state-of-the-art LLMs into ArgSum, covering both the generation of argument summaries and their evaluation. In particular, we propose a novel prompt-based evaluation scheme and validate it on a new human benchmark dataset. Our work makes three main contributions: (i) the integration of LLMs into existing ArgSum frameworks, (ii) the development of a new LLM-based ArgSum system, benchmarked against prior methods, and (iii) the introduction of an advanced LLM-based evaluation scheme. We demonstrate that the use of LLMs substantially improves both the generation and evaluation of argument summaries, achieving state-of-the-art results and advancing the field of ArgSum.

@article{altemeyer2025_2503.00847,
  title={Argument Summarization and its Evaluation in the Era of Large Language Models},
  author={Moritz Altemeyer and Steffen Eger and Johannes Daxenberger and Tim Altendorf and Philipp Cimiano and Benjamin Schiller},
  journal={arXiv preprint arXiv:2503.00847},
  year={2025}
}
