
T^5Score: A Methodology for Automatically Assessing the Quality of LLM Generated Multi-Document Topic Sets

Main: 8 pages · Bibliography: 3 pages · Appendix: 18 pages · 10 figures · 13 tables
Abstract

Using LLMs for multi-document topic extraction has recently gained popularity due to their apparently high-quality outputs, expressiveness, and ease of use. However, most existing evaluation practices are not designed for LLM-generated topics and yield low inter-annotator agreement scores, hindering the reliable use of LLMs for the task. To address this, we introduce T^5Score, an evaluation methodology that decomposes the quality of a topic set into quantifiable aspects, each measurable through easy-to-perform annotation tasks. This framing enables a convenient evaluation procedure, manual or automatic, that achieves strong inter-annotator agreement. To substantiate our methodology and claims, we perform extensive experiments on multiple datasets and report the results.
