MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

24 February 2025

María Andrea Cruz Blandón

Main:1 Pages

6 Figures

13 Tables

Appendix:18 Pages

Abstract

Automatic evaluation of retrieval-augmented generation (RAG) systems relies on fine-grained dimensions such as faithfulness and relevance, as judged by expert human annotators. Meta-evaluation benchmarks support the development of automatic evaluators that correlate well with human judgement. However, existing benchmarks predominantly focus on English or use translated data, and so fail to capture cultural nuances. A native approach, built on data written in each target language rather than translated into it, provides a better representation of the end-user experience.
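
As an illustration of what meta-evaluation measures, the following minimal sketch scores how well an automatic evaluator agrees with human annotators on binary faithfulness labels. Cohen's kappa is used here only as one common chance-corrected agreement measure; it is an assumption of this sketch and not necessarily the metric used in the paper. The label lists and the function name `cohen_kappa` are hypothetical.

```python
from collections import Counter


def cohen_kappa(human, auto):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(human) != len(auto) or not human:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(human)
    # Fraction of items on which the two raters give the same label.
    observed = sum(h == a for h, a in zip(human, auto)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    human_counts = Counter(human)
    auto_counts = Counter(auto)
    expected = sum(human_counts[c] * auto_counts[c] for c in human_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: 1 = answer judged faithful, 0 = not faithful.
human_labels = [1, 1, 0, 1, 0, 0, 1, 0]  # expert annotators
auto_labels = [1, 1, 0, 0, 0, 1, 1, 0]   # automatic evaluator under test

kappa = cohen_kappa(human_labels, auto_labels)
print(f"kappa = {kappa:.2f}")
```

In a multilingual setting, a score like this would typically be computed per language, so that an evaluator that agrees with humans in English but not elsewhere is not hidden by an aggregate number.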
