MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
- VLM
Main:1 Pages
6 Figures
13 Tables
Appendix:18 Pages
Abstract
Automatic evaluation of retrieval augmented generation (RAG) systems relies on fine-grained dimensions like faithfulness and relevance, as judged by expert human annotators. Meta-evaluation benchmarks support the development of automatic evaluators that correlate well with human judgement. However, existing benchmarks predominantly focus on English or use translated data, which fails to capture cultural nuances. A native approach provides a better representation of the end user experience.
View on arXivComments on this paper
