Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark

30 April 2021

Hannah Rashkin

Papers citing "Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark"

Evaluating Evaluation Metrics -- The Mirage of Hallucination Detection
Atharva Kulkarni, Yuan-kang Zhang, Joel Ruben Antony Moniz, Xiou Ge, Bo-Hsiang Tseng, Dhivya Piraviperumal, S. Hong-ye Yu
HILM; 64 0 0; 25 Apr 2025

Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features
Hannah Rashkin, David Reitter, Gaurav Singh Tomar, Dipanjan Das
142 93 0; 14 Jul 2021

Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Artidoro Pagnoni, Vidhisha Balachandran, Yulia Tsvetkov
HILM; 210 265 0; 27 Apr 2021

Focused Attention Improves Document-Grounded Generation
Shrimai Prabhumoye, Kazuma Hashimoto, Yingbo Zhou, A. Black, Ruslan Salakhutdinov
151 38 0; 26 Apr 2021

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann, Tosin P. Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, ..., Nishant Subramani, Wei-ping Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou
VLM; 235 254 0; 02 Feb 2021

GO FIGURE: A Meta Evaluation of Factuality in Summarization
Saadia Gabriel, Asli Celikyilmaz, Rahul Jha, Yejin Choi, Jianfeng Gao
HILM; 211 80 0; 24 Oct 2020