A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems

8 November 2020

Papers citing "A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems"


Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models Anindya Bijoy Das Shibbir Ahmed Shahnewaz Karim Sakib HILM LM&MA 57 0 0 27 Apr 2025
Large Language Models as Span Annotators Zdeněk Kasner Vilém Zouhar Patrícia Schmidtová Ivan Kartáč Kristýna Onderková Ondřej Plátek Dimitra Gkatzia Saad Mahamood Ondrej Dusek Simone Balloccu ALM 35 0 0 11 Apr 2025
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs Ivan Kartáč Mateusz Lango Ondrej Dusek ELM 49 1 0 14 Mar 2025
SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation Song Duong Florian Le Bronnec Alexandre Allauzen Vincent Guigue Alberto Lumbreras Laure Soulier Patrick Gallinari HILM 43 0 0 20 Feb 2025
Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices Patrícia Schmidtová Saad Mahamood Simone Balloccu Ondřej Dušek Albert Gatt Dimitra Gkatzia David M. Howcroft Ondřej Plátek Adarsa Sivaprasad 43 3 0 17 Aug 2024
Face4RAG: Factual Consistency Evaluation for Retrieval Augmented Generation in Chinese Yunqi Xu Tianchi Cai Jiyan Jiang Xierui Song 33 2 0 01 Jul 2024
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models Yuxuan Wang Yueqian Wang Dongyan Zhao Cihang Xie Zilong Zheng MLLM VLM 42 25 0 24 Jun 2024
Question Generation in Knowledge-Driven Dialog: Explainability and Evaluation J. Faille Quentin Brabant Gwénolé Lecorvé L. Rojas-Barahona Claire Gardent 21 0 0 11 Apr 2024
Improving Factual Accuracy of Neural Table-to-Text Output by Addressing Input Problems in ToTTo Barkavi Sundararajan S. Sripada Ehud Reiter LMTD 27 1 0 05 Apr 2024
A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models S. Hegselmann Zejiang Shen Florian Gierse Monica Agrawal David Sontag Xiaoyi Jiang HILM VLM 24 6 0 23 Feb 2024
Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text Generation Zdeněk Kasner Ondrej Dusek 33 8 0 18 Jan 2024
The Pitfalls of Defining Hallucination Kees van Deemter HILM 22 6 0 15 Jan 2024
Improving Factual Consistency for Knowledge-Grounded Dialogue Systems via Knowledge Enhancement and Alignment Boyang Xue Weichao Wang Hongru Wang Fei Mi Rui Wang Yasheng Wang Lifeng Shang Xin Jiang Qun Liu Kam-Fai Wong KELM HILM 211 15 0 12 Oct 2023
A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection Shiping Yang Renliang Sun Xiao-Yi Wan HILM 30 41 0 10 Oct 2023
Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis Li Du Yequan Wang Xingrun Xing Yiqun Ya Xiang Li Xin Jiang Xuezhi Fang HILM 28 13 0 11 Sep 2023
Tackling Hallucinations in Neural Chart Summarization Saad Obaid ul Islam Iza Škrjanec Ondrej Dusek Vera Demberg HILM 29 7 0 01 Aug 2023
Context-Aware Document Simplification Liam Cripwell Joël Legrand Claire Gardent 24 5 0 10 May 2023
HistAlign: Improving Context Dependency in Language Generation by Aligning with History David Wan Shiyue Zhang Mohit Bansal AI4TS 32 5 0 08 May 2023
Evaluation of Question Answering Systems: Complexity of judging a natural language Amer Farea Zhen Yang Kien Duong Nadeesha Perera F. Emmert-Streib ELM 29 3 0 10 Sep 2022
Comparing informativeness of an NLG chatbot vs graphical app in diet-information domain Simone Balloccu Ehud Reiter 9 1 0 23 Jun 2022
Generating Full Length Wikipedia Biographies: The Impact of Gender Bias on the Retrieval-Based Generation of Women Biographies Angela Fan Claire Gardent 22 4 0 12 Apr 2022
Data-to-text Generation with Variational Sequential Planning Ratish Puduppully Yao Fu Mirella Lapata 48 21 0 28 Feb 2022
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text Sebastian Gehrmann Elizabeth Clark Thibault Sellam ELM AI4CE 58 183 0 14 Feb 2022
Survey of Hallucination in Natural Language Generation Ziwei Ji Nayeon Lee Rita Frieske Tiezheng Yu D. Su ... Delong Chen Wenliang Dai Ho Shu Chan Andrea Madotto Pascale Fung HILM LRM 38 2,232 0 08 Feb 2022
Rome was built in 1776: A Case Study on Factual Correctness in Knowledge-Grounded Response Generation Sashank Santhanam Behnam Hedayatnia Spandana Gella Aishwarya Padmakumar Seokhwan Kim Yang Liu Dilek Z. Hakkani-Tür 22 35 0 11 Oct 2021
Generation Challenges: Results of the Accuracy Evaluation Shared Task Craig Thomson Ehud Reiter 14 18 0 12 Aug 2021
Underreporting of errors in NLG output, and what to do about it Emiel van Miltenburg Miruna Clinciu Ondrej Dusek Dimitra Gkatzia Stephanie Inglis ... Saad Mahamood Emma Manning S. Schoch Craig Thomson Luou Wen 20 38 0 02 Aug 2021
Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation Markus Freitag George F. Foster David Grangier Viresh Ratnakar Qijun Tan Wolfgang Macherey 11 373 0 29 Apr 2021
Towards objectively evaluating the quality of generated medical summaries Francesco Moramarco Damir Juric Aleksandar Savkov Ehud Reiter 17 9 0 09 Apr 2021
Shared Task on Evaluating Accuracy in Natural Language Generation Ehud Reiter Craig Thomson 16 9 0 22 Jun 2020