Underreporting of errors in NLG output, and what to do about it

2 August 2021

Emiel van Miltenburg

Dimitra Gkatzia

Stephanie Inglis

Papers citing "Underreporting of errors in NLG output, and what to do about it"

Title | Authors | Tags | Counts | Date
Integration of LLM Quality Assurance into an NLG System | Ching-Yi Chen, Johanna Heininger, Adela Schneider, Christian Eckard, Andreas Madsack, Robert Weißgraeber | | 39 0 0 | 28 Jan 2025
Probing Omissions and Distortions in Transformer-based RDF-to-Text Models | J. Faille, Albert Gatt, Claire Gardent | | 24 0 0 | 25 Sep 2024
Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices | Patrícia Schmidtová, Saad Mahamood, Simone Balloccu, Ondřej Dušek, Albert Gatt, Dimitra Gkatzia, David M. Howcroft, Ondřej Plátek, Adarsa Sivaprasad | | 43 3 0 | 17 Aug 2024
Question Generation in Knowledge-Driven Dialog: Explainability and Evaluation | J. Faille, Quentin Brabant, Gwénolé Lecorvé, L. Rojas-Barahona, Claire Gardent | | 19 0 0 | 11 Apr 2024
Improving Factual Accuracy of Neural Table-to-Text Output by Addressing Input Problems in ToTTo | Barkavi Sundararajan, S. Sripada, Ehud Reiter | LMTD | 19 1 0 | 05 Apr 2024
Guidance in Radiology Report Summarization: An Empirical Evaluation and Error Analysis | Jan Trienes, Paul Youssef, Jorg Schlotterer, Christin Seifert | | 11 0 0 | 24 Jul 2023
Evaluating NLG systems: A brief introduction | Emiel van Miltenburg | | 18 0 0 | 29 Mar 2023
TabGenie: A Toolkit for Table-to-Text Generation | Zdeněk Kasner, E. Garanina, Ondřej Plátek, Ondrej Dusek | LMTD | 24 8 0 | 27 Feb 2023
Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems | Sarah E. Finch, James D. Finch, Jinho D. Choi | | 17 12 0 | 18 Dec 2022
Implicit causality in GPT-2: a case study | H. Huynh, T. Lentz, Emiel van Miltenburg | LRM | 22 3 0 | 08 Dec 2022
Comparing informativeness of an NLG chatbot vs graphical app in diet-information domain | Simone Balloccu, Ehud Reiter | | 9 1 0 | 23 Jun 2022
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications | Kaitlyn Zhou, Su Lin Blodgett, Adam Trischler, Hal Daumé, Kaheer Suleman, Alexandra Olteanu | ELM | 94 26 0 | 13 May 2022
Neural Pipeline for Zero-Shot Data-to-Text Generation | Zdeněk Kasner, Ondrej Dusek | | 16 33 0 | 30 Mar 2022
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text | Sebastian Gehrmann, Elizabeth Clark, Thibault Sellam | ELM, AI4CE | 58 181 0 | 14 Feb 2022
Natural Answer Generation: From Factoid Answer to Full-length Answer using Grammar Correction | Manas Jain, S. Saha, P. Bhattacharyya, Gladvin Chinnadurai, M. Vatsa | | 14 2 0 | 07 Dec 2021
AI and the Everything in the Whole Wide World Benchmark | Inioluwa Deborah Raji, Emily M. Bender, Amandalynne Paullada, Emily L. Denton, A. Hanna | | 15 289 0 | 26 Nov 2021
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics | Artidoro Pagnoni, Vidhisha Balachandran, Yulia Tsvetkov | HILM | 215 305 0 | 27 Apr 2021
A Survey on Deep Learning and Explainability for Automatic Report Generation from Medical Images | Pablo Messina, Pablo Pino, Denis Parra, Alvaro Soto, Cecilia Besa, S. Uribe, Marcelo Andía, C. Tejos, Claudia Prieto, Daniel Capurro | MedIm | 15 62 0 | 20 Oct 2020
With Little Power Comes Great Responsibility | Dallas Card, Peter Henderson, Urvashi Khandelwal, Robin Jia, Kyle Mahowald, Dan Jurafsky | | 225 115 0 | 13 Oct 2020