A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems

8 November 2020
Craig Thomson
Ehud Reiter

Papers citing "A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems"

30 / 30 papers shown
Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models
Anindya Bijoy Das
Shibbir Ahmed
Shahnewaz Karim Sakib
HILM
LM&MA
57
0
0
27 Apr 2025
Large Language Models as Span Annotators
Zdeněk Kasner
Vilém Zouhar
Patrícia Schmidtová
Ivan Kartáč
Kristýna Onderková
Ondřej Plátek
Dimitra Gkatzia
Saad Mahamood
Ondrej Dusek
Simone Balloccu
ALM
35
0
0
11 Apr 2025
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
Ivan Kartáč
Mateusz Lango
Ondrej Dusek
ELM
49
1
0
14 Mar 2025
SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation
Song Duong
Florian Le Bronnec
Alexandre Allauzen
Vincent Guigue
Alberto Lumbreras
Laure Soulier
Patrick Gallinari
HILM
43
0
0
20 Feb 2025
Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices
Patrícia Schmidtová
Saad Mahamood
Simone Balloccu
Ondřej Dušek
Albert Gatt
Dimitra Gkatzia
David M. Howcroft
Ondřej Plátek
Adarsa Sivaprasad
43
3
0
17 Aug 2024
Face4RAG: Factual Consistency Evaluation for Retrieval Augmented Generation in Chinese
Yunqi Xu
Tianchi Cai
Jiyan Jiang
Xierui Song
33
2
0
01 Jul 2024
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models
Yuxuan Wang
Yueqian Wang
Dongyan Zhao
Cihang Xie
Zilong Zheng
MLLM
VLM
42
25
0
24 Jun 2024
Question Generation in Knowledge-Driven Dialog: Explainability and Evaluation
J. Faille
Quentin Brabant
Gwénolé Lecorvé
L. Rojas-Barahona
Claire Gardent
21
0
0
11 Apr 2024
Improving Factual Accuracy of Neural Table-to-Text Output by Addressing Input Problems in ToTTo
Barkavi Sundararajan
S. Sripada
Ehud Reiter
LMTD
27
1
0
05 Apr 2024
A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models
S. Hegselmann
Zejiang Shen
Florian Gierse
Monica Agrawal
David Sontag
Xiaoyi Jiang
HILM
VLM
24
6
0
23 Feb 2024
Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text Generation
Zdeněk Kasner
Ondrej Dusek
33
8
0
18 Jan 2024
The Pitfalls of Defining Hallucination
Kees van Deemter
HILM
22
6
0
15 Jan 2024
Improving Factual Consistency for Knowledge-Grounded Dialogue Systems via Knowledge Enhancement and Alignment
Boyang Xue
Weichao Wang
Hongru Wang
Fei Mi
Rui Wang
Yasheng Wang
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
KELM
HILM
211
15
0
12 Oct 2023
A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection
Shiping Yang
Renliang Sun
Xiao-Yi Wan
HILM
30
41
0
10 Oct 2023
Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis
Li Du
Yequan Wang
Xingrun Xing
Yiqun Ya
Xiang Li
Xin Jiang
Xuezhi Fang
HILM
28
13
0
11 Sep 2023
Tackling Hallucinations in Neural Chart Summarization
Saad Obaid ul Islam
Iza Škrjanec
Ondrej Dusek
Vera Demberg
HILM
29
7
0
01 Aug 2023
Context-Aware Document Simplification
Liam Cripwell
Joël Legrand
Claire Gardent
24
5
0
10 May 2023
HistAlign: Improving Context Dependency in Language Generation by Aligning with History
David Wan
Shiyue Zhang
Mohit Bansal
AI4TS
32
5
0
08 May 2023
Evaluation of Question Answering Systems: Complexity of judging a natural language
Amer Farea
Zhen Yang
Kien Duong
Nadeesha Perera
F. Emmert-Streib
ELM
29
3
0
10 Sep 2022
Comparing informativeness of an NLG chatbot vs graphical app in diet-information domain
Simone Balloccu
Ehud Reiter
9
1
0
23 Jun 2022
Generating Full Length Wikipedia Biographies: The Impact of Gender Bias on the Retrieval-Based Generation of Women Biographies
Angela Fan
Claire Gardent
22
4
0
12 Apr 2022
Data-to-text Generation with Variational Sequential Planning
Ratish Puduppully
Yao Fu
Mirella Lapata
48
21
0
28 Feb 2022
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
Sebastian Gehrmann
Elizabeth Clark
Thibault Sellam
ELM
AI4CE
58
183
0
14 Feb 2022
Survey of Hallucination in Natural Language Generation
Ziwei Ji
Nayeon Lee
Rita Frieske
Tiezheng Yu
D. Su
...
Delong Chen
Wenliang Dai
Ho Shu Chan
Andrea Madotto
Pascale Fung
HILM
LRM
38
2,232
0
08 Feb 2022
Rome was built in 1776: A Case Study on Factual Correctness in Knowledge-Grounded Response Generation
Sashank Santhanam
Behnam Hedayatnia
Spandana Gella
Aishwarya Padmakumar
Seokhwan Kim
Yang Liu
Dilek Z. Hakkani-Tür
22
35
0
11 Oct 2021
Generation Challenges: Results of the Accuracy Evaluation Shared Task
Craig Thomson
Ehud Reiter
14
18
0
12 Aug 2021
Underreporting of errors in NLG output, and what to do about it
Emiel van Miltenburg
Miruna Clinciu
Ondrej Dusek
Dimitra Gkatzia
Stephanie Inglis
...
Saad Mahamood
Emma Manning
S. Schoch
Craig Thomson
Luou Wen
20
38
0
02 Aug 2021
Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation
Markus Freitag
George F. Foster
David Grangier
Viresh Ratnakar
Qijun Tan
Wolfgang Macherey
11
373
0
29 Apr 2021
Towards objectively evaluating the quality of generated medical summaries
Francesco Moramarco
Damir Juric
Aleksandar Savkov
Ehud Reiter
17
9
0
09 Apr 2021
Shared Task on Evaluating Accuracy in Natural Language Generation
Ehud Reiter
Craig Thomson
16
9
0
22 Jun 2020