2105.12437
Cited By
The statistical advantage of automatic NLG metrics at the system level
26 May 2021
Johnny Tian-Zheng Wei
Robin Jia
Papers citing "The statistical advantage of automatic NLG metrics at the system level"
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
78
2
0
28 Jan 2025
QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization
Shiyue Zhang
David Wan
Arie Cattan
Ayal Klein
Ido Dagan
Joey Tianyi Zhou
126
0
0
10 Dec 2024
Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation
Pius von Däniken
Jan Deriu
Don Tuggener
Mark Cieliebak
63
2
0
03 Jun 2024
Lessons from the Trenches on Reproducible Evaluation of Language Models
Stella Biderman
Hailey Schoelkopf
Lintang Sutawika
Leo Gao
J. Tow
...
Xiangru Tang
Kevin A. Wang
Genta Indra Winata
François Yvon
Andy Zou
ELM
ALM
198
63
3
23 May 2024
AutoEval Done Right: Using Synthetic Data for Model Evaluation
Pierre Boyeau
Anastasios Nikolas Angelopoulos
N. Yosef
Jitendra Malik
Michael I. Jordan
SyDa
102
22
0
09 Mar 2024
Benchmarking Large Language Model Capabilities for Conditional Generation
Joshua Maynez
Priyanka Agrawal
Sebastian Gehrmann
ELM
LM&MA
92
31
0
29 Jun 2023
Correction of Errors in Preference Ratings from Automated Metrics for Text Generation
Jan Deriu
Pius von Däniken
Don Tuggener
Mark Cieliebak
70
2
0
06 Jun 2023
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
Kalpesh Krishna
Erin Bransom
Bailey Kuehl
Mohit Iyyer
Pradeep Dasigi
Arman Cohan
Kyle Lo
67
96
0
30 Jan 2023
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
...
Simeng Han
Shafiq Joty
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
86
134
0
15 Dec 2022
Searching for a higher power in the human evaluation of MT
Johnny Tian-Zheng Wei
Tom Kocmi
C. Federmann
50
6
0
20 Oct 2022
Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics
Daniel Deutsch
Rotem Dror
Dan Roth
75
45
0
21 Apr 2022
Toward More Effective Human Evaluation for Machine Translation
Belén Saldías
George F. Foster
Markus Freitag
Qijun Tan
59
11
0
11 Apr 2022
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
Sebastian Gehrmann
Elizabeth Clark
Thibault Sellam
ELM
AI4CE
157
193
0
14 Feb 2022
Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees
Yaman Kumar Singla
Sriram Krishna
R. Shah
Changyou Chen
75
7
0
17 Nov 2021
Incorporating Question Answering-Based Signals into Abstractive Summarization via Salient Span Selection
Daniel Deutsch
Dan Roth
88
6
0
15 Nov 2021