ResearchTrend.AI
arXiv: 2105.12437
The statistical advantage of automatic NLG metrics at the system level


26 May 2021
Johnny Tian-Zheng Wei
Robin Jia

Papers citing "The statistical advantage of automatic NLG metrics at the system level"

15 / 15 papers shown
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
78
2
0
28 Jan 2025
QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization
Shiyue Zhang
David Wan
Arie Cattan
Ayal Klein
Ido Dagan
Joey Tianyi Zhou
126
0
0
10 Dec 2024
Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation
Pius von Daniken
Jan Deriu
Don Tuggener
Mark Cieliebak
63
2
0
03 Jun 2024
Lessons from the Trenches on Reproducible Evaluation of Language Models
Stella Biderman
Hailey Schoelkopf
Lintang Sutawika
Leo Gao
J. Tow
...
Xiangru Tang
Kevin A. Wang
Genta Indra Winata
François Yvon
Andy Zou
ELM, ALM
198
63
3
23 May 2024
AutoEval Done Right: Using Synthetic Data for Model Evaluation
Pierre Boyeau
Anastasios Nikolas Angelopoulos
N. Yosef
Jitendra Malik
Michael I. Jordan
SyDa
102
22
0
09 Mar 2024
Benchmarking Large Language Model Capabilities for Conditional Generation
Joshua Maynez
Priyanka Agrawal
Sebastian Gehrmann
ELM, LM&MA
92
31
0
29 Jun 2023
Correction of Errors in Preference Ratings from Automated Metrics for Text Generation
Jan Deriu
Pius von Daniken
Don Tuggener
Mark Cieliebak
70
2
0
06 Jun 2023
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
Kalpesh Krishna
Erin Bransom
Bailey Kuehl
Mohit Iyyer
Pradeep Dasigi
Arman Cohan
Kyle Lo
67
96
0
30 Jan 2023
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
...
Simeng Han
Shafiq Joty
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
86
134
0
15 Dec 2022
Searching for a higher power in the human evaluation of MT
Johnny Tian-Zheng Wei
Tom Kocmi
C. Federmann
50
6
0
20 Oct 2022
Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics
Daniel Deutsch
Rotem Dror
Dan Roth
75
45
0
21 Apr 2022
Toward More Effective Human Evaluation for Machine Translation
Belén Saldías
George F. Foster
Markus Freitag
Qijun Tan
59
11
0
11 Apr 2022
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
Sebastian Gehrmann
Elizabeth Clark
Thibault Sellam
ELM, AI4CE
157
193
0
14 Feb 2022
Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees
Yaman Kumar Singla
Sriram Krishna
R. Shah
Changyou Chen
75
7
0
17 Nov 2021
Incorporating Question Answering-Based Signals into Abstractive Summarization via Salient Span Selection
Daniel Deutsch
Dan Roth
88
6
0
15 Nov 2021