Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics

North American Chapter of the Association for Computational Linguistics (NAACL), 2022
21 April 2022
Daniel Deutsch
Rotem Dror
Dan Roth

Papers citing "Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics"

34 / 34 papers shown
LLMs Do Not See Age: Assessing Demographic Bias in Automated Systematic Review Synthesis
Favour Yahdii Aghaebe
Tanefa Apekey
Elizabeth Williams
Nafise Sadat Moosavi
92
0
0
08 Nov 2025
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans?
Jeremy Barnes
Naiara Perez
Alba Bonet-Jover
Begoña Altuna
251
4
0
21 Mar 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
197
4
0
28 Jan 2025
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge
International Conference on Learning Representations (ICLR), 2024
Aparna Elangovan
Jongwoo Ko
Lei Xu
Mahsa Elyasi
Ling Liu
S. Bodapati
Dan Roth
248
19
0
28 Jan 2025
JuStRank: Benchmarking LLM Judges for System Ranking
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Ariel Gera
Odellia Boni
Yotam Perlitz
Roy Bar-Haim
Lilach Eden
Asaf Yehudai
ALM, ELM
421
12
0
12 Dec 2024
Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Théo Gigant
Camille Guinaudeau
Marc Decombas
Frédéric Dufaux
213
4
0
08 Oct 2024
How to Train Long-Context Language Models (Effectively)
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
546
87
0
03 Oct 2024
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
Howard Yen
Tianyu Gao
Minmin Hou
Ke Ding
Daniel Fleischer
Peter Izsak
Moshe Wasserblat
Danqi Chen
ALM, ELM
288
65
0
03 Oct 2024
A Critical Look at Meta-evaluating Summarisation Evaluation Metrics
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xiang Dai
Sarvnaz Karimi
Biaoyan Fang
229
1
0
29 Sep 2024
Towards Dataset-scale and Feature-oriented Evaluation of Text Summarization in Large Language Model Prompts
Sam Yu-Te Lee
Aryaman Bahukhandi
Dongyu Liu
Kwan-Liu Ma
AAML
211
15
0
16 Jul 2024
ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models
Aparna Elangovan
Ling Liu
Lei Xu
S. Bodapati
Dan Roth
ELM
263
22
0
28 May 2024
Attribute First, then Generate: Locally-attributable Grounded Text Generation
Aviv Slobodkin
Eran Hirsch
Arie Cattan
Tal Schuster
Ido Dagan
342
43
0
25 Mar 2024
Multi-Review Fusion-in-Context
Aviv Slobodkin
Ori Shapira
Ran Levy
Ido Dagan
778
1
0
22 Mar 2024
Contextualizing Generated Citation Texts
Biswadip Mandal
Xiangci Li
Jessica Ouyang
130
4
0
28 Feb 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Sina Daubener
...
Yixin Wang
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
716
40
0
28 Feb 2024
Evaluating Robustness of Dialogue Summarization Models in the Presence of Naturally Occurring Variations
Ankita Gupta
Chulaka Gunasekara
H. Wan
Jatin Ganhotra
Sachindra Joshi
Marina Danilevsky
180
0
0
15 Nov 2023
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
Griffin Adams
Alexander R. Fabbri
Faisal Ladhak
Eric Lehman
Noémie Elhadad
206
75
0
08 Sep 2023
Leveraging GPT-4 for Food Effect Summarization to Enhance Product-Specific Guidance Development via Iterative Prompting
Journal of Biomedical Informatics (JBI), 2023
Yiwen Shi
Ping Ren
Jing Wang
Biao Han
Taha ValizadehAslani
Felix Agbavor
Yi Zhang
Meng Hu
Bo Pan
Hualou Liang
146
22
0
28 Jun 2023
PersonaPKT: Building Personalized Dialogue Agents via Parameter-efficient Knowledge Transfer
Xu Han
Bin Guo
Yoon Jung
Benjamin Yao
Yu Zhang
Xiaohu Liu
Chenlei Guo
130
8
0
13 Jun 2023
An Investigation of Evaluation Metrics for Automated Medical Note Generation
Asma Ben Abacha
Wen-wai Yim
George Michalopoulos
Thomas Lin
148
24
0
27 May 2023
Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Lucy Lu Wang
Yulia Otmakhova
Jay DeYoung
Thinh Hung Truong
Bailey Kuehl
Erin Bransom
Byron C. Wallace
276
28
0
23 May 2023
Evaluating Factual Consistency of Texts with Semantic Role Labeling
Jing Fan
Dennis Aumiller
Michael Gertz
HILM
231
4
0
22 May 2023
It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Arjun Subramonian
Xingdi Yuan
Hal Daumé
Su Lin Blodgett
184
20
0
15 May 2023
WangLab at MEDIQA-Chat 2023: Clinical Note Generation from Doctor-Patient Conversations using Large Language Models
Clinical Natural Language Processing Workshop (ClinicalNLP), 2023
John Giorgi
Ziang Ma
Haitao Zhang
Sondra S. Chen
Kevin R. An
Grace X. Zheng
Jun Yin
LM&MA, AI4MH
187
21
0
03 May 2023
Revisiting Automatic Question Summarization Evaluation in the Biomedical Domain
Hongyi Yuan
Yaoyun Zhang
Fei Huang
Songfang Huang
156
1
0
18 Mar 2023
Open Domain Multi-document Summarization: A Comprehensive Study of Model Brittleness under Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
John Giorgi
Luca Soldaini
Bo Wang
Gary D. Bader
Kyle Lo
Lucy Lu Wang
Arman Cohan
215
21
0
20 Dec 2022
LENS: A Learnable Evaluation Metric for Text Simplification
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Mounica Maddela
Yao Dou
David Heineman
Wei Xu
213
75
0
19 Dec 2022
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
...
Simeng Han
Shafiq Joty
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
236
153
0
15 Dec 2022
ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning
O. Yu. Golovneva
Moya Chen
Spencer Poff
Martin Corredor
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
ReLM, LRM
270
192
0
15 Dec 2022
News Summarization and Evaluation in the Era of GPT-3
Tanya Goyal
Junyi Jessy Li
Greg Durrett
ELM
349
453
0
26 Sep 2022
How to Find Strong Summary Coherence Measures? A Toolbox and a Comparative Study for Summary Coherence Measure Evaluation
International Conference on Computational Linguistics (COLING), 2022
Julius Steen
K. Markert
HILM
94
6
0
14 Sep 2022
TRUE: Re-evaluating Factual Consistency Evaluation
Workshop on Document-grounded Dialogue and Conversational Question Answering (DialDoc), 2022
Or Honovich
Roee Aharoni
Jonathan Herzig
Hagai Taitelbaum
Doron Kukliansy
Vered Cohen
Thomas Scialom
Idan Szpektor
Avinatan Hassidim
Yossi Matias
HILM
227
4
0
11 Apr 2022
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
Journal of Artificial Intelligence Research (JAIR), 2022
Sebastian Gehrmann
Elizabeth Clark
Thibault Sellam
ELM, AI4CE
564
217
0
14 Feb 2022
Discourse-Aware Neural Extractive Text Summarization
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Jiacheng Xu
Zhe Gan
Yu Cheng
Jingjing Liu
BDL
287
289
0
30 Oct 2019