2204.10216
Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
21 April 2022
Daniel Deutsch
Rotem Dror
Dan Roth
Papers citing
"Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics"
34 / 34 papers shown
Title
LLMs Do Not See Age: Assessing Demographic Bias in Automated Systematic Review Synthesis
Favour Yahdii Aghaebe
Tanefa Apekey
Elizabeth Williams
Nafise Sadat Moosavi
92
0
0
08 Nov 2025
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans?
Jeremy Barnes
Naiara Perez
Alba Bonet-Jover
Begoña Altuna
251
4
0
21 Mar 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
197
4
0
28 Jan 2025
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge
International Conference on Learning Representations (ICLR), 2024
Aparna Elangovan
Jongwoo Ko
Lei Xu
Mahsa Elyasi
Ling Liu
S. Bodapati
Dan Roth
248
19
0
28 Jan 2025
JuStRank: Benchmarking LLM Judges for System Ranking
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Ariel Gera
Odellia Boni
Yotam Perlitz
Roy Bar-Haim
Lilach Eden
Asaf Yehudai
ALM
ELM
421
12
0
12 Dec 2024
Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Théo Gigant
Camille Guinaudeau
Marc Decombas
Frédéric Dufaux
213
4
0
08 Oct 2024
How to Train Long-Context Language Models (Effectively)
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
546
87
0
03 Oct 2024
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
Howard Yen
Tianyu Gao
Minmin Hou
Ke Ding
Daniel Fleischer
Peter Izsak
Moshe Wasserblat
Danqi Chen
ALM
ELM
288
65
0
03 Oct 2024
A Critical Look at Meta-evaluating Summarisation Evaluation Metrics
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xiang Dai
Sarvnaz Karimi
Biaoyan Fang
229
1
0
29 Sep 2024
Towards Dataset-scale and Feature-oriented Evaluation of Text Summarization in Large Language Model Prompts
Sam Yu-Te Lee
Aryaman Bahukhandi
Dongyu Liu
Kwan-Liu Ma
AAML
211
15
0
16 Jul 2024
ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models
Aparna Elangovan
Ling Liu
Lei Xu
S. Bodapati
Dan Roth
ELM
263
22
0
28 May 2024
Attribute First, then Generate: Locally-attributable Grounded Text Generation
Aviv Slobodkin
Eran Hirsch
Arie Cattan
Tal Schuster
Ido Dagan
342
43
0
25 Mar 2024
Multi-Review Fusion-in-Context
Aviv Slobodkin
Ori Shapira
Ran Levy
Ido Dagan
778
1
0
22 Mar 2024
Contextualizing Generated Citation Texts
Biswadip Mandal
Xiangci Li
Jessica Ouyang
130
4
0
28 Feb 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Sina Daubener
...
Yixin Wang
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
716
40
0
28 Feb 2024
Evaluating Robustness of Dialogue Summarization Models in the Presence of Naturally Occurring Variations
Ankita Gupta
Chulaka Gunasekara
H. Wan
Jatin Ganhotra
Sachindra Joshi
Marina Danilevsky
180
0
0
15 Nov 2023
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
Griffin Adams
Alexander R. Fabbri
Faisal Ladhak
Eric Lehman
Noémie Elhadad
206
75
0
08 Sep 2023
Leveraging GPT-4 for Food Effect Summarization to Enhance Product-Specific Guidance Development via Iterative Prompting
Journal of Biomedical Informatics (JBI), 2023
Yiwen Shi
Ping Ren
Jing Wang
Biao Han
Taha ValizadehAslani
Felix Agbavor
Yi Zhang
Meng Hu
Bo Pan
Hualou Liang
146
22
0
28 Jun 2023
PersonaPKT: Building Personalized Dialogue Agents via Parameter-efficient Knowledge Transfer
Xu Han
Bin Guo
Yoon Jung
Benjamin Yao
Yu Zhang
Xiaohu Liu
Chenlei Guo
130
8
0
13 Jun 2023
An Investigation of Evaluation Metrics for Automated Medical Note Generation
Asma Ben Abacha
Wen-wai Yim
George Michalopoulos
Thomas Lin
148
24
0
27 May 2023
Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Lucy Lu Wang
Yulia Otmakhova
Jay DeYoung
Thinh Hung Truong
Bailey Kuehl
Erin Bransom
Byron C. Wallace
276
28
0
23 May 2023
Evaluating Factual Consistency of Texts with Semantic Role Labeling
Jing Fan
Dennis Aumiller
Michael Gertz
HILM
231
4
0
22 May 2023
It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Arjun Subramonian
Xingdi Yuan
Hal Daumé
Su Lin Blodgett
184
20
0
15 May 2023
WangLab at MEDIQA-Chat 2023: Clinical Note Generation from Doctor-Patient Conversations using Large Language Models
Clinical Natural Language Processing Workshop (ClinicalNLP), 2023
John Giorgi
Ziang Ma
Haitao Zhang
Sondra S. Chen
Kevin R. An
Grace X. Zheng
Jun Yin
LM&MA
AI4MH
187
21
0
03 May 2023
Revisiting Automatic Question Summarization Evaluation in the Biomedical Domain
Hongyi Yuan
Yaoyun Zhang
Fei Huang
Songfang Huang
156
1
0
18 Mar 2023
Open Domain Multi-document Summarization: A Comprehensive Study of Model Brittleness under Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
John Giorgi
Luca Soldaini
Bo Wang
Gary D. Bader
Kyle Lo
Lucy Lu Wang
Arman Cohan
215
21
0
20 Dec 2022
LENS: A Learnable Evaluation Metric for Text Simplification
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Mounica Maddela
Yao Dou
David Heineman
Wei Xu
213
75
0
19 Dec 2022
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
...
Simeng Han
Shafiq Joty
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
236
153
0
15 Dec 2022
ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning
O. Yu. Golovneva
Moya Chen
Spencer Poff
Martin Corredor
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
ReLM
LRM
270
192
0
15 Dec 2022
News Summarization and Evaluation in the Era of GPT-3
Tanya Goyal
Junyi Jessy Li
Greg Durrett
ELM
349
453
0
26 Sep 2022
How to Find Strong Summary Coherence Measures? A Toolbox and a Comparative Study for Summary Coherence Measure Evaluation
International Conference on Computational Linguistics (COLING), 2022
Julius Steen
K. Markert
HILM
94
6
0
14 Sep 2022
TRUE: Re-evaluating Factual Consistency Evaluation
Workshop on Document-grounded Dialogue and Conversational Question Answering (DialDoc), 2022
Or Honovich
Roee Aharoni
Jonathan Herzig
Hagai Taitelbaum
Doron Kukliansy
Vered Cohen
Thomas Scialom
Idan Szpektor
Avinatan Hassidim
Yossi Matias
HILM
227
4
0
11 Apr 2022
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
Journal of Artificial Intelligence Research (JAIR), 2022
Sebastian Gehrmann
Elizabeth Clark
Thibault Sellam
ELM
AI4CE
564
217
0
14 Feb 2022
Discourse-Aware Neural Extractive Text Summarization
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Jiacheng Xu
Zhe Gan
Yu Cheng
Jingjing Liu
BDL
287
289
0
30 Oct 2019