Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2007.12626
Cited By
SummEval: Re-evaluating Summarization Evaluation
24 July 2020
Alexander R. Fabbri
Wojciech Kry'sciñski
Bryan McCann
Caiming Xiong
R. Socher
Dragomir R. Radev
HILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SummEval: Re-evaluating Summarization Evaluation"
50 / 118 papers shown
Title
SEval-Ex: A Statement-Level Framework for Explainable Summarization Evaluation
Tanguy Herserant
Vincent Guigue
ELM
35
0
0
04 May 2025
JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry
Anum Afzal
Alexandre Mercier
Florian Matthes
55
0
0
29 Apr 2025
Towards Long Context Hallucination Detection
Siyi Liu
Kishaloy Halder
Zheng Qi
Wei Xiao
Nikolaos Pappas
Phu Mon Htut
Neha Anna John
Yassine Benajiba
Dan Roth
HILM
73
0
0
28 Apr 2025
TextTIGER: Text-based Intelligent Generation with Entity Prompt Refinement for Text-to-Image Generation
Shintaro Ozaki
Kazuki Hayashi
Yusuke Sakai
Jingun Kwon
Hidetaka Kamigaito
Katsuhiko Hayashi
Manabu Okumura
Taro Watanabe
VLM
81
0
0
25 Apr 2025
A Scalable Framework for Evaluating Health Language Models
Neil Mallinar
A. Heydari
Xin Liu
Anthony Z. Faranesh
Brent Winslow
...
Mark Malhotra
Shwetak N. Patel
Javier L. Prieto
Daniel J. McDuff
Ahmed A. Metwally
LM&MA
56
2
0
30 Mar 2025
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans?
Jeremy Barnes
Naiara Perez
Alba Bonet-Jover
Begoña Altuna
54
1
0
21 Mar 2025
EuroBERT: Scaling Multilingual Encoders for European Languages
Nicolas Boizard
Hippolyte Gisserot-Boukhlef
Duarte M. Alves
André F. T. Martins
Ayoub Hammal
...
Maxime Peyrard
Nuno M. Guerreiro
Patrick Fernandes
Ricardo Rei
Pierre Colombo
79
1
0
07 Mar 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
55
1
0
21 Feb 2025
Evaluating Small Language Models for News Summarization: Implications and Factors Influencing Performance
Borui Xu
Yao Chen
Zeyi Wen
Weiguo Liu
Bingsheng He
64
1
0
02 Feb 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
28
1
0
28 Jan 2025
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge
Aparna Elangovan
Jongwoo Ko
Lei Xu
Mahsa Elyasi
Ling Liu
S. Bodapati
Dan Roth
41
5
0
28 Jan 2025
MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models
Zhongpu Chen
Y. Liu
Long Shi
Zhi-Jie Wang
Xingyan Chen
Yu Zhao
Fuji Ren
43
0
0
28 Jan 2025
RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs
Jiaxing Wu
Lin Ning
Luyang Liu
Harrison Lee
Neo Wu
Chao Wang
Sushant Prakash
S. O’Banion
Bradley Green
Jun Xie
71
1
0
20 Jan 2025
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu
Han Zhou
Zhijiang Guo
Ehsan Shareghi
Ivan Vulić
Anna Korhonen
Nigel Collier
ALM
128
67
0
20 Jan 2025
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Yuhui Zhang
Yuchang Su
Yiming Liu
Xiaohan Wang
James Burgess
...
Josiah Aklilu
Alejandro Lozano
Anjiang Wei
Ludwig Schmidt
Serena Yeung-Levy
50
3
0
06 Jan 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
LLMAG
ALM
52
96
0
03 Jan 2025
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Gabrielle Kaili-May Liu
Bowen Shi
Avi Caciularu
Idan Szpektor
Arman Cohan
58
3
0
30 Oct 2024
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
Catarina G. Belem
Pouya Pezeskhpour
Hayate Iso
Seiji Maekawa
Nikita Bhutani
Estevam R. Hruschka
HILM
65
1
0
17 Oct 2024
4-LEGS: 4D Language Embedded Gaussian Splatting
Gal Fiebelman
Tamir Cohen
Ayellet Morgenstern
Peter Hedman
Hadar Averbuch-Elor
3DGS
41
1
0
14 Oct 2024
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
Qiyuan Zhang
Yufei Wang
Tiezheng YU
Yuxin Jiang
Chuhan Wu
...
Xin Jiang
Lifeng Shang
Ruiming Tang
Fuyuan Lyu
Chen Ma
26
4
0
07 Oct 2024
CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation
Han He
Qianchu Liu
Lei Xu
Chaitanya P. Shivade
Yi Zhang
S. Srinivasan
Katrin Kirchhoff
26
1
0
03 Oct 2024
From Data Dump to Digestible Chunks: Automated Segmentation and Summarization of Provenance Logs for Communication
Jeremy E. Block
Donald R. Honeycutt
Brett Benda
Benjamin Rheault
Eric D. Ragan
25
1
0
06 Sep 2024
DHP Benchmark: Are LLMs Good NLG Evaluators?
Yicheng Wang
Jiayi Yuan
Yu-Neng Chuang
Zhuoer Wang
Yingchi Liu
Mark Cusick
Param Kulkarni
Zhengping Ji
Yasser Ibrahim
Xia Hu
LM&MA
ELM
43
3
0
25 Aug 2024
Check-Eval: A Checklist-based Approach for Evaluating Text Quality
J. Pereira
R.A. Lotufo
ELM
31
6
0
19 Jul 2024
Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs
Mihir Parmar
Hanieh Deilamsalehy
Franck Dernoncourt
Seunghyun Yoon
Ryan A. Rossi
Trung Bui
32
2
0
05 Jul 2024
FineSurE: Fine-grained Summarization Evaluation using LLMs
Hwanjun Song
Hang Su
Igor Shalyminov
Jason (Jinglun) Cai
Saab Mansour
HILM
26
29
0
01 Jul 2024
PlagBench: Exploring the Duality of Large Language Models in Plagiarism Generation and Detection
Jooyoung Lee
Toshini Agrawal
Adaku Uchendu
Thai V. Le
Jinghui Chen
Dongwon Lee
23
1
0
24 Jun 2024
TOPICAL: TOPIC Pages AutomagicaLly
John Giorgi
Amanpreet Singh
Doug Downey
Sergey Feldman
Lucy Lu Wang
MedIm
32
0
0
03 May 2024
Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models
Yukyung Lee
Soonwon Ka
Bokyung Son
Pilsung Kang
Jaewook Kang
LLMAG
47
6
0
22 Apr 2024
Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark
Xiuying Chen
Tairan Wang
Qingqing Zhu
Taicheng Guo
Shen Gao
Zhiyong Lu
Xin Gao
Xiangliang Zhang
70
2
0
22 Feb 2024
Event-Keyed Summarization
William Gantt
Alexander Martin
Pavlo Kuchmiichuk
Aaron Steven White
22
1
0
10 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
53
29
0
02 Feb 2024
Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?
Marcio Fonseca
Shay B. Cohen
39
10
0
18 Jan 2024
INACIA: Integrating Large Language Models in Brazilian Audit Courts: Opportunities and Challenges
J. Pereira
Andre Assumpcao
J. Trecenti
Luiz Airosa
Caio Lente
Jhonatan Cléto
Guilherme Dobins
Rodrigo Nogueira
Luis Mitchell
R. Lotufo
28
2
0
10 Jan 2024
GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse
Hongzhan Lin
Ziyang Luo
Bo Wang
Ruichao Yang
Jing Ma
32
24
0
03 Jan 2024
Action-Item-Driven Summarization of Long Meeting Transcripts
Logan Golia
Jugal Kalita
18
1
0
29 Dec 2023
Responsible AI Considerations in Text Summarization Research: A Review of Current Practices
Yu Lu Liu
Meng Cao
Su Lin Blodgett
Jackie Chi Kit Cheung
Alexandra Olteanu
Adam Trischler
21
1
0
18 Nov 2023
X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects
Minqian Liu
Ying Shen
Zhiyang Xu
Yixin Cao
Eunah Cho
Vaibhav Kumar
Reza Ghanadan
Lifu Huang
ELM
LM&MA
ALM
41
25
0
15 Nov 2023
Evaluating Generative Ad Hoc Information Retrieval
Lukas Gienapp
Harrisen Scells
Niklas Deckers
Janek Bevendorff
Shuai Wang
...
Maik Frobe
Guide Zucoon
Benno Stein
Matthias Hagen
Martin Potthast
RALM
30
11
0
08 Nov 2023
The Impact of Preference Agreement in Reinforcement Learning from Human Feedback: A Case Study in Summarization
Sian Gooding
Hassan Mansoor
10
1
0
02 Nov 2023
Language Models Hallucinate, but May Excel at Fact Verification
Jian-Yu Guan
Jesse Dodge
David Wadden
Minlie Huang
Hao Peng
LRM
HILM
22
28
0
23 Oct 2023
Evaluation Metrics in the Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks
Andrea Sottana
Bin Liang
Kai Zou
Zheng Yuan
ALM
ELM
LM&MA
25
54
0
20 Oct 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
34
3
0
08 Aug 2023
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering
Pei Ke
Fei Huang
Fei Mi
Yasheng Wang
Qun Liu
Xiaoyan Zhu
Minlie Huang
ReLM
ELM
34
10
0
13 Jul 2023
Rethinking Model Evaluation as Narrowing the Socio-Technical Gap
Q. V. Liao
Ziang Xiao
ALM
ELM
43
28
0
01 Jun 2023
Annotating and Detecting Fine-grained Factual Errors for Dialogue Summarization
Rongxin Zhu
Jianzhong Qi
Jey Han Lau
28
9
0
26 May 2023
Evaluating Factual Consistency of Summaries with Large Language Models
Shiqi Chen
Siyang Gao
Junxian He
ELM
LRM
HILM
19
6
0
23 May 2023
APPLS: Evaluating Evaluation Metrics for Plain Language Summarization
Yue Guo
Tal August
Gondy Leroy
T. Cohen
Lucy Lu Wang
55
8
0
23 May 2023
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks
Anas Himmi
Ekhine Irurozki
Nathan Noiry
Stéphan Clémençon
Pierre Colombo
19
5
0
17 May 2023
What are the Desired Characteristics of Calibration Sets? Identifying Correlates on Long Form Scientific Summarization
Griffin Adams
Bichlien H. Nguyen
Jake A. Smith
Yingce Xia
Shufang Xie
Anna Ostropolets
Budhaditya Deb
Yuan Chen
Tristan Naumann
Noémie Elhadad
22
8
0
12 May 2023
1
2
3
Next