Answers Unite! Unsupervised Metrics for Reinforced Summarization Models

4 September 2019

Papers citing "Answers Unite! Unsupervised Metrics for Reinforced Summarization Models"

Title
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans? Jeremy Barnes Naiara Perez Alba Bonet-Jover Begoña Altuna 59 1 0 21 Mar 2025
SteLLA: A Structured Grading System Using LLMs with RAG Hefei Qiu Brian White Ashley Ding Reinaldo Costa Ali Hachem Wei Ding Ping Chen AI4Ed 56 0 0 17 Jan 2025
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences Genta Indra Winata David Anugraha Lucky Susanto Garry Kuwanto Derry Wijaya 37 7 0 03 Oct 2024
OpinSummEval: Revisiting Automated Evaluation for Opinion Summarization Yuchen Shen Xiaojun Wan 25 9 0 27 Oct 2023
UMSE: Unified Multi-scenario Summarization Evaluation Shen Gao Zhitao Yao Chongyang Tao Xiuying Chen Pengjie Ren Z. Ren Zhumin Chen 30 5 0 26 May 2023
Annotating and Detecting Fine-grained Factual Errors for Dialogue Summarization Rongxin Zhu Jianzhong Qi Jey Han Lau 31 9 0 26 May 2023
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation Yixin Liu Alexander R. Fabbri Pengfei Liu Yilun Zhao Linyong Nan ... Simeng Han Shafiq R. Joty Chien-Sheng Wu Caiming Xiong Dragomir R. Radev ALM 10 132 0 15 Dec 2022
RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question Alireza Mohammadshahi Thomas Scialom Majid Yazdani Pouya Yanki Angela Fan James Henderson Marzieh Saeidi 26 20 0 02 Nov 2022
Just ClozE! A Novel Framework for Evaluating the Factual Consistency Faster in Abstractive Summarization Yiyang Li Lei Li Marina Litvak N. Vanetik Dingxing Hu Yuze Li Yanquan Zhou HILM 32 0 0 06 Oct 2022
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation Cyril Chhun Pierre Colombo Chloé Clavel Fabian M. Suchanek 53 50 0 24 Aug 2022
An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics Huan Yee Koh Jiaxin Ju Ming Liu Shirui Pan 73 122 0 03 Jul 2022
Conditional Generation with a Question-Answering Blueprint Shashi Narayan Joshua Maynez Reinald Kim Amplayo Kuzman Ganchev Annie Louis Fantine Huot Anders Sandholm Dipanjan Das Mirella Lapata 54 47 0 01 Jul 2022
Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code Daniel Deutsch Dan Roth AI4CE 37 2 0 29 Apr 2022
Evaluation of Automatic Text Summarization using Synthetic Facts J. Ahn Foaad Khosmood HILM 11 0 0 11 Apr 2022
Recursively Summarizing Books with Human Feedback Jeff Wu Long Ouyang Daniel M. Ziegler Nissan Stiennon Ryan J. Lowe Jan Leike Paul Christiano ALM 23 294 0 22 Sep 2021
Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries Xiangru Tang Alexander R. Fabbri Haoran Li Ziming Mao Griffin Adams Borui Wang Asli Celikyilmaz Yashar Mehdad Dragomir R. Radev HILM 13 19 0 19 Sep 2021
Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation Mingkai Deng Bowen Tan Zhengzhong Liu Eric P. Xing Zhiting Hu 16 72 0 14 Sep 2021
Factual Consistency Evaluation for Text Summarization via Counterfactual Estimation Yuexiang Xie Fei Sun Yang Deng Yaliang Li Bolin Ding HILM 10 53 0 30 Aug 2021
QACE: Asking Questions to Evaluate an Image Caption Hwanhee Lee Thomas Scialom Seunghyun Yoon Franck Dernoncourt Kyomin Jung CoGe 17 18 0 28 Aug 2021
BookSum: A Collection of Datasets for Long-form Narrative Summarization Wojciech Kryściński Nazneen Rajani Divyansh Agarwal Caiming Xiong Dragomir R. Radev RALM 19 145 0 18 May 2021
Towards Human-Free Automatic Quality Evaluation of German Summarization Neslihan Iskender Oleg V. Vasilyev Tim Polzehl John Bohannon Sebastian Möller 21 1 0 13 May 2021
The Summary Loop: Learning to Write Abstractive Summaries Without Examples Philippe Laban Andrew Hsi John F. Canny Marti A. Hearst 17 56 0 11 May 2021
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation Tianyu Liu Yizhe Zhang Chris Brockett Yi Mao Zhifang Sui Weizhu Chen W. Dolan HILM 219 143 0 18 Apr 2021
What's in a Summary? Laying the Groundwork for Advances in Hospital-Course Summarization Griffin Adams Emily Alsentzer Mert Ketenci Jason Zucker Noémie Elhadad 35 46 0 12 Apr 2021
QuestEval: Summarization Asks for Fact-based Evaluation Thomas Scialom Paul-Alexis Dray Patrick Gallinari Sylvain Lamprier Benjamin Piwowarski Jacopo Staiano Alex Jinpeng Wang HILM 11 267 0 23 Mar 2021
Towards Faithfulness in Open Domain Table-to-text Generation from an Entity-centric View Tianyu Liu Xin Zheng Baobao Chang Zhifang Sui 119 35 0 17 Feb 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics Sebastian Gehrmann Tosin P. Adewumi Karmanya Aggarwal Pawan Sasanka Ammanamanchi Aremu Anuoluwapo ... Nishant Subramani Wei-ping Xu Diyi Yang Akhila Yerukola Jiawei Zhou VLM 248 285 0 02 Feb 2021
PARENTing via Model-Agnostic Reinforcement Learning to Correct Pathological Behaviors in Data-to-Text Generation Clément Rebuffel Laure Soulier Geoffrey Scoutheeten Patrick Gallinari 6 9 0 21 Oct 2020
SummEval: Re-evaluating Summarization Evaluation Alexander R. Fabbri Wojciech Kryściński Bryan McCann Caiming Xiong R. Socher Dragomir R. Radev HILM 38 687 0 24 Jul 2020
SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling F. S. Bao Hebi Li Ge Luo Minghui Qiu Yinfei Yang Youbiao He Cen Chen 16 4 0 13 May 2020
SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization Yang Gao Wei-Ye Zhao Steffen Eger ELM 16 124 0 07 May 2020
Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward Luyang Huang Lingfei Wu Lu Wang RALM 24 161 0 03 May 2020
MLSUM: The Multilingual Summarization Corpus Thomas Scialom Paul-Alexis Dray Sylvain Lamprier Benjamin Piwowarski Jacopo Staiano 17 172 0 30 Apr 2020
Asking and Answering Questions to Evaluate the Factual Consistency of Summaries Alex Jinpeng Wang Kyunghyun Cho M. Lewis HILM 10 468 0 08 Apr 2020
Fill in the BLANC: Human-free quality estimation of document summaries Oleg V. Vasilyev Vedant Dharnidharka John Bohannon 3DH 31 116 0 23 Feb 2020
CTRL: A Conditional Transformer Language Model for Controllable Generation N. Keskar Bryan McCann L. Varshney Caiming Xiong R. Socher AI4CE 52 1,232 0 11 Sep 2019