Better Summarization Evaluation with Word Embeddings for ROUGE

25 August 2015

Papers citing "Better Summarization Evaluation with Word Embeddings for ROUGE"


Evaluation Should Not Ignore Variation: On the Impact of Reference Set Choice on Summarization Metrics Silvia Casola Yang Liu Siyao Peng Oliver Kraus Albert Gatt Barbara Plank 27 0 0 17 Jun 2025
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning Joy Lim Jia Yin Daniel Zhang-Li Jifan Yu Haoyang Li Shangqing Tu ... Zhiyuan Liu Huiqin Liu Lei Hou Juanzi Li Bin Xu 83 0 0 04 May 2025
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans? Jeremy Barnes Naiara Perez Alba Bonet-Jover Begoña Altuna 110 2 0 21 Mar 2025
ProMRVL-CAD: Proactive Dialogue System with Multi-Round Vision-Language Interactions for Computer-Aided Diagnosis Xueshen Li Xinlong Hou Ziyi Huang Yu Gan LM&MA MedIm 98 0 0 15 Feb 2025
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences Genta Indra Winata David Anugraha Lucky Susanto Garry Kuwanto Derry Wijaya 184 11 0 03 Oct 2024
Model-based Preference Optimization in Abstractive Summarization without Human Feedback Jaepill Choi Kyubyung Chae Jiwoo Song Yohan Jo Taesup Kim 68 2 0 27 Sep 2024
Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization? Roshan S. Sharma Suwon Shon Mark Lindsey Hira Dhamyal Rita Singh Bhiksha Raj 107 1 0 12 Aug 2024
Rethinking Transformer-based Multi-document Summarization: An Empirical Investigation Congbo Ma Wei Emma Zhang Dileepa Pitawela Haojie Zhuang Yanfeng Shu 58 0 0 16 Jul 2024
Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges Jonas Becker Jan Philip Wahle Bela Gipp Terry Ruas 122 11 0 24 May 2024
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment Abhinav Agarwalla Abhay Gupta Alexandre Marques Shubhra Pandit Michael Goin ... Tuan Nguyen Mahmoud Salem Dan Alistarh Sean Lie Mark Kurtz MoE SyDa 142 11 0 06 May 2024
ROUGE-K: Do Your Summaries Have Keywords? Sotaro Takeshita Simone Paolo Ponzetto Kai Eckert 73 1 0 08 Mar 2024
Unlocking Structure Measuring: Introducing PDD, an Automatic Metric for Positional Discourse Coherence Yinhong Liu Yixuan Su Ehsan Shareghi Nigel Collier 89 4 0 15 Feb 2024
LUNA: A Framework for Language Understanding and Naturalness Assessment Marat Saidov A. Bakalova Ekaterina Taktasheva Vladislav Mikhailov Ekaterina Artemova ELM 75 2 0 09 Jan 2024
Comparative Experimentation of Accuracy Metrics in Automated Medical Reporting: The Case of Otitis Consultations Wouter Faber Renske Eline Bootsma Tom Huibers S. Dulmen S. Brinkkemper 36 1 0 22 Nov 2023
Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey Ashok Urlana Pruthwik Mishra Tathagato Roy Rahul Mishra 78 11 0 15 Nov 2023
Generative Judge for Evaluating Alignment Junlong Li Shichao Sun Weizhe Yuan Run-Ze Fan Hai Zhao Pengfei Liu ELM ALM 119 91 0 09 Oct 2023
Automatic Personalized Impression Generation for PET Reports Using Large Language Models Xin Tie Muheon Shin Ali Pirasteh Nevein Ibrahim Zachary Huemann ... K. M. Kelly John W. Garrett Junjie Hu Steve Y. Cho Tyler Bradshaw LM&MA 122 10 0 18 Sep 2023
Redundancy Aware Multi-Reference Based Gainwise Evaluation of Extractive Summarization Mousumi Akter Shubhra (Santu) Karmaker 65 1 0 04 Aug 2023
MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types K. Murugesan Sarathkrishna Swaminathan Soham Dan Subhajit Chaudhury Chulaka Gunasekara ... Ibrahim Abdelaziz Achille Fokoue Pavan Kapanipathi Salim Roukos Alexander G. Gray 96 5 0 18 Jun 2023
UMSE: Unified Multi-scenario Summarization Evaluation Shen Gao Zhitao Yao Chongyang Tao Preslav Nakov Fajie Yuan Zhaochun Ren Zhumin Chen 91 5 0 26 May 2023
Is Summary Useful or Not? An Extrinsic Human Evaluation of Text Summaries on Downstream Tasks Xiao Pu Mingqi Gao Xiaojun Wan ELM 93 4 0 24 May 2023
Evaluating Evaluation Metrics: A Framework for Analyzing NLG Evaluation Metrics using Measurement Theory Ziang Xiao Susu Zhang Vivian Lai Q. V. Liao ELM 117 30 0 24 May 2023
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method Yiming Wang Zhuosheng Zhang Rui Wang 117 88 0 22 May 2023
On Bias and Fairness in NLP: Investigating the Impact of Bias and Debiasing in Language Models on the Fairness of Toxicity Detection Fatma Elsafoury Stamos Katsigiannis 79 1 0 22 May 2023
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks Anas Himmi Ekhine Irurozki Nathan Noiry Stephan Clémençon Pierre Colombo 196 9 0 17 May 2023
SimCSum: Joint Learning of Simplification and Cross-lingual Summarization for Cross-lingual Science Journalism Mehwish Fatima Tim Kolber K. Markert Michael Strube 41 0 0 04 Apr 2023
Lay Text Summarisation Using Natural Language Processing: A Narrative Literature Review Oliver Vinzelberg M. Jenkins Gordon Morison David McMinn Z. Tieges 61 6 0 24 Mar 2023
Curriculum-Guided Abstractive Summarization Sajad Sotudeh Hanieh Deilamsalehy Franck Dernoncourt Nazli Goharian 90 2 0 02 Feb 2023
A comprehensive review of automatic text summarization techniques: method, data, evaluation and coding D. Cajueiro A. G. Nery Igor Tavares Maísa Kely de Melo Silvia A. dos Reis Weigang Li V. R. R. Celestino 88 15 0 04 Jan 2023
Towards Abstractive Timeline Summarisation using Preference-based Reinforcement Learning Yuxuan Ye Edwin Simpson 36 0 0 14 Nov 2022
How Far are We from Robust Long Abstractive Summarization? Huan Yee Koh Jiaxin Ju He Zhang Ming Liu Shirui Pan HILM 113 40 0 30 Oct 2022
Towards Interpretable Summary Evaluation via Allocation of Contextual Embeddings to Reference Text Topics Ben Schaper Christopher Lohse Marcell Streile Andrea Giovannini Richard Osuala 52 1 0 25 Oct 2022
DATScore: Evaluating Translation with Data Augmented Translations Moussa Kamal Eddine Guokan Shang Michalis Vazirgiannis 73 5 0 12 Oct 2022
WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions from Paragraphs Hoang Thang Ta Abu Bakar Siddiqur Rahman Navonil Majumder Amir Hussain Lotfollah Najjar N. Howard Soujanya Poria Alexander Gelbukh 83 11 0 27 Sep 2022
Text Summarization with Oracle Expectation Yumo Xu Mirella Lapata VLM 68 4 0 26 Sep 2022
The Glass Ceiling of Automatic Evaluation in Natural Language Generation Pierre Colombo Maxime Peyrard Nathan Noiry Robert West Pablo Piantanida 216 11 0 31 Aug 2022
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation Cyril Chhun Pierre Colombo Chloé Clavel Fabian M. Suchanek 191 55 0 24 Aug 2022
Sparse Optimization for Unsupervised Extractive Summarization of Long Documents with the Frank-Wolfe Algorithm Alicia Y. Tsai Laurent El Ghaoui 31 1 0 19 Aug 2022
SMART: Sentences as Basic Units for Text Evaluation Reinald Kim Amplayo Peter J. Liu Yao-Min Zhao Shashi Narayan 79 22 0 01 Aug 2022
An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics Huan Yee Koh Jiaxin Ju Ming Liu Shirui Pan 149 128 0 03 Jul 2022
MentSum: A Resource for Exploring Summarization of Mental Health Online Posts Sajad Sotudeh Nazli Goharian Zachary Young AI4MH 71 13 0 02 Jun 2022
A global analysis of metrics used for measuring performance in natural language processing Kathrin Blagec Georg Dorffner M. Moradi Simon Ott Matthias Samwald 95 28 0 25 Apr 2022
Towards Explainable Evaluation Metrics for Natural Language Generation Christoph Leiter Piyawat Lertvittayakumjorn M. Fomicheva Wei Zhao Yang Gao Steffen Eger AAML ELM 76 20 0 21 Mar 2022
What are the best systems? New perspectives on NLP Benchmarking Pierre Colombo Nathan Noiry Ekhine Irurozki Stephan Clémençon 205 42 0 08 Feb 2022
DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence Wei Zhao Michael Strube Steffen Eger 121 38 0 26 Jan 2022
WIDAR -- Weighted Input Document Augmented ROUGE Raghav Jain Vaibhav Mavi Anubhav Jangra S. Saha 61 4 0 23 Jan 2022
Multi-Narrative Semantic Overlap Task: Evaluation and Benchmark Naman Bansal Mousumi Akter Shubhra (Santu) Karmaker 84 0 0 14 Jan 2022
InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation Pierre Colombo Chloe Clave Pablo Piantanida 137 44 0 02 Dec 2021
Better than Average: Paired Evaluation of NLP Systems Maxime Peyrard Wei Zhao Steffen Eger Robert West ELM 114 26 0 20 Oct 2021
Using Natural Language Processing to Understand Reasons and Motivators Behind Customer Calls in Financial Domain Ankit Patil Ankush Chopra Sohom Ghosh Vamshi Vadla AI4TS 42 1 0 18 Oct 2021