Title
Evaluating the Evaluators: Are readability metrics good measures of readability? Isabel Cachola Daniel Khashabi Mark Dredze ELM 8 0 0 26 Aug 2025
From Sound to Sight: Towards AI-authored Music Videos Leo Vitasovic Stella Graßhof Agnes Mercedes Kloft Ville V. Lehtola Martin Cunneen Justyna Starostka Glenn McGarry Kun Li Sami S. Brandt VGen 4 0 0 20 Aug 2025
CUS-QA: Local-Knowledge-Oriented Open-Ended Question Answering Dataset Jindřich Libovický Jindřich Helcl Andrei-Alexandru Manea Gianluca Vico 48 0 0 30 Jul 2025
What Are They Talking About? A Benchmark of Knowledge-Grounded Discussion Summarization Weixiao Zhou Junnan Zhu Gengyao Li Xianfu Cheng Xinnian Liang Feifei Zhai Zhiyu Li ALM 125 0 0 18 May 2025
LLMs Get Lost In Multi-Turn Conversation Philippe Laban Hiroaki Hayashi Yingbo Zhou Jennifer Neville 184 33 0 09 May 2025
PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents Takyoung Kim Janvijay Singh Shuhaib Mehri Emre Can Acikgoz Sagnik Mukherjee Nimet Beyza Bozdag Sumuk Shashidhar Gokhan Tur Dilek Hakkani-Tur LLMAG 110 1 0 02 May 2025
Evaluating and Mitigating Bias in AI-Based Medical Text Generation Xiuying Chen Tairan Wang Juexiao Zhou Zirui Song Xin Gao Wei Wei MedIm 119 4 0 24 Apr 2025
Estimating Optimal Context Length for Hybrid Retrieval-augmented Multi-document Summarization Adithya Pratapa Teruko Mitamura RALM 116 0 0 17 Apr 2025
LLM-as-a-Judge: Reassessing the Performance of LLMs in Extractive QA Xanh Ho Jiahao Huang Florian Boudin Akiko Aizawa ELM 217 4 0 16 Apr 2025
Summarizing Speech: A Comprehensive Survey Fabian Retkowski Maike Züfle Andreas Sudmann Dinah Pfau Jan Niehues Alexander Waibel 165 0 0 10 Apr 2025
PreSumm: Predicting Summarization Performance Without Summarizing Steven Koniaev Ori Ernst Jackie Chi Kit Cheung 110 0 0 07 Apr 2025
Fwd2Bot: LVLM Visual Token Compression with Double Forward Bottleneck Adrian Bulat Yassine Ouali Georgios Tzimiropoulos 564 0 0 27 Mar 2025
SciClaims: An End-to-End Generative System for Biomedical Claim Analysis Raúl Ortega José Manuel Gómez-Pérez 160 1 0 24 Mar 2025
LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment Varsha Embar Ritvik Shrivastava Vinay Damodaran Travis Mehlinger Yu-Chung Hsiao Karthik Raghunathan 86 0 0 24 Mar 2025
GINGER: Grounded Information Nugget-Based Generation of Responses Weronika Łajewska K. Balog 101 3 0 23 Mar 2025
Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing Juntai Cao Xiang Zhang Raymond Li Chuyuan Li Shafiq Joty Giuseppe Carenini 259 6 0 27 Feb 2025
BRIDO: Bringing Democratic Order to Abstractive Summarization Junhyun Lee Harshith Goka Hyeonmok Ko HILM 113 0 0 25 Feb 2025
Evaluating the Effectiveness of Large Language Models in Automated News Article Summarization Lionel Richy Panlap Houamegni Fatih Gedikli 89 2 0 24 Feb 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation SeongYeub Chu JongWoo Kim MunYong Yi 190 10 0 21 Feb 2025
Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches Adithya Pratapa Teruko Mitamura 163 1 0 10 Feb 2025
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge Aparna Elangovan Jongwoo Ko Lei Xu Mahsa Elyasi Ling Liu S. Bodapati Dan Roth 172 12 0 28 Jan 2025
QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization Shiyue Zhang David Wan Arie Cattan Ayal Klein Ido Dagan Joey Tianyi Zhou 188 0 0 10 Dec 2024
Investigating Factuality in Long-Form Text Generation: The Roles of Self-Known and Self-Unknown Lifu Tu Rui Meng Shafiq Joty Yingbo Zhou Semih Yavuz HILM 164 1 0 24 Nov 2024
SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers Shruti Singh Nandan Sarkar Arman Cohan 120 5 0 08 Nov 2024
On Positional Bias of Faithfulness for Long-form Summarization David Wan Jesse Vig Joey Tianyi Zhou Shafiq Joty HILM 145 10 0 31 Oct 2024
Optimizing the role of human evaluation in LLM-based spoken document summarization systems Margaret Kroll Kelsey Kraus 35 2 0 23 Oct 2024
DiscoGraMS: Enhancing Movie Screen-Play Summarization using Movie Character-Aware Discourse Graph Maitreya Prafulla Chitale Uday Bindal Rajakrishnan Rajkumar Rahul Mishra 158 1 0 18 Oct 2024
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization Catarina G. Belem Pouya Pezeshkpour Hayate Iso Seiji Maekawa Nikita Bhutani Estevam R. Hruschka HILM 188 7 0 17 Oct 2024
ReIFE: Re-evaluating Instruction-Following Evaluation Yixin Liu Kejian Shi Alexander R. Fabbri Yilun Zhao Peifeng Wang Chien-Sheng Wu Shafiq Joty Arman Cohan 102 8 0 09 Oct 2024
Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics Théo Gigant Camille Guinaudeau Marc Decombas Frédéric Dufaux 105 1 0 08 Oct 2024
Salient Information Prompting to Steer Content in Prompt-based Abstractive Summarization Lei Xu Mohammed Asad Karim Saket Dingliwal Aparna Elangovan 66 0 0 03 Oct 2024
A Critical Look at Meta-evaluating Summarisation Evaluation Metrics Xiang Dai Sarvnaz Karimi Biaoyan Fang 101 0 0 29 Sep 2024
NovAScore: A New Automated Metric for Evaluating Document Level Novelty Lin Ai Ziwei Gong Harshsaiprasad Deshpande Alexander Johnson Emmy Phung Ahmad Emami Julia Hirschberg 58 1 0 14 Sep 2024
When Context Leads but Parametric Memory Follows in Large Language Models Yufei Tao Adam Hiatt Erik Haake Antonie J. Jetter Ameeta Agrawal KELM 124 3 0 13 Sep 2024
Ancient Wisdom, Modern Tools: Exploring Retrieval-Augmented LLMs for Ancient Indian Philosophy Priyanka Mandikal RALM VLM 92 1 0 21 Aug 2024
Localizing and Mitigating Errors in Long-form Question Answering Rachneet Sachdeva Yixiao Song Mohit Iyyer Iryna Gurevych HILM 155 1 0 16 Jul 2024
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems Philippe Laban Alexander R. Fabbri Caiming Xiong Chien-Sheng Wu RALM 167 69 0 01 Jul 2024
Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification Anisha Gunjal Greg Durrett HILM 151 28 0 28 Jun 2024
Scalable and Domain-General Abstractive Proposition Segmentation Mohammad Javad Hosseini Yang Gao Tim Baumgärtner Alex Fabrikant Reinald Kim Amplayo 110 0 0 28 Jun 2024
PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation Christoph Leiter Steffen Eger 117 11 0 26 Jun 2024
PlagBench: Exploring the Duality of Large Language Models in Plagiarism Generation and Detection Jooyoung Lee Toshini Agrawal Adaku Uchendu Thai V. Le Jinghui Chen Dongwon Lee 252 2 0 24 Jun 2024
Verifiable Generation with Subsentence-Level Fine-Grained Citations Shuyang Cao Lu Wang 126 8 0 10 Jun 2024
Flexible and Adaptable Summarization via Expertise Separation Preslav Nakov Mingzhe Li Shen Gao Xin Cheng Qingqing Zhu Rui Yan Xin Gao Xiangliang Zhang MoE 114 7 0 08 Jun 2024
StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond Pengyuan Lyu Yulin Li Hao Zhou Weihong Ma Xingyu Wan ... Liang Wu Chengquan Zhang Kun Yao Errui Ding Jingdong Wang 102 9 0 31 May 2024
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution Minghan Li Xilun Chen Ari Holtzman Beidi Chen Jimmy Lin Wen-tau Yih Xi Lin RALM BDL 342 19 0 29 May 2024
ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models Aparna Elangovan Ling Liu Lei Xu S. Bodapati Dan Roth ELM 129 12 0 28 May 2024
OLAPH: Improving Factuality in Biomedical Long-form Question Answering Minbyul Jeong Hyeon Hwang Chanwoong Yoon Taewhoo Lee Jaewoo Kang MedIm HILM LM&MA 158 14 0 21 May 2024
Large Language Models are Inconsistent and Biased Evaluators Rickard Stureborg Dimitris Alikaniotis Yoshi Suhara ALM 162 80 0 02 May 2024
FLAME: Factuality-Aware Alignment for Large Language Models Sheng-Chieh Lin Luyu Gao Barlas Oğuz Wenhan Xiong Jimmy Lin Wen-tau Yih Xilun Chen HILM 103 29 0 02 May 2024
FIZZ: Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document Joonho Yang Seunghyun Yoon Byeongjeong Kim Hwanhee Lee HILM 140 7 0 17 Apr 2024