AlignScore: Evaluating Factual Consistency with a Unified Alignment Function

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

26 May 2023

Papers citing "AlignScore: Evaluating Factual Consistency with a Unified Alignment Function"

50 / 184 papers shown

Music Recommendation with Large Language Models: Challenges, Opportunities, and Evaluation

218

20 Nov 2025

HEDGE: Hallucination Estimation via Dense Geometric Entropy for VQA with Vision-Language Models

254

16 Nov 2025

SynClaimEval: A Framework for Evaluating the Utility of Synthetic Data in Long-Context Claim Verification

Mohamed Elaraby

Jyoti Prakash Maheswari

SyDa

144

12 Nov 2025

Stress Testing Factual Consistency Metrics for Long-Document Summarization

Zain Muhammad Mujahid

Dustin Wright

Isabelle Augenstein

HILM

260

10 Nov 2025

VISTA: Verification In Sequential Turn-based Assessment

346

30 Oct 2025

Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation

178

28 Oct 2025

Designing and Evaluating Chain-of-Hints for Scientific Question Answering

Anubhav Jangra

Smaranda Muresan

AI4Ed ELM

378

24 Oct 2025

ECG-LLM-- training and evaluation of domain-specific large language models for electrocardiography

Lara Ahrens

Wilhelm Haverkamp

Nils Strodthoff

198

21 Oct 2025

Disparities in Multilingual LLM-Based Healthcare Q&A

167

20 Oct 2025

A Multilingual, Large-Scale Study of the Interplay between LLM Safeguards, Personalisation, and Disinformation

260

14 Oct 2025

Enhancing Faithfulness in Abstractive Summarization via Span-Level Fine-Tuning

205

10 Oct 2025

LeMAJ (Legal LLM-as-a-Judge): Bridging Legal Reasoning and LLM Evaluation

Arijit Ghosh Chowdhury

...

Jeremy Roghair

Hannah R Marlowe

Carina Suzana Negreanu

Kitty Boxall

Diana Mincu

AILaw ELM

214

08 Oct 2025

Exposing Citation Vulnerabilities in Generative Engines

220

08 Oct 2025

Text2Stories: Evaluating the Alignment Between Stakeholder Interviews and Generated User Stories

Francesco Dente

Fabiano Dalpiaz

Paolo Papotti

08 Oct 2025

Reward Model Perspectives: Whose Opinions Do Reward Models Reward?

Elle

ALM

210

07 Oct 2025

Addressing Pitfalls in the Evaluation of Uncertainty Estimation Methods for Natural Language Generation

266

02 Oct 2025

Automated Evaluation can Distinguish the Good and Bad AI Responses to Patient Questions about Hospitalization

Sarvesh Soni

Dina Demner-Fushman

AI4MH

251

01 Oct 2025

Copy-Paste to Mitigate Large Language Model Hallucinations

199

01 Oct 2025

ReEvalMed: Rethinking Medical Report Evaluation by Aligning Metrics with Real-World Clinical Judgment

198

30 Sep 2025

Multidimensional Uncertainty Quantification via Optimal Transport

233

26 Sep 2025

Decoding Uncertainty: The Impact of Decoding Strategies for Uncertainty Estimation in Large Language Models

Wataru Hashimoto

Hidetaka Kamigaito

Taro Watanabe

213

20 Sep 2025

Pluralistic Off-policy Evaluation and Alignment

221

15 Sep 2025

Automated Evidence Extraction and Scoring for Corporate Climate Policy Engagement: A Multilingual RAG Approach

Imene Kolli

Ario Saeid Vaghefi

Chiara Colesanti-Senni

Shantam Raj

Markus Leippold

10 Sep 2025

CoCoA: Confidence and Context-Aware Adaptive Decoding for Resolving Knowledge Conflicts in Large Language Models

Anant Khandelwal

Manish Gupta

Puneet Agrawal

268

25 Aug 2025

MMCIG: Multimodal Cover Image Generation for Text-only Documents and Its Dataset Construction via Pseudo-labeling

137

24 Aug 2025

If We May De-Presuppose: Robustly Verifying Claims through Presupposition-Free Question Decomposition

Shubhashis Roy Dipta

Francis Ferraro

AAML

206

22 Aug 2025

Expert Preference-based Evaluation of Automated Related Work Generation

Furkan Şahinuç

Subhabrata Dutta

Iryna Gurevych

159

11 Aug 2025

CoCoLex: Confidence-guided Copy-based Decoding for Grounded Legal Text Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Santosh T.Y.S.S

Youssef Tarek Elkhayat

169

07 Aug 2025

The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs

325

01 Aug 2025

Hallucination Detection and Mitigation with Diffusion in Multi-Variate Time-Series Foundation Models

243

23 Jul 2025

ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

405

22 Jul 2025

KEA Explain: Explanations of Hallucinations using Graph Kernel Analysis

Reilly Haskins

Benjamin Adams

208

05 Jul 2025

MedVAL: Toward Expert-Level Medical Text Validation with Language Models

...

411

03 Jul 2025

Reranking-based Generation for Unbiased Perspective Summarization

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

204

19 Jun 2025

Re-Initialization Token Learning for Tool-Augmented Large Language Models

188

17 Jun 2025

Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering

Sai Prasanna Teja Reddy Bogireddy

Abrar Majeedi

Viswanatha Reddy Gajjala

Zhuoyan Xu

Siddhant Rai

Vaishnav Potlapalli

355

12 Jun 2025

CLATTER: Comprehensive Entailment Reasoning for Hallucination Detection

391

05 Jun 2025

A Dataset for Addressing Patient's Information Needs related to Clinical Course of Hospitalization

Sarvesh Soni

Dina Demner-Fushman

326

04 Jun 2025

QQSUM: A Novel Task and Model of Quantitative Query-Focused Summarization for Review-based Product Question Answering

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

313

04 Jun 2025

Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations

299

03 Jun 2025

Towards Multi-dimensional Evaluation of LLM Summarization across Domains and Languages

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

305

31 May 2025

Faithfulness-Aware Uncertainty Quantification for Fact-Checking the Output of Retrieval Augmented Generation

Ekaterina Fadeeva

Aleksandr Rubashevskii

390

27 May 2025

VeriTrail: Closed-Domain Hallucination Detection with Traceability

Dasha Metropolitansky

Jonathan Larson

HILM

346

27 May 2025

Uncertainty-Aware Attention Heads: Efficient Unsupervised Uncertainty Quantification for LLMs

Artem Vazhentsev

Abdelrahman Boda Sadallah

...

561

26 May 2025

Benchmarking Large Multimodal Models for Ophthalmic Visual Question Answering with OphthalWeChat

310

26 May 2025

UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models

282

25 May 2025

Retrieval Augmented Generation-based Large Language Models for Bridging Transportation Cybersecurity Legal Knowledge Gaps

Khandakar Ashrafi Akbar

461

23 May 2025

Long-Form Information Alignment Evaluation Beyond Atomic Facts

310

21 May 2025

LEXam: Benchmarking Legal Reasoning on 340 Law Exams

...

664

19 May 2025

What Are They Talking About? A Benchmark of Knowledge-Grounded Discussion Summarization

447

18 May 2025