BLEURT: Learning Robust Metrics for Text Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2020

9 April 2020

Papers citing "BLEURT: Learning Robust Metrics for Text Generation"

50 / 1,045 papers shown

SpecPV: Improving Self-Speculative Decoding for Long-Context Generation via Partial Verification

181

02 Dec 2025

Agreement-Constrained Probabilistic Minimum Bayes Risk Decoding

115

01 Dec 2025

HalluGraph: Auditable Hallucination Detection for Legal RAG Systems via Knowledge Graph Alignment

Valentin Noël

Elimane Yassine Seidou

Charly Ken Capo-Chichi

Ghanem Amari

HILM

201

01 Dec 2025

A Systematic Analysis of Large Language Models with RAG-enabled Dynamic Prompting for Medical Error Detection and Correction

Farzad Ahmed

Joniel Augustine Jerome

Meliha Yetisgen

Özlem Uzuner

186

25 Nov 2025

ARQUSUMM: Argument-aware Quantitative Summarization of Online Conversations

127

21 Nov 2025

SMILE: A Composite Lexical-Semantic Metric for Question-Answering Evaluation

251

21 Nov 2025

WER is Unaware: Assessing How ASR Errors Distort Clinical Understanding in Patient Facing Dialogue

329

20 Nov 2025

Music Recommendation with Large Language Models: Challenges, Opportunities, and Evaluation

218

20 Nov 2025

Beyond Surface-Level Similarity: Hierarchical Contamination Detection for Synthetic Training Data in Foundation Models

Sushant Mehta

162

18 Nov 2025

Revisiting NLI: Towards Cost-Effective and Human-Aligned Metrics for Evaluating LLMs in Question Answering

Sai Shridhar Balamurali

Lu Cheng

190

10 Nov 2025

VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models

280

10 Nov 2025

How to Evaluate Speech Translation with Source-Aware Neural MT Metrics

234

05 Nov 2025

AraFinNews: Arabic Financial Summarisation with Domain-Adapted LLMs

Mo El-Haj

Paul Rayson

AIFin

499

03 Nov 2025

MedRECT: A Medical Reasoning Benchmark for Error Correction in Clinical Texts

145

01 Nov 2025

Rating Roulette: Self-Inconsistency in LLM-As-A-Judge Frameworks

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

Rajarshi Haldar

Julia Hockenmaier

203

31 Oct 2025

Seeing, Signing, and Saying: A Vision-Language Model-Assisted Pipeline for Sign Language Data Acquisition and Curation from Social Media

302

29 Oct 2025

A Critical Study of Automatic Evaluation in Sign Language Translation

Shakib Yazdani

Yasser Hamidullah

C. España-Bonet

Eleftherios Avramidis

Josef van Genabith

SLR

390

29 Oct 2025

A Survey on Unlearning in Large Language Models

789

29 Oct 2025

Text Simplification with Sentence Embeddings

Matthew Shardlow

112

28 Oct 2025

MetricX-25 and GemSpanEval: Google Translate Submissions to the WMT25 Evaluation Shared Task

151

28 Oct 2025

Wisdom and Delusion of LLM Ensembles for Code Generation and Repair

Fernando Vallecillos Ruiz

Max Hort

Leon Moonen

211

24 Oct 2025

Structure-Conditional Minimum Bayes Risk Decoding

Bryan Eikema

Anna Rutkiewicz

Mario Giulianelli

201

23 Oct 2025

Spatio-temporal Sign Language Representation and Translation

Conference on Machine Translation (WMT), 2025

329

22 Oct 2025

Sign Language Translation with Sentence Embedding Supervision

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

390

22 Oct 2025

Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition

Yuu Jinnai

185

22 Oct 2025

SONAR-SLT: Multilingual Sign Language Translation via Language-Agnostic Sentence Embedding Supervision

460

22 Oct 2025

Evaluating Medical LLMs by Levels of Autonomy: A Survey Moving from Benchmarks to Applications

...

266

20 Oct 2025

Bolster Hallucination Detection via Prompt-Guided Data Augmentation

226

13 Oct 2025

Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models

Christopher Chiu

Silviu Pitis

Mihaela van der Schaar

LM&MA ELM LRM

221

11 Oct 2025

DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation

182

10 Oct 2025

Revisiting Metric Reliability for Fine-grained Evaluation of Machine Translation and Summarization in Indian Languages

146

08 Oct 2025

LASER: An LLM-based ASR Scoring and Evaluation Rubric

Amruta Parulekar

Preethi Jyothi

141

08 Oct 2025

Reproducibility Study of "XRec: Large Language Models for Explainable Recommendation"

148

06 Oct 2025

Reward Models are Metrics in a Trench Coat

Sebastian Gehrmann

189

03 Oct 2025

Addressing Pitfalls in the Evaluation of Uncertainty Estimation Methods for Natural Language Generation

266

02 Oct 2025

Automatic Fact-checking in English and Telugu

501

30 Sep 2025

Model Fusion with Multi-LoRA Inference for Tool-Enhanced Game Dialogue Agents

130

29 Sep 2025

EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering

211

29 Sep 2025

LLM Hallucination Detection: HSAD

Jinxin Li

Gang Tu

Junjie Hu

253

28 Sep 2025

Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks

165

27 Sep 2025

Liaozhai through the Looking-Glass: On Paratextual Explicitation of Culture-Bound Terms in Machine Translation

Sherrie Shen

Weixuan Wang

Alexandra Birch

153

27 Sep 2025

Culture In a Frame: C³B as a Comic-Based Benchmark for Multimodal Culturally Awareness

211

27 Sep 2025

Temporal Generalization: A Reality Check

169

27 Sep 2025

MO-GRPO: Mitigating Reward Hacking of Group Relative Policy Optimization on Multi-Objective Problems

250

26 Sep 2025

Semantic Agreement Enables Efficient Open-Ended LLM Cascades

Duncan Soiffer

Steven Kolawole

Virginia Smith

315

26 Sep 2025

EnAnchored-X2X: English-Anchored Optimization for Many-to-Many Translation

184

24 Sep 2025

Evaluating Language Translation Models by Playing Telephone

Syeda Jannatus Saba

Steven Skiena

155

23 Sep 2025

Specification-Aware Machine Translation and Evaluation for Purpose Alignment

Yoko Kayano

Saku Sugawara

159

22 Sep 2025

Extending Automatic Machine Translation Evaluation to Book-Length Documents

196

21 Sep 2025

Deep learning and abstractive summarisation for radiological reports: an empirical study for adapting the PEGASUS models' family with scarce data

149

18 Sep 2025